Impossible to index attachment in nested entity reference

Created on 23 October 2018
Updated 16 February 2023

Hello,

I have just tested the module with version 1.0-beta10 and Tika 1.19.

It works fine for file fields placed directly on the indexed entity type, and for indexes on the media entity.

However, my use case is to index attachments with the following structure:

node > block content > media > file

Each ">" is an entity reference field.

The problem lies in getFileFieldsAndFileEntityItems() in search_api_attachments/src/Plugin/search_api/processor/FilesExtractor.php:

  protected function getFileFieldsAndFileEntityItems() {
    $file_elements = [];

    // Retrieve file fields of indexed bundles.
    foreach ($this->getIndex()->getDatasources() as $datasource) {
      if ($datasource->getPluginId() == 'entity:file') {
        $file_elements[static::SAA_FILE_ENTITY] = $this->t('File entity');
      }
      foreach ($datasource->getPropertyDefinitions() as $property) {
        if ($property instanceof FieldDefinitionInterface) {
          if ($property->getType() == 'file') {
            $file_elements[$property->getName()] = $property->getLabel();
          }
          if ($property->getType() == "entity_reference") {
            if ($property instanceof FieldConfig) {
              $deps = $property->getDependencies();
              if (in_array('media.type.file', $deps['config'])) {
                $file_elements[$property->getName()] = $property->getLabel();
              }
            }
          }
        }
      }
    }
    return $file_elements;
  }

There is no way to explore deeply nested entity references, and there is a hard-coded dependency on the "media.type.file" media type.

Could a recursive scan of the entity reference fields be done, or does the module's architecture need to change completely?
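One possible answer to the recursion question is a depth-limited walk over entity reference fields that collects every path ending in a file field, regardless of media type. The sketch below is plain PHP over a hypothetical, hard-coded field map: $schema and find_file_field_paths() are illustrative names, not part of the module or Drupal core. In a real processor the map would be built from Drupal's field definitions and the referenced bundle would come from each reference field's target settings; that wiring is deliberately left out.

```php
<?php

/**
 * Hypothetical field map: "entity_type:bundle" => [field name => definition].
 *
 * Stands in for real field definitions so the traversal logic can be shown
 * on its own. Mirrors the node > block content > media > file structure.
 */
$schema = [
  'node:page' => [
    'field_blocks' => ['type' => 'entity_reference', 'target' => 'block_content:basic'],
    'field_pdf' => ['type' => 'file'],
  ],
  'block_content:basic' => [
    'field_media' => ['type' => 'entity_reference', 'target' => 'media:document'],
  ],
  'media:document' => [
    'field_media_file' => ['type' => 'file'],
  ],
];

/**
 * Recursively collects property paths that end in a file field.
 *
 * Each reference hop is joined with ":entity:". $max_depth bounds how many
 * reference hops are followed; $visited (passed by value) stops a chain that
 * loops back to a bundle it already went through.
 */
function find_file_field_paths(array $schema, string $bundle, int $max_depth = 3, string $prefix = '', array $visited = []): array {
  $paths = [];
  if ($max_depth < 0 || isset($visited[$bundle])) {
    return $paths;
  }
  $visited[$bundle] = TRUE;
  foreach ($schema[$bundle] ?? [] as $field_name => $definition) {
    $path = $prefix === '' ? $field_name : $prefix . ':entity:' . $field_name;
    if ($definition['type'] === 'file') {
      // A file field at any depth is a candidate for extraction.
      $paths[] = $path;
    }
    elseif ($definition['type'] === 'entity_reference') {
      // Descend into the referenced bundle, one hop deeper.
      $paths = array_merge($paths, find_file_field_paths($schema, $definition['target'], $max_depth - 1, $path, $visited));
    }
  }
  return $paths;
}

print_r(find_file_field_paths($schema, 'node:page'));
```

Because $visited is copied per call, cycle detection is per reference chain: the same bundle can still be reached through two different chains, but a chain that refers back to itself stops instead of recursing forever.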

Thanks for any help.

Category: Feature request
Status: Needs review
Version: 1.0
Component: Miscellaneous
Created by: Grimreaper (France 🇫🇷)


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇳🇱Netherlands valgibson

    I can confirm that the patch in #17 also applies cleanly to Drupal 9.5.3 (search_api_attachments 9.0.0). I hope this will be committed in the next stable release.

  • 🇫🇷France Asterovim Paris

    Hello.

    It works with the patch on Drupal core 10.0.9 and Search API Attachments 9.0.1.

    Thank you.

  • 🇧🇪Belgium bramvandenbulcke

    We've been using patch #17 for some time now and it's working fine!

  • 🇧🇪Belgium ludo.r Brussels

    Hello,

    Patch #17 works fine.

    For those who, like me, didn't know how it works:
    the patch adds a new field on the file entity itself, called "Search API attachments: extracted file".

    For a Node => Paragraph => Media => File structure, the property path is:

    field_downloads:entity:field_files:entity:field_media_file:entity:search_api_attachments_extracted_file

  • Status changed to Needs work
  • 🇳🇿New Zealand jonathan_hunt

    Patch #17 works for me on Search API Attachments 9.0.3, but it looks like it needs work. I added a field to extract text from a media file and enabled "Get contents of text attachments directly using file_get_contents.", but when the content (and file) are indexed, the ExtractedFileFieldItemList plugin is invoked and calls the FilesExtractor processor ($files_extractor_processor), whose configuration is empty. If I force the check for read_text_files_directly to pass, the text is extracted and indexed correctly, so the problem is that the configuration is not propagated.

  • 🇳🇿New Zealand jonathan_hunt

    I resolved the configuration issue in extractOrGetFromCache() by fetching the configuration the same way addFieldValues() already does. I'm not sure that's the intended approach, but it works for me on Drupal 10.3.2 and Search API Attachments 9.0.3.

  • Status changed to Needs review
  • 🇨🇦Canada bensti

    @jonathan_hunt, it seems to work on my D10.3.6 site with search_api_attachments 9.0.5.
    Great job!

  • Hello,

    I am working on a Drupal 11 site and using the Paragraphs module to structure content. I have added a Paragraph type called "Attachments", which allows for uploading multiple files. However, I am unable to index this field in Solr using the Search API Attachments module.

    The field appears on the page at:

    admin/config/search/search-api/index/main_index/fields/add/nojs

    But when I try to select it, the file content is not available for indexing. I can easily select other properties, such as the file name, but not the actual file content.

    I am using the following version of the module, from the "Drupal 11 Ready" issue (status: Needs review):

    https://www.drupal.org/project/search_api_attachments/issues/3474910

    Has anyone encountered a similar issue or found a workaround for indexing these attachment fields in Solr? Any pointers on where I could potentially hardcode a solution?

    Thank you in advance for your help!
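The property path in ludo.r's comment follows a regular pattern: each entity reference hop is written as the field name followed by ":entity", segments are joined with colons, and the path ends with the "search_api_attachments_extracted_file" property that patch #17 adds to the file entity. A minimal sketch of that composition in plain PHP, assuming the pattern holds at any depth; build_property_path() is a hypothetical helper, not an API of Search API or the module, and the field names are the ones from ludo.r's site.

```php
<?php

/**
 * Hypothetical helper: joins a chain of entity reference fields into a
 * Search API style property path.
 *
 * Each reference field is followed by ":entity" so the next segment is read
 * from the referenced entity; $final_property is appended as the last segment.
 */
function build_property_path(array $reference_fields, string $final_property): string {
  $segments = [];
  foreach ($reference_fields as $field_name) {
    $segments[] = $field_name . ':entity';
  }
  $segments[] = $final_property;
  return implode(':', $segments);
}

// Node => Paragraph => Media => File, using the field names from the comment.
echo build_property_path(
  ['field_downloads', 'field_files', 'field_media_file'],
  'search_api_attachments_extracted_file'
), PHP_EOL;
```

With these inputs the helper reproduces the path quoted in the comment, field_downloads:entity:field_files:entity:field_media_file:entity:search_api_attachments_extracted_file.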
