Impossible to index attachment in nested entity reference

Created on 23 October 2018, over 6 years ago
Updated 16 February 2023, about 2 years ago

Hello,

I have just tested the module with version 1.0-beta10 and Tika 1.19.

It works fine for file fields directly on the indexed entity type and for an index on media entities.

But my use case is to index attachments with the following structure:

node > block content > media > file

Each ">" is an entity reference field.

The problem is in search_api_attachments/src/Plugin/search_api/processor/FilesExtractor.php:

  protected function getFileFieldsAndFileEntityItems() {
    $file_elements = [];

    // Retrieve file fields of indexed bundles.
    foreach ($this->getIndex()->getDatasources() as $datasource) {
      if ($datasource->getPluginId() == 'entity:file') {
        $file_elements[static::SAA_FILE_ENTITY] = $this->t('File entity');
      }
      foreach ($datasource->getPropertyDefinitions() as $property) {
        if ($property instanceof FieldDefinitionInterface) {
          if ($property->getType() == 'file') {
            $file_elements[$property->getName()] = $property->getLabel();
          }
          if ($property->getType() == "entity_reference") {
            if ($property instanceof FieldConfig) {
              $deps = $property->getDependencies();
              if (in_array('media.type.file', $deps['config'])) {
                $file_elements[$property->getName()] = $property->getLabel();
              }
            }
          }
        }
      }
    }
    return $file_elements;
  }

There is no way to explore deeply nested entity references, and there is a hardcoded dependency on the "media.type.file" media type.

Could a recursive scan of the entity reference fields be done, or should the architecture of the module change completely?
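
To illustrate what such a recursive scan could look like, here is a minimal, self-contained sketch. It does not use the Drupal or Search API APIs: the $schema array and the find_file_field_chains() function are hypothetical stand-ins for real field definitions, used only to show the recursion with a depth limit and cycle protection.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified stand-in for Drupal field definitions:
// entity type => field name => ['type' => ..., 'target' => ...].
$schema = [
  'node' => [
    'title' => ['type' => 'string'],
    'field_blocks' => ['type' => 'entity_reference', 'target' => 'block_content'],
  ],
  'block_content' => [
    'field_media' => ['type' => 'entity_reference', 'target' => 'media'],
  ],
  'media' => [
    'field_media_file' => ['type' => 'file'],
  ],
];

/**
 * Recursively collects the field chains that end in a file field.
 *
 * Each returned item is a list of field names, from the given entity type
 * down to the file field.
 */
function find_file_field_chains(array $schema, string $entity_type, int $max_depth = 5, array $visited = []): array {
  // Stop on cycles (A references B references A) and on overly deep chains.
  if ($max_depth < 0 || in_array($entity_type, $visited, TRUE)) {
    return [];
  }
  $visited[] = $entity_type;

  $chains = [];
  foreach ($schema[$entity_type] ?? [] as $field_name => $definition) {
    if ($definition['type'] === 'file') {
      $chains[] = [$field_name];
    }
    elseif ($definition['type'] === 'entity_reference' && isset($definition['target'])) {
      // Prefix every chain found on the referenced entity type with this field.
      foreach (find_file_field_chains($schema, $definition['target'], $max_depth - 1, $visited) as $chain) {
        $chains[] = array_merge([$field_name], $chain);
      }
    }
  }
  return $chains;
}

foreach (find_file_field_chains($schema, 'node') as $chain) {
  echo implode(' > ', $chain), PHP_EOL;
}
```

For the structure described above (node > block content > media > file), the sketch finds a single chain: field_blocks, field_media, field_media_file. The depth limit is an assumption added to keep the scan bounded on sites with many reference fields.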

Thanks for any help.

✨ Feature request
Status: Needs review
Version: 1.0
Component: Miscellaneous
Created by: 🇫🇷France Grimreaper

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇳🇱Netherlands valgibson

    Can confirm that the patch in #17 also applies perfectly to Drupal 9.5.3 (saa 9.0.0). Hope this will be committed in the next stable release.

  • 🇫🇷France Asterovim Paris

    Hello.

    It works with the patch on Drupal core 10.0.9 and Search API Attachments 9.0.1.

    Thank you.

  • 🇧🇪Belgium bramvandenbulcke

    We've been using patch #17 for some time now and it's working fine!

  • 🇧🇪Belgium ludo.r Brussels

    Hello,

    Patch #17 works fine.

    For those like me who didn't know how it works:
    This patch creates a new field under the file entity itself called "Search API attachments: extracted file".

    For example, for Node => Paragraph => Media => File, the field to add to the index is:

    field_downloads:entity:field_files:entity:field_media_file:entity:search_api_attachments_extracted_file
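
The path above follows a regular pattern: each reference field is followed by ":entity:", and the chain ends with the extracted-file field added by the patch. As a rough illustration only, the helper below (build_extracted_file_path() is a hypothetical name, not part of the module) rebuilds that string from a list of field names:

```php
<?php
declare(strict_types=1);

/**
 * Builds the property path for a chain of reference fields ending in a file.
 *
 * Hypothetical helper for illustration; the field names passed in are the
 * ones from the example above and will differ per site.
 */
function build_extracted_file_path(array $reference_fields): string {
  $segments = $reference_fields;
  // Field name reported for patch #17 in this thread.
  $segments[] = 'search_api_attachments_extracted_file';
  return implode(':entity:', $segments);
}

echo build_extracted_file_path(['field_downloads', 'field_files', 'field_media_file']), PHP_EOL;
```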

  • Status changed to Needs work 7 months ago
  • 🇳🇿New Zealand jonathan_hunt

    Patch #17 works for me on Search API Attachments 9.0.3, but it looks like it needs work. I added a field to extract text from a media file and set "Get contents of text attachments directly using file_get_contents." to true. However, when the content (and file) is indexed, the ExtractedFileFieldItemList plugin is invoked and calls the FilesExtractor processor ($files_extractor_processor), whose configuration is empty. If I force the test for read_text_files_directly to pass, the text is extracted and indexed correctly, so the problem is that the config is not propagating.

  • 🇳🇿New Zealand jonathan_hunt

    Resolved config issue in extractOrGetFromCache() by fetching config in the same way as already done in addFieldValues. Not sure if that's how it should be, but it is working for me on Drupal 10.3.2 and Search API attachments 9.0.3.

  • Status changed to Needs review 7 months ago
  • 🇳🇿New Zealand jonathan_hunt
  • 🇨🇦Canada bensti

    @jonathan_hunt, this seems to work with my D10.3.6 and search_api_attachments 9.0.5.
    Great job!

  • Hello,

    I am working on a Drupal 11 site and using the Paragraphs module to structure content. I have added a Paragraph type called "Attachments", which allows for uploading multiple files. However, I am unable to index this field in Solr using the Search API Attachments module.

    The field appears on the page at:

    admin/config/search/search-api/index/main_index/fields/add/nojs

    But, when trying to select it, the field content is not available for indexing. I can easily select other properties like the file name, but not the actual file content itself.

    I am using the following version of the module:

    https://www.drupal.org/project/search_api_attachments/issues/3474910 ("Drupal 11 Ready", Needs review)

    Has anyone encountered a similar issue or found a workaround for indexing these attachment fields in Solr? Any pointers on where I could potentially hardcode a solution?

    Thank you in advance for your help!

  • 🇧🇪Belgium detroz

    This patch is compatible with Search API Attachments 9.0.6.

  • 🇫🇷France DrDam

    Rerolled patch for 10.0.x.

  • 🇫🇷France DrDam

    With the right patch this time, it works better.

  • First commit to issue fork.
  • 🇦🇺Australia geoffreyr

    geoffreyr → changed the visibility of the branch 3008580-impossible-to-index to hidden.

  • Merge request !39 "3008580: Reroll #30 against 9.0.x" (Open) created by geoffreyr
  • 🇦🇺Australia geoffreyr

    We've been working with patch #17 in this issue for some time. I've rerolled #30 against the 9.0.x branch and created a merge request.

  • Pipeline finished with Success
    about 1 month ago
    Total: 189s
    #432376
  • 🇬🇧United Kingdom joachim

    This MR's approach of using a computed field to provide the files to Search API is interesting, but I don't think it's the right way.

    We've got multiple issues that are all to do with 'how to get files out of the entity':

    - this one, about paragraphs
    - layout builder: "Make it possible to index attachments from index only fields that don't exist on the entity" (Active)
    - files linked in body text: "Plugin type to allow other ways to determine files to be indexed" (Active)

    I think the approach needed is to have FilesProcessor hand over to a set of plugins which handle 'getting a PDF file from the current entity'. There's an MR for that over at "Plugin type to allow other ways to determine files to be indexed" (Active).
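
As a rough sketch of what such a plugin-based hand-over could look like, the example below uses plain PHP rather than the Drupal plugin system. Everything in it is hypothetical: FileSourceInterface, the two source classes, collect_files(), and the array shape used for entities ('files' and 'children' keys) are assumptions for illustration and are not taken from the MR mentioned above.

```php
<?php
declare(strict_types=1);

/** Hypothetical plugin contract: return the file URIs found on an entity. */
interface FileSourceInterface {
  public function getFiles(array $entity): array;
}

/** Reads URIs from direct file fields (modelled here as a 'files' key). */
final class DirectFileFieldSource implements FileSourceInterface {
  public function getFiles(array $entity): array {
    return $entity['files'] ?? [];
  }
}

/** Follows nested references (modelled as a 'children' key) recursively. */
final class NestedReferenceSource implements FileSourceInterface {
  public function getFiles(array $entity): array {
    $files = [];
    foreach ($entity['children'] ?? [] as $child) {
      $files = array_merge($files, $child['files'] ?? [], $this->getFiles($child));
    }
    return $files;
  }
}

/** Asks every source for files and returns the de-duplicated result. */
function collect_files(array $entity, array $sources): array {
  $files = [];
  foreach ($sources as $source) {
    $files = array_merge($files, $source->getFiles($entity));
  }
  return array_values(array_unique($files));
}

// A node with one direct file and one file two references deep.
$node = [
  'files' => ['public://a.pdf'],
  'children' => [
    ['children' => [['files' => ['public://b.pdf']]]],
  ],
];

print_r(collect_files($node, [new DirectFileFieldSource(), new NestedReferenceSource()]));
```

The point of the design is that the processor only iterates over sources, so paragraphs, layout builder blocks, or files linked in body text could each be handled by their own source without changing the processor.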
