Search API Attachments Not working

Created on 12 August 2024, 3 months ago
Updated 30 August 2024, 3 months ago

Apologies for filing a support request ticket for debugging my own setup here, but I'm quite stumped, and perhaps if we find a resolution, this will help others.

I am attempting to set up Search API Attachments on my site, which we recently migrated from Elastic to Solr. Attachment indexing previously worked under Elastic but stopped at some point a while ago. We decided to move to Solr for a number of unrelated reasons.

Now I believe I have this set up correctly:

  • Processor is enabled
  • File fields set to "fulltext"
  • Solr extractor selected

But a search for a unique string inside an attached PDF document, the test word "supercalifragilisticexpialidocious", returns no results.

It doesn't appear to be indexing the content of my file field, which is a "File" media asset reference. For good measure, I've tried adding both the media reference field and the file field on the media item to the index as fulltext, per the README, and still nothing.
I've been trying to trace it down in Xdebug. The best I can determine is that the processor isn't being added to this array:

            if ($property instanceof ProcessorPropertyInterface) {
              $processors_with_fields[$property->getProcessorId()] = TRUE;
            }

(in search_api/src/Item/Item.php)

The property for the file fields at this point is always an instance of \Drupal\field\Entity\FieldConfig, so the instanceof check above never passes for them.

I believe I've followed all the steps in the Readme, but have I missed anything?

Support request
Status

Closed: works as designed

Version

9.0

Component

Documentation

Created by

United States bburg Washington D.C.


Comments & Activities

  • Issue created by @bburg
  • United States bburg Washington D.C.

    So I figured out the issue. Search API Attachments adds a whole new type of field to the list of fields you can add to the index, which appears as:

    "Search api attachments: {My Files Field}" with the machine name "saa_field_{files}" (the curly braces stand in for your specific field name). I had only added the standard file field to the index, which of course just ended up indexing the file ID. The new field is mentioned in this part of the documentation:

    "8. Go to admin/config/search/search-api/index/my_index/fields/add/nojs and:

    - in the General section, add the "Search api attachments: My pdfs" field.
    - in the Content section, add the "Title".
    - in the Content section, add the "Body".
    "

    That didn't jump out at me on my first several read-throughs. Once I added this field to the index, it worked as expected.
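
    For illustration, the difference might look roughly like this in an exported index configuration. This is only a sketch under assumptions: the index name "my_index", the field machine name "field_files", the labels, and the node datasource are all hypothetical, and the exact keys in a real export may differ.

    ```yaml
    # Hypothetical excerpt of search_api.index.my_index.yml.
    # Assumes a file field with the machine name field_files on a node.
    field_settings:
      # Field provided by the Search API Attachments processor:
      # this is the one that carries the extracted file text.
      saa_field_files:
        label: 'Search api attachments: Files'
        property_path: saa_field_files
        type: text
      # Standard file field: on its own it only indexes the file ID.
      field_files:
        label: Files
        datasource_id: 'entity:node'
        property_path: field_files
        type: integer
    ```

    The "saa_" prefix on the first entry is the quickest way to tell the two apart in the field list.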

    Would it be unreasonable to request we update the documentation to call out that this is a new field, and not the standard file field?

  • Status changed to Closed: works as designed 3 months ago
  • πŸ‡«πŸ‡·France izus

    Actually, all these steps are documented in the README. Please feel free to reopen and add any documentation you find useful.
