No results from file_extractor extracted data in Drupal 10.1

configpath=set/this/to/siteconfig/path
apt install poppler-utils  # (or however your distribution provides this)
composer require drupal/file_extractor:4.1.1
drush pm:enable file_extractor
cat << EOF > $configpath/file_extractor.settings.yml
extraction_method: pdftotext_extractor
extraction_method_settings:
  pdftotext_path: pdftotext
extraction_settings:
  extractable:
    excluded_extensions: 'aif art avi bmp gif ico mov oga ogv png psd ra ram rgb flv'
    max_filesize: '0'
    exclude_private: true
  extraction_result:
    number_first_bytes: '1 MB'
EOF
drush cim
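
Before digging into the search backend, it may help to confirm that the extraction tool itself is reachable. The following is only a rough sketch: `check_tool` is a hypothetical helper, `sample.pdf` is a placeholder file name, and the commented-out commands assume Drush is available in the site directory.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# pdftotext is the binary referenced by pdftotext_path in the settings above.
check_tool pdftotext || echo "install poppler-utils first"

# With a placeholder sample.pdf in the current directory, print the first
# lines of extracted text to stdout ("-" as output file means stdout):
# pdftotext sample.pdf - | head -n 5

# Show the settings Drupal actually has after the import:
# drush config:get file_extractor.settings
```

If `pdftotext` produces text on the command line but the index stays empty, the problem is more likely in the index or backend configuration than in the extraction.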

Comments & Activities

Issue created by @tgoeg
Comment almost 2 years ago →
🇸🇮Slovenia deaom
I followed the steps mentioned, but could not reproduce the issue. Tested with D10.1.1, the Search API Meilisearch dev version (git checkout, not composer install), File Extractor 4.1.1 and poppler-utils for the PDF-to-text conversion. I get the title and the extracted file field displayed in Drupal. I also checked Meilisearch and it is working properly there as well. I'm not familiar enough with File Extractor to know what the issue could be, since you said it works with the DB server.
Do you see any errors, maybe in console or in the logs?
Will leave this open, so anybody else can also try and reproduce the issue.
Status changed to Closed: works as designed almost 2 years ago (2:36pm, 6 September 2023)
Comment almost 2 years ago →
🇦🇹Austria tgoeg
I drilled down deeper on this.

I have a working setup now as well. If I had followed the instructions above myself, it would have worked out as well :-)

The problem might stem from the ID mappings (now made obsolete by 📌 Remove entity id mapping config, Fixed) and the fact that I share indexes between multiple instances (D10.1 and D9.5) for quicker testing (the indexes are pretty huge and I don't want to wait for indexing during testing). I still don't understand why only some fields get displayed and others don't, however.

What seems to have fixed it:

Updated to current dev-1.0.x

Deleted the index in the search_api config. Re-indexing alone would not fix it, which is something I will report in another ticket; it stems from 🐛 Remove the possibility to add field with machine name id (Fixed), as it tried to create the column "id" a second time.

Re-imported config (to recreate the index and the views that were deleted together with it)

Re-indexed nodes

Profit
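
For anyone hitting the same thing, the steps above could be sketched roughly like this. It is only an illustration under assumptions: the index machine name `my_index`, the composer package name and the `1.0.x-dev` constraint are my guesses and not taken from this issue, and the `run` wrapper is a hypothetical helper that only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumed index machine name; replace with the ID of your own index.
index_id="${INDEX_ID:-my_index}"

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recreate_index() {
  # Assumed package name and dev constraint for the Meilisearch backend.
  run composer require "drupal/search_api_meilisearch:1.0.x-dev"
  # Delete the index config entity (also drops views that depend on it).
  run drush config:delete "search_api.index.$index_id" -y
  # Re-import the exported config to recreate the index and the views.
  run drush config:import -y
  # Re-index the content of that index.
  run drush search-api:index "$index_id"
}

recreate_index
```

Running it as-is only lists the four commands; review them first, then rerun with `DRY_RUN=0` once the names match your site.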

I guess this can be closed.
The lesson is probably that sharing indexes between instances is not a good idea yet, even though it mostly seems to work. Maybe it will work fully once all instances use a version that includes the fix from 📌 Remove entity id mapping config (Fixed).
