Extracting text from PDF in a media field for a prompt?

Issue created by @bogdog400
Comment about 1 year ago →
🇩🇪Germany marcus_johansson
When I have time to solve the "Add inputter plugins" issue (https://www.drupal.org/project/ai_interpolator/issues/3446245, currently Active), it should be possible to create a token-based field inputter that could do what you want. The problem is that it's highly complex, and it needs error handling for the case where you put in the wrong token (say, a string).

Another option is for the entity to look inside its child entities for, in this case, file fields. This would of course come with comical side effects, like the user profile image showing up as a choice for image input on most entity types. Deltas are also a problem: if you have a node that has paragraphs that have media items that have images, how would you choose which one to pick? Let me think about that, or let me know if you or anyone else has suggestions for solving it. Maybe AI prompt engineering could solve it, though I guess that's more for text prompts?
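To make the delta problem concrete, here is a rough sketch in Python rather than Drupal code. The `Entity` and `File` classes, the field names, and `find_files` are all hypothetical stand-ins and not part of AI Interpolator or any Drupal API. It only illustrates how walking child entities turns up several candidate files, including the unwanted profile image.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class File:
    """Hypothetical stand-in for a file attached to a field."""
    name: str


@dataclass
class Entity:
    """Hypothetical stand-in for an entity; not a real Drupal class."""
    entity_type: str
    # Field name -> list of values, one per delta.
    fields: dict = field(default_factory=dict)


def find_files(entity: Entity, path: str = "") -> Iterator[tuple]:
    """Yield (path, file_name) for every file reachable via references."""
    path = path or entity.entity_type
    for name, values in entity.fields.items():
        for delta, value in enumerate(values):
            here = f"{path}.{name}[{delta}]"
            if isinstance(value, Entity):
                # Descend into the referenced child entity.
                yield from find_files(value, here)
            elif isinstance(value, File):
                yield here, value.name


# Assumed example structure: node -> paragraph -> two media items -> files,
# plus the author reference with a profile picture.
node = Entity("node", {
    "uid": [Entity("user", {"user_picture": [File("avatar.png")]})],
    "field_paragraphs": [
        Entity("paragraph", {
            "field_media": [
                Entity("media", {"field_media_file": [File("report.pdf")]}),
                Entity("media", {"field_media_file": [File("appendix.pdf")]}),
            ],
        }),
    ],
})

candidates = list(find_files(node))
for candidate_path, file_name in candidates:
    print(candidate_path, "->", file_name)
```

The walk returns three candidates here, and nothing in the structure itself says which one the prompt should use, so some explicit selection (a configured path, a delta, or a file type filter) would still be needed.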
Comment about 1 year ago →
🇺🇸United States bogdog400
Okay. Well, let me know when you're able to tackle this problem.

Or do some of the AIs take PDFs directly?
Comment about 1 year ago →
🇩🇪Germany marcus_johansson
Ah, sorry - that issue might actually be that the file_extractor did not trigger - I will check if I can install and try it out and see if there is a bug in the Token integration.

For PDF to text in the AI Interpolator there are:
* https://www.drupal.org/project/ai_interpolator_convertapi - cheap, with quality similar to Tika.
* https://www.drupal.org/project/unstructured - can be self-hosted, is better than anything else, and can also handle XLSX, JPG, PNG, DOCX etc. Really awesome product.
