Generate text from image field

Issue created by @fishfree
Comment about 1 year ago →
🇮🇳India Ravi Patel - Drupal Vadodara
Try below.

https://stackoverflow.com/questions/74785930/how-to-make-image-fields-al...
Comment about 1 year ago →
🇨🇳China fishfree
@Ravi1890 I'd like to extract text from images with OpenAI's OCR ability.
Comment about 1 year ago →
🇩🇪Germany marcus_johansson
@fishfree, it is possible using the RC to use with the gpt-4-vision model and in the DEV version with both. Unfortunately the model call does not specify which models that have this, so I have to hard code it.

You can get it working this way:

Install Token module.

Create an image field.

Create a text field.

Click add AI Interpolator.

There is a field called OpenAI Vision that shows up when you choose the right model ( see documentation → ). Connect the image field in this field.

Write a prompt that is generic for what you want - "Can you describe what is happening in the image".

Please note that if you want OCR capabilities specifically the Unstructured plugin is much better: https://www.drupal.org/project/unstructured → . Vision has limitations: https://platform.openai.com/docs/guides/vision/limitations
Comment about 1 year ago →
🇨🇳China fishfree
@Marcus Thank you! For me, there is a "OpenAI Video To Text", no "OpenAI Image to Text", and I can only select a file field as source, no image field. I don't know if it works with image fields & "OpenAI Video To Text"?
Comment about 1 year ago →
🇩🇪Germany marcus_johansson
@fishree - ah, there is no specific image to text rule, any rule can take image as context. So you can use text to text rules. This makes it a lot more flexible, since you can seed any type of field from an image.

Check this video and you get the concept: https://www.youtube.com/watch?v=KpF-oavOL_0. At around 8:00 you see how to enable a field. Note that the gpt-4o will only show up this field in DEV branch still. Soon a release comes.
Comment about 1 year ago →
🇨🇳China fishfree
@Marcus Thank you, Marcus. I tried as the screenshots below, still no luck. What's wrong with my config? Because I'd like to extract text from image, so I didn't use the {{ context }} text field.
Comment about 1 year ago →
🇨🇳China fishfree
Comment about 1 year ago →
🇩🇪Germany marcus_johansson
The base setup is ok, but think off the following:

As mentioned OpenAI is not meant for OCR. You should really use Unstructured (or soon Google Vision will be added). OpenAI is great at describing what is happening in an image, but it can't read longer texts.

If you can get it to work in ChatGPT, then look at the 480x480 image style, make sure that that image style has enough quality to actually show the text. But the main issue is #1.

Also do a general check that everything is working, by asking it to just describe the image in 2 sentences - if that fails let me know, those functions are still in DEV and is waiting to be tested.
Comment about 1 year ago →
🇨🇳China fishfree
@Marcus Thank you! My fault: I forgot to input some text in the base text field, even which is not related to my OCR task.
Unstructured uses tesseract-ocr, which is quite awful for my situation. GPT-4o is much better.
Comment about 1 year ago →
🇩🇪Germany marcus_johansson
@fishfree - good that you solved it! I don't know your exact use case, but if GPT4-o is not good enough for whatever reason you should still look into Unstructured (or Google Vision when I release it) - it has much more then tesseract.

If you choose hi-res you have detectron2, yolox and if you use the SaaS they have a custom trained model named Chipper, that beats anything I tested so far.

It can also generate structures down to markdown, which is great for passing to LLMs.

Could you close the ticket, if this is done.
Status changed to Closed: works as designed about 1 year ago10:19pm 28 May 2024
Comment about 1 year ago →
🇨🇳China fishfree
OK. Thank you for your suggestion!

Generate text from image field

Comments & Activities