Generate text from image field

Created on 22 May 2024, 7 months ago
Updated 28 May 2024, 7 months ago
Feature request
Status

Closed: works as designed

Version

1.0

Component

Code

Created by

🇨🇳China fishfree

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @fishfree
  • 🇨🇳China fishfree

    @Ravi1890 I'd like to extract text from images with OpenAI's OCR ability.

  • 🇩🇪Germany marcus_johansson

    @fishfree, it is possible using the RC to use with the gpt-4-vision model and in the DEV version with both. Unfortunately the model call does not specify which models that have this, so I have to hard code it.

    You can get it working this way:

    1. Install Token module.
    2. Create an image field.
    3. Create a text field.
    4. Click add AI Interpolator.
    5. There is a field called OpenAI Vision that shows up when you choose the right model ( see documentation ). Connect the image field in this field.
    6. Write a prompt that is generic for what you want - "Can you describe what is happening in the image".

    Please note that if you want OCR capabilities specifically the Unstructured plugin is much better: https://www.drupal.org/project/unstructured . Vision has limitations: https://platform.openai.com/docs/guides/vision/limitations

  • 🇨🇳China fishfree

    @Marcus Thank you! For me, there is a "OpenAI Video To Text", no "OpenAI Image to Text", and I can only select a file field as source, no image field. I don't know if it works with image fields & "OpenAI Video To Text"?

  • 🇩🇪Germany marcus_johansson

    @fishree - ah, there is no specific image to text rule, any rule can take image as context. So you can use text to text rules. This makes it a lot more flexible, since you can seed any type of field from an image.

    Check this video and you get the concept: https://www.youtube.com/watch?v=KpF-oavOL_0. At around 8:00 you see how to enable a field. Note that the gpt-4o will only show up this field in DEV branch still. Soon a release comes.

  • 🇨🇳China fishfree

    @Marcus Thank you, Marcus. I tried as the screenshots below, still no luck. What's wrong with my config? Because I'd like to extract text from image, so I didn't use the {{ context }} text field.

  • 🇩🇪Germany marcus_johansson

    The base setup is ok, but think off the following:

    1. As mentioned OpenAI is not meant for OCR. You should really use Unstructured (or soon Google Vision will be added). OpenAI is great at describing what is happening in an image, but it can't read longer texts.
    2. If you can get it to work in ChatGPT, then look at the 480x480 image style, make sure that that image style has enough quality to actually show the text. But the main issue is #1.
    3. Also do a general check that everything is working, by asking it to just describe the image in 2 sentences - if that fails let me know, those functions are still in DEV and is waiting to be tested.
  • 🇨🇳China fishfree

    @Marcus Thank you! My fault: I forgot to input some text in the base text field, even which is not related to my OCR task.
    Unstructured uses tesseract-ocr, which is quite awful for my situation. GPT-4o is much better.

  • 🇩🇪Germany marcus_johansson

    @fishfree - good that you solved it! I don't know your exact use case, but if GPT4-o is not good enough for whatever reason you should still look into Unstructured (or Google Vision when I release it) - it has much more then tesseract.

    If you choose hi-res you have detectron2, yolox and if you use the SaaS they have a custom trained model named Chipper, that beats anything I tested so far.

    It can also generate structures down to markdown, which is great for passing to LLMs.

    Could you close the ticket, if this is done.

  • Status changed to Closed: works as designed 7 months ago
  • 🇨🇳China fishfree

    OK. Thank you for your suggestion!

Production build 0.71.5 2024