Image style option for GPT Vision

Created on 20 March 2024, 9 months ago
Updated 23 May 2024, 7 months ago

Problem/Motivation

Some original images can be very large. Reducing the size of the base64 payload reduces latency to the API and allows finer-grained control over token usage.
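To make the payload argument concrete, here is a minimal sketch of how base64 encoding inflates an image before it is sent. The two file sizes are hypothetical, chosen only for illustration; the only fact relied on is that standard padded base64 emits 4 characters for every 3 input bytes.

```python
import base64


def base64_length(raw_bytes: int) -> int:
    """Length in characters of standard padded base64 for raw_bytes bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


# Hypothetical sizes, not taken from the issue.
original = 5_000_000   # an original upload of about 5 MB
derivative = 200_000   # an image style derivative of about 200 KB

for label, size in (("original", original), ("derivative", derivative)):
    print(f"{label}: {size} bytes -> {base64_length(size)} base64 characters")

# Check the formula against the real encoder on a small payload.
sample = bytes(range(256)) * 4  # 1024 bytes
assert len(base64.b64encode(sample)) == base64_length(len(sample))
```

The encoded payload is always about one third larger than the file itself, so any reduction from using an image style derivative carries over proportionally to what goes across the wire.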

Proposed resolution

Add an image style configuration option to the image options if GPT Vision Preview is chosen.

Remaining tasks

See MR.

User interface changes

New field to choose image style, saved to config.

✨ Feature request
Status

Needs review

Version

1.0

Component

Code

Created by

🇺🇸 United States alexandersluiter


Merge Requests

Comments & Activities

  • Issue created by @alexandersluiter
  • 🇩🇪 Germany marcus_johansson

    @alexandersluiter - Just so I understand it correctly, you want the image that is being sent to Vision to be minimized before sending, not the created image from DALL-E to be minimized before storing?

  • 🇺🇸 United States alexandersluiter

    This is correct. I will post a merge request or patch soon to handle all of it.

  • 🇺🇸 United States alexandersluiter

    @Marcus_Johansson Some of our product images can be upwards of 50MB in their original uploaded form. We don't deliver those to the DOM for normal visitors; we resize based on many factors. After rough calculations, we would have to send at least 3TB across the wire and spend a lot of CPU time base64 encoding large images. Cutting them down on our side first makes sense in our use case.
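A rough estimate of this kind can be sketched as follows. The comment does not state how many images are involved, so the catalog size below is a hypothetical value chosen only so that the raw originals add up to about 3 TB at 50 MB each. The derivative size is likewise an assumption.

```python
def estimated_transfer_bytes(image_count: int, average_image_bytes: int) -> int:
    """Approximate base64 bytes sent when every image is encoded and uploaded once."""
    # Standard base64 turns each group of up to 3 bytes into 4 characters.
    per_image = 4 * ((average_image_bytes + 2) // 3)
    return image_count * per_image


MB = 1000**2
TB = 1000**4

# Hypothetical inputs: the real image count is not given in the thread.
catalog_size = 60_000
originals = estimated_transfer_bytes(catalog_size, 50 * MB)
derivatives = estimated_transfer_bytes(catalog_size, 200_000)

print(f"originals:   {originals / TB:.2f} TB on the wire")
print(f"derivatives: {derivatives / TB:.3f} TB on the wire")
```

Under these assumptions, encoding the originals adds roughly a third on top of the raw volume, while derivatives shrink the total by the same ratio as the per-image reduction.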

    That being said, after reviewing the code, the changes would have to be made in the OpenAiBase class where images are derived and encoded. The way you have it currently written, there is no access to the field definition or interpolator configuration inside the getVisionImages() method. Take a look at my MR and let me know what you think. I believe passing either the interpolator configuration or the field definition should be enough to derive the chosen image style to encode. Unfortunately, this would require a change in almost every OpenAi plugin. There may be a cleaner way to handle this using the base class only and allowing plugins to declare their image requirements once.

  • 🇩🇪 Germany marcus_johansson

    @alexandersluiter - That would be no problem; however, as far as I can see the merge request is empty. Could you have a look at that, so I'm not misreading?

    I will think about it a little bit, because I think this is reusable logic. The same problem existed when talking to D-ID, and I made an ugly solution there by just resizing images that are too large using GD (see https://git.drupalcode.org/project/did/-/blob/1.0.x/src/Did.php?ref_type...).

    I'm wondering if it's possible to build a helper function into the AiInterpolatorFieldRule class that can be used by any plugin, so it's a reusable pattern when needed.

    I'm currently heavily working on another contrib module in the AI space, but next week I will be able to have a look at it.

    As a side note, there will be a mediacoding plugin as well that can take an image, audio or video and automate certain modifications into an image, audio or video. This would allow image field to image field modification using image styles. But it will take some time before it's done.

  • 🇺🇸 United States alexandersluiter

    I was going to push what I have last night but it didn't feel elegant. I'll push soon and ping you when I do.

    Lastly, I love what you've built here. Thank you for building a solid foundation on Drupal for these game-changing technologies.

  • 🇺🇸 United States alexandersluiter

    @Marcus_Johansson Upon further inspection of the codebase, it looks like the plugin system is put together abnormally compared to a regular Drupal plugin system. It seems that the declaration class in the Annotation namespace is being used as the base class for all plugins. The plugins themselves should not extend the \Drupal\Component\Annotation\Plugin class as it is supposed to be used for declaration only. Ideally, all plugins should be built from the \Drupal\Component\Plugin\PluginBase class. This way all plugins can access their definitions, runtime configuration, and more.

    I believe the best route forward is to do a simple implementation for image styles for the time being, and do a larger refactor into the more traditional Drupal plugin system that extends the \Drupal\Component\Plugin\PluginBase class. I would be willing to help with this if you'd like.

  • 🇩🇪 Germany marcus_johansson

    @alexandersluiter - I was actually going to start refactoring a lot of stuff. I've been learning a lot of Drupal's API systems while working on this, so it's a bit of a mess in places, as I've come to understand afterwards.

    In the case of the annotation system, it needs to be refactored to attributes for D11 anyway, and that could already be launched for all systems running on 10.2, so it's as good a time as any.

    Many of the file-generating rules have the same code all over the place for generating images, files, media etc. This should be moved into a helper service or a base class they extend. Since I do not yet know how many helper functions there will be, it might make sense to do it in services. This would also make it easier for people to add more rules/plugins later.

    If you want to help out with this it would be awesome!

    More importantly for now, if you initially need the simple implementation to be able to continue with what you need for your website, could you just do a merge request whenever you have time? I'll look through it and merge it into dev if it looks good. It would then hopefully be easy to move that piece out into some reusable helper service.

  • 🇺🇸 United States alexandersluiter

    I have pushed the "simple" change in order to get image styles functioning as is. Take a look and let me know what you think.

  • Pipeline finished with Success
    9 months ago
    Total: 243s
    #129041
  • Pipeline finished with Success
    9 months ago
    Total: 143s
    #129264
  • 🇩🇪 Germany marcus_johansson

    @alexandersluiter - It looks good so far - I saw you just removed the dpm.

    Regarding the logger, this is something else that I will have to refactor everywhere. I added exceptions and thought I would log them globally, but I have come to realize that the logging function has a lot more to offer than what you can get from exceptions unless you plan them well, since it offers warnings etc.

    Let me know when you are finished with the MR and I'll merge it.

    For the refactor I could start on something, but it would be great to have a second set of eyes on it all the way, if you have the time?

  • 🇺🇸 United States alexandersluiter

    I just created an issue, 📌 Drupal Standards and Production Ready (Active), on the main module to start a refactor in a new 2.x branch. I think that's the best route. You've built up quite an awesome set of modules so far, and I think bringing it in line with Drupal best practices would really help the module family's longevity and stability. My company is diving headfirst into generative AI integration, and we are willing to help build this ecosystem out.

    @Marcus_Johansson What timezone are you in? Maybe a conversation off the issue queue would be a good start to a 2.x path?

  • 🇩🇪 Germany marcus_johansson

    Thanks @alexandersluiter - I'm on GMT+1, but I'm mostly doing this in my spare time anyway, so if you are US based I should be available early to midday, depending on whether you are on the west or east coast.

    You could mail me at me@marcusmailbox.com, or I'm here on Drupal Slack, to set something up: https://drupal.slack.com/archives/D05SV6BJE7L

    I'm currently on vacation this week, but available from Thursday next week.

    If you have the time and haven't seen it yet, you could check this: https://www.youtube.com/watch?v=xpsFk3tzxwQ&ab_channel=DrupalAIVideos

    It's a longass video, but it gives some context on the "next" step for the AI Interpolator outside of the automation chaining. The first few minutes should give you some idea anyway.

  • 🇺🇸 United States alexandersluiter

    @Marcus_Johansson My apologies, I was on vacation for a while as well. I just pinged you on Slack, let's chat there on how I can help move things forward.

  • 🇩🇪 Germany marcus_johansson

    Hey @alexandersluiter, I actually needed this in another project, so there is now a reusable helper function for this in the core module.

    Related ticket: https://www.drupal.org/project/ai_interpolator/issues/3446771 ✨ Add helper class to base fields to add imagestyle preprocessing (Active)

    If you pull the DEV version of this module and of the core module AI Interpolator, it should be possible to test.

  • Status changed to Needs review 7 months ago
  • 🇩🇪 Germany marcus_johansson

    For anyone wanting to test this, I think this is an easy approach without having to muck around in code.

    1. Install the Token module.
    2. Create an image field.
    3. Create a long text field.
    4. On that text field, enable AI Interpolator, choose gpt-4o for instance, and assign the image field (but do not set the image style yet).
    5. In the prompt, choose advanced mode and enter "Describe in one sentence what is happening in the image".
    6. Upload an image with a lot happening in different parts of the image.
    7. Create a content item with the above and make sure that the description is correct.
    8. Create an image style that crops to a very specific part of the image.
    9. Attach that image style in the settings of the text field.
    10. Create another content item with the same image, and make sure that the description only covers what happens in the part of the image that the crop keeps.