Media / entity browser support

Created on 31 October 2023, about 1 year ago
Updated 3 November 2023, about 1 year ago

Problem/Motivation

I use a Media library, and it would be great to have support for it when using this module to generate images.
In my case, I have a prompt field on my article node, which is interpolated from the article's body. This prompt field should then be used to generate a banner, but the banner field is a Media reference, so it can't be interpolated at the moment.

I think Media support is an important feature because Media is widely used. Also, since these images were all generated at a cost, it is desirable to keep track of them inside the library. Media entities would potentially allow storing the prompt that was used as well.

Steps to reproduce

Proposed resolution

One solution could be to leverage ECA for this. If Interpolator introduced actions like "Run interpolator" on a field and could pass tokens to them, it would be possible to set up something like this:

- A node with a body field and a media reference field.
- A media entity with an image field and a prompt field.
- The image field is set to interpolate the prompt field and create an image.
- The prompt field is a simple long text field set to interpolate {{ context }} into an image generation prompt.
- An ECA model runs on presave if the media field is empty. It creates the media entity, passes the body field into {{ context }}, and initiates the interpolation.
- The image field interpolates automatically.
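
The setup above could be sketched roughly as follows. This is only an illustration in Python, not Drupal, ECA, or Interpolator code: `Node`, `Media`, `PROMPT_TEMPLATE`, `presave` and `fake_generator` are all hypothetical names, and the image generator is a stub standing in for a real image service.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical prompt template; "{{ context }}" is the placeholder from the list above.
PROMPT_TEMPLATE = "Create a banner image for this article: {{ context }}"


@dataclass
class Media:
    """Stand-in for a media entity with a prompt field and an image field."""
    prompt: str = ""
    image: Optional[str] = None


@dataclass
class Node:
    """Stand-in for a node with a body field and a media reference field."""
    body: str
    media: Optional[Media] = None


def render_prompt(template: str, context: str) -> str:
    # Plain string replacement; a real token system would do more than this.
    return template.replace("{{ context }}", context)


def presave(node: Node, generate_image: Callable[[str], str]) -> Node:
    """Create and fill the media entity only when the reference is empty."""
    if node.media is None:
        media = Media(prompt=render_prompt(PROMPT_TEMPLATE, node.body))
        media.image = generate_image(media.prompt)
        node.media = media
    return node


def fake_generator(prompt: str) -> str:
    # Stub generator: returns a placeholder URI instead of calling an API.
    return f"generated://{len(prompt)}"
```

A node that already references a user-uploaded media entity passes through `presave` untouched, which matches the "if the media field is empty" condition.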

I have been thinking about a few other scenarios where ECA integration would be useful, such as:
Being able to use the batch interpolation option and still trigger interpolation programmatically. We could have our cake and eat it too: a good UX for manual actions, plus the ability to interpolate in bulk or fully automatically.

VBO Interpolation

Conditional interpolation, such as creating models that trigger interpolation based on the value of another field (booleans, for instance, so we can present "Use AI" or "Generate an image using AI" controls to end-users).

Create multi-entity workflows, such as:
- Create a promotion post (node) using Interpolator for the image (a media entity) and the body
- Automatically create a Commerce promotion entity, reference it, and use AI to set labels and descriptions based on the body of the node.
- Automatically create a Commerce Coupon entity and reference it on the Promotion entity, and use the body of the node to generate the coupon code value.

- E-mail a list of users by executing a views query.

.... Etc.

I think it's interesting to consider :)

Remaining tasks

User interface changes

API changes

Data model changes

Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇹🇭Thailand AlfTheCat


Comments & Activities

  • Issue created by @AlfTheCat
  • 🇩🇪Germany marcus_johansson

    Hi and thank you for your great ideas!

    Some of these things I have already thought about, and some of them kind of work already. I did not give VBO a single thought, and that is a very interesting concept.

    For multi-entity workflows, there are some private solutions we have in our company, but they are only custom interpolators. For instance, one generates a landing page of different paragraphs using a base prompt on a node. This prompt then figures out which paragraphs to create, of which types, and which prompt each should be given, and those paragraphs then have their own interpolator rules. So basically we take a product PDF and ask it to create a promotion landing page for that product, and a couple of minutes later it's there. This we could in theory share, but they are so custom that it makes little sense as a general solution. I have thought of some general solutions for this, but haven't had the time yet.

    Conditional interpolation can kind of work with the token module and a prompt, but it doesn't work in all cases, nor in cases without prompts, so trigger-based interpolation should be done as well.

    ECA is probably in the pipeline as well, I have to learn more about the module to be honest.

    Regarding media entities, you can actually create them, kind of. There is one thing you can do already and one thing I could add very easily.

    1. You can add a prompt field to the actual image media entity and make it possible that way.
    2. If you look at the Pixabay modules (free API btw) you can see that I already have a solution where you can interpolate an image media field. The problem with this is of course how to fill out other metadata that someone might want. But at least this would add the image into the media field, which is the most important thing. I will add this to make it consistent with the Pixabay module (I just added it there because the parent Pixabay module worked with media).

    Some other ideas that are in the pipeline for the future, since you seem interested :)
    The idea is also that it could store states, since you might not always want to keep the intermediate content of a chain as data in the database. For instance, with "get this website and write a summary of it", you might not want to keep the actual website HTML in your database (and it might be illegal). Then there would be a trigger recording that this job has run and does not need to run again, so the field can stay empty.

    Also stack interpolations: for instance, generating search words for Pixabay sometimes leads to search words with no results. In that case it would be good to have a second rule that generates a Dreamstudio image, for instance, so it's not left empty.
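
To illustrate the stacking idea, here is a minimal Python sketch of a fallback chain. It is not Interpolator code and calls no real Pixabay or Dreamstudio API; `first_non_empty`, `stock_search` and `generate_image` are hypothetical names, and both rules are stubs.

```python
from typing import Callable, Optional, Sequence

# A rule takes some context text and returns a value, or None/"" when it has nothing.
Rule = Callable[[str], Optional[str]]


def first_non_empty(rules: Sequence[Rule], context: str) -> Optional[str]:
    """Try each rule in order and return the first non-empty result."""
    for rule in rules:
        result = rule(context)
        if result:
            return result
    return None


def stock_search(context: str) -> Optional[str]:
    # Stub for a stock-image search that found no match for the search words.
    return None


def generate_image(context: str) -> Optional[str]:
    # Stub for an image generator that always produces something.
    return f"generated image for: {context}"
```

With `first_non_empty([stock_search, generate_image], "red bicycle")`, the second rule only runs because the first one returned nothing.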

    Also retries: OpenAI (GPT-3.5 specifically) sometimes does not answer in the JSON format the interpolator expects, and the rule fails. It should be able to retry for stability, since it's cheaper and sometimes faster to run 10 queries to 3.5 than to use 4.
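
A retry of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not how Interpolator handles it: `ask_json_with_retries` is a hypothetical helper, and `ask` is any callable that sends a prompt to a model and returns its raw text reply.

```python
import json
from typing import Callable


def ask_json_with_retries(ask: Callable[[str], str], prompt: str,
                          max_attempts: int = 3) -> dict:
    """Call `ask` until it returns a JSON object, up to `max_attempts` times."""
    last_error = None
    for _ in range(max_attempts):
        raw = ask(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as error:
            # The reply was not JSON at all; remember why and try again.
            last_error = error
            continue
        if isinstance(parsed, dict):
            return parsed
        # Valid JSON, but not the object shape the caller expects.
        last_error = ValueError("expected a JSON object")
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Capping the attempts keeps the cost bounded when the model keeps replying in prose.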

    Multi-field output: I have a solution for the custom fields module already that I'm waiting for an OK from my company to push. But the real issue is rather to, for instance, do one query to OpenAI, ask for 10 different things, and fill out 10 different fields. This is faster and saves money, of course. But I don't have a good solution for it yet.
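
One possible shape for the "one query, many fields" idea, sketched in Python under stated assumptions: the model is asked for a single JSON object keyed by field name, and the reply is mapped back onto the fields. `build_prompt`, `map_response` and the field names are hypothetical, and no real API is called.

```python
import json
from typing import Dict


def build_prompt(fields: Dict[str, str], context: str) -> str:
    """Combine per-field instructions into one prompt asking for one JSON object."""
    lines = [f'- "{name}": {instruction}' for name, instruction in fields.items()]
    return (
        "Answer with a single JSON object containing exactly these keys:\n"
        + "\n".join(lines)
        + "\n\nContext:\n"
        + context
    )


def map_response(raw: str, fields: Dict[str, str]) -> Dict[str, str]:
    """Keep only the requested keys; missing keys are simply left out."""
    parsed = json.loads(raw)
    return {name: parsed[name] for name in fields if name in parsed}
```

Fields missing from the reply stay unset, so a caller could send a second, smaller query for just those.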

  • 🇹🇭Thailand AlfTheCat

    Hi Marcus, thanks for the detailed response, much appreciated because I am indeed very interested!

    I've been doing a lot with AI using the available tools and so far Interpolator is by far the best. It's very interesting to see the modules you are introducing and I like exploring them.

    I think a number of things that you mentioned would all be solved by ECA integration. I have no idea how difficult that would be, but it will come in handy in a lot of scenarios, I'm sure.

    Regarding the media entities: if the media entity can access field data of its host through tokens, then ECA can already create the media entity when no media was uploaded by the user, and the interpolation could then run on the media entity using field data of the host entity. If not, then ECA can copy field data from the host node to the media entity and make it available for interpolation that way.

    On the returning of unexpected formats, I have experienced that too. I think it happened because I had first saved a prompt in Base mode, and then switched to Advanced mode and also saved a prompt using tokens. The issue went away when I cleared one of them. I also asked GPT-4 to return HTML and it refused, saying I was making a contradictory request to receive the response in both JSON and HTML.

  • 🇱🇹Lithuania mindaugasd

    @AlfTheCat there is a new dedicated issue about ECA: "Is it integrated with ECA?" (Active)

  • 🇱🇹Lithuania mindaugasd

    these images were all generated at a cost, it is desirable to keep track of them

    For cost tracking it would be nice to have a dedicated module: #3390625-2: Debugging / logging feature for prompt engineering
