Character limits are not respected / Field level instructions

Created on 1 October 2023

Problem/Motivation

OpenAI often returns several full paragraphs as the "translation" of a node's title field. This causes a MySQL error and prevents auto-accept. The same happens with other length-limited fields; for instance, I have a field that holds an SEO title tag limited to 255 characters.

It might also be helpful to be able to give the AI an instruction per field, in addition to the general translation instruction. Some fields are problematic in other ways too, and since AI output is unpredictable by nature, this added control would address a range of use cases.

Steps to reproduce

Create a node and enter a single word as the title, such as a brand name or an object. Often, the AI returns a description of the item instead of a translation.

Proposed resolution

Add an input field to the OpenAI UI where site builders can enter a list of field names and instructions, one per line. For example:

field_title|This is a title, limit the result to 255 characters and translate literally.
field_body|Return as HTML
field_seo_title|Limit to 155 characters, this is an SEO Title tag.
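A minimal sketch of how the proposed `field_name|instruction` list could be parsed and appended to the general translation prompt. Python is used for illustration only; `parse_field_instructions` and `build_prompt` are hypothetical names, not part of the OpenAI module or ai_tmgmt.

```python
from __future__ import annotations


def parse_field_instructions(text: str) -> dict[str, str]:
    """Parse 'field_name|instruction' lines into a dict keyed by field name.

    Hypothetical helper: only the first '|' splits, so instructions may contain '|'.
    """
    instructions: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        field_name, instruction = line.split("|", 1)
        instructions[field_name.strip()] = instruction.strip()
    return instructions


def build_prompt(base_prompt: str, field_name: str, instructions: dict[str, str]) -> str:
    """Append the field-specific instruction, if one exists, to the general prompt."""
    extra = instructions.get(field_name)
    return f"{base_prompt}\n\n{extra}" if extra else base_prompt
```

Fields without an entry simply get the general prompt, so the list only needs to cover the problematic fields.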

Remaining tasks

User interface changes

API changes

Data model changes

Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇹🇭Thailand AlfTheCat


Comments & Activities

  • Issue created by @AlfTheCat
  • 🇬🇧United Kingdom scott_euser

    I think we can retrieve that information automatically. It looks like the manual job processing does it here: https://git.drupalcode.org/project/tmgmt/-/blob/8.x-1.x/src/Form/JobItem..., so I think the next step would be to figure out how to build that form from the job info to extract the field type, max length, and other settings as needed, and pass those as instructions for the field.
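    As a sketch of that idea, assuming the job info could expose each field's type and maximum length, the metadata might be turned into an extra prompt instruction like this. `FieldInfo` and `instruction_for` are hypothetical names, not TMGMT or Drupal APIs; the field type IDs are Drupal core ones used only as examples.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldInfo:
    """Hypothetical metadata for one translatable field (not a TMGMT/Drupal class)."""
    name: str
    field_type: str
    max_length: Optional[int] = None


def instruction_for(field: FieldInfo) -> str:
    """Turn field metadata into an extra instruction for the translation prompt."""
    parts = [f"This is the content of the '{field.name}' field (type: {field.field_type})."]
    if field.max_length is not None:
        parts.append(f"The translation must not exceed {field.max_length} characters.")
    # Example mapping only: plain-text types vs. formatted (HTML) text types.
    if field.field_type in ("string", "string_long"):
        parts.append("Return plain text only, without HTML or explanations.")
    elif field.field_type in ("text_long", "text_with_summary"):
        parts.append("Preserve the HTML markup of the source.")
    return " ".join(parts)
```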

  • 🇬🇧United Kingdom scott_euser

    Shifting this over to https://www.drupal.org/project/ai_tmgmt to work on. Thanks!

  • 🇬🇧United Kingdom scott_euser

    Hmmm, it's an interesting idea. We make at least one AI call per field (more if the text gets chunked). However, I don't see how to determine from the JobItem which field is being processed. I tried getFieldDefinitions() and similar methods, but no joy.
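    To illustrate why knowing the field matters: if the field name were available where the AI call is made, the same field instruction could travel with every chunk of that field. A minimal sketch, assuming paragraphs are separated by blank lines and that `translate` stands in for the actual AI call; all names here are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own (oversized) chunk
    rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    group: list[str] = []
    for para in text.split("\n\n"):
        if group and len("\n\n".join(group + [para])) > max_chars:
            chunks.append("\n\n".join(group))
            group = []
        group.append(para)
    if group:
        chunks.append("\n\n".join(group))
    return chunks


def translate_fields(
    fields: dict[str, str],
    field_instructions: dict[str, str],
    translate: Callable[[str, str], str],  # hypothetical stand-in: (chunk, instruction) -> translation
    max_chars: int = 2000,
) -> dict[str, str]:
    """Translate each field chunk by chunk, passing the field's instruction with every chunk."""
    result: dict[str, str] = {}
    for field_name, text in fields.items():
        instruction = field_instructions.get(field_name, "")
        pieces = [translate(chunk, instruction) for chunk in chunk_text(text, max_chars)]
        result[field_name] = "\n\n".join(pieces)
    return result
```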

    By the way, this is easier to explore if you start from the branch in 🐛 Reduce redundant batch runner (Active) until it gets merged. On the 1.0.x branch, each batch operation only has access to the Job, not the JobItem.

  • 🇹🇭Thailand AlfTheCat

    Hi @scott_euser, very excited about all the work you've been doing. Looking forward to all of the good stuff this module will have to offer :)

    One question: the ai_translate module has a pretty neat interface for specifying prompts per language. Could ai_tmgmt leverage that? Language-specific instructions would be very useful to have, in addition to the field-level instructions from this issue.

  • 🇬🇧United Kingdom scott_euser

    Hmmm, yes, it's possible. Maybe you can open a separate issue for it. It'd be handy to override the prompt per language (falling back to the generic default if not overridden).
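    A minimal sketch of that fallback, assuming per-language prompts are stored as a mapping from language code to prompt text. This is a hypothetical structure, not the ai_translate configuration schema.

```python
from __future__ import annotations


def resolve_prompt(langcode: str, default_prompt: str, language_prompts: dict[str, str]) -> str:
    """Return the language-specific prompt if configured, else the generic default.

    Hypothetical helper: an empty or whitespace-only override counts as "not overridden".
    """
    override = language_prompts.get(langcode, "").strip()
    return override if override else default_prompt
```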

  • Status changed to Closed: won't fix
  • 🇬🇧United Kingdom scott_euser

    If someone can find a way to programmatically get the limits from all fields and make them available via TMGMT core (since it's TMGMT's job to pass that information on to translation providers), I would be happy to help. Unfortunately, I just cannot see a way myself.

    Feel free to reopen if you find a way, or move this to 'Postponed' if you can find or create an issue in TMGMT requesting this info.

    For now, consider adding something like this to your prompts, adapted to your configuration: "If the content length is close to 255 characters, ensure the translation is less than or equal to 255 characters as well. Apply the same approach if the content length is close to 50, 100, or 1000 characters."
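    The same heuristic could also be applied in code, so only items near a limit get an explicit length instruction. The sketch assumes "close to" means within 10% below one of the listed limits; `likely_limit` and `length_instruction` are hypothetical helpers, and the limits should be adapted to your configuration.

```python
from __future__ import annotations

from typing import Optional

# Limits taken from the suggested prompt; adapt to your own field configuration.
COMMON_LIMITS = (50, 100, 255, 1000)


def likely_limit(source: str, margin: float = 0.1) -> Optional[int]:
    """Guess a field limit: the smallest common limit the source is at or just under.

    Assumption: "close to" means within `margin` (10% by default) below the limit.
    """
    length = len(source)
    for limit in COMMON_LIMITS:
        if limit * (1 - margin) <= length <= limit:
            return limit
    return None


def length_instruction(source: str) -> str:
    """Build a length instruction for the prompt, or an empty string if no limit applies."""
    limit = likely_limit(source)
    if limit is None:
        return ""
    return f"Ensure the translation is at most {limit} characters long."
```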
