Allow all field meta data to be context if asked for

Created on 17 January 2025

Problem/Motivation

We have been trying to be thoughtful about how we spend tokens when running our tasks, so that the cost of using paid services stays minimal and so that medium-sized models like llama3.2-70b can run the tasks without instruction memory loss. We are also trying to keep the wait time for an answer to as few seconds as possible.

The problem is that there are queries that will just not be possible to answer without a large context. For instance the following question:

Compare the taxonomy fields on the Article node type to the Default comment type and see if they have the same vocabulary on any fields.

The agent will simply not be able to answer this question in one pass; the triggering orchestrator will need to split it up into the following questions:

Get all entity references on Article node type and Default comment type.

Of the following list of fields, check which ones on the Article node type and which ones on the Default comment type match on having the same vocabulary.

The orchestrator for the chatbot is the Assistants API, and it is simply not advanced enough to understand that it needs to do this, because, as with the agents, it needs to answer quickly. Minikanban would most likely do it correctly.

The reason the agents can do it right away is that, while we use an LLM to understand the unstructured content, it takes deterministic routes based on that, for safety and usability reasons. In terms of questions this means:

  • There is one triage task that gets the context of the entities on the site and forwards the question if the entity in the question exists. This is the triage task for everything. It also reports whether the user is asking about one or more specific fields with known field names.
  • The question is forwarded to a task that answers it, where it gets the context of the fields and their field types, but no more. If the question was about specific fields, however, it will also load the field storage config, the field configs, the form display configs and the view display configs, each with their possible values. Possible values means, for instance, that if it is a taxonomy field, it shows which vocabularies exist that you could attach.
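The two deterministic steps above could be sketched roughly like this. This is an illustrative mock, not the module's actual code: the triage "LLM" is a stand-in function, and the field/config data structures are invented for the example.

```python
# Hypothetical sketch of the two-step deterministic routing described above.
# mock_llm_triage() stands in for the real triage LLM call.

def mock_llm_triage(question, site_entities):
    """Step 1 (triage): detect which known entities and fields the question mentions."""
    entities = [e for e in site_entities if e.lower() in question.lower()]
    fields = [f for f in ("field_tags", "field_media") if f in question]
    return {"entities": entities, "fields": fields}

def build_context(triage_result, field_types, field_configs):
    """Step 2: always load field names/types; load extended config only for named fields."""
    fields_ctx = {e: field_types.get(e, {}) for e in triage_result["entities"]}
    extended = {f: field_configs[f] for f in triage_result["fields"] if f in field_configs}
    return {"fields": fields_ctx, "extended": extended}

site_entities = ["Article", "Default comment"]
field_types = {
    "Article": {"field_tags": "entity_reference"},
    "Default comment": {"field_tags": "entity_reference"},
}
field_configs = {"field_tags": {"target_type": "taxonomy_term", "vocabulary": "tags"}}

triaged = mock_llm_triage("Compare field_tags on Article and Default comment", site_entities)
context = build_context(triaged, field_types, field_configs)
```

The point of the split is visible in `build_context`: the extended config is only loaded when the triage step named specific fields.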

The problem is that media fields and taxonomy fields are all entity references, and you don't know more about them until you check the field storage and field configuration.

Possible solutions

More context solution

We try to figure out in the triage task whether we should load the extensive context for all fields and then give it back. However, this means that if someone asks "Show me all media fields on the website?" on a site with hundreds of entities, you might run out of input token space or end up with one request that costs $5+ in fees. More input tokens generally do not slow down the response that much, though.

We can perhaps do this and cap the number of entities you can ask about.
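The cap could be as simple as the following sketch. The threshold value and function name are assumptions for illustration, not settings the module actually has:

```python
# Illustrative cap on when the triage step may request extended context.
MAX_ENTITIES_FOR_FULL_CONTEXT = 5  # hypothetical threshold

def context_level(requested_entities):
    """Return 'full' only when the entity count is under the cap;
    otherwise fall back to field names and types only, to stay in budget."""
    if len(requested_entities) <= MAX_ENTITIES_FOR_FULL_CONTEXT:
        return "full"
    return "minimal"
```

Above the cap, the agent would answer from the minimal context or tell the user to narrow the question.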

Entity query/Context selection tools

We can have the question agent give back a recommended entity query for the context it needs access to in order to answer the question. This would just add one extra step.

So the extra LLM call would figure out more granularly what context is needed to answer the question, and then trigger the question with that context and that context only. The biggest issue is that the LLM might not know, or might hallucinate, exactly where the context is - you usually add context to make it hallucinate less!
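The extra step could look roughly like this two-call pipeline. `select_context` stands in for the extra LLM call, and the flat config store is a toy stand-in for Drupal's config entities:

```python
def select_context(question):
    # Stand-in for the extra LLM call; it would return a recommended entity
    # query naming which config objects are needed to answer the question.
    return {"entity_type": "field_config", "bundles": ["article", "comment"]}

def load_context(query, config_store):
    """Load only the config entities the recommended query names."""
    return [c for c in config_store
            if c["entity_type"] == query["entity_type"]
            and c["bundle"] in query["bundles"]]

config_store = [
    {"entity_type": "field_config", "bundle": "article", "name": "field_tags"},
    {"entity_type": "field_config", "bundle": "page", "name": "field_tags"},
    {"entity_type": "field_config", "bundle": "comment", "name": "field_tags"},
]
selected = load_context(select_context("compare taxonomy fields..."), config_store)
```

If the first call hallucinates a bundle that does not exist, `load_context` simply returns nothing for it, which is exactly the failure mode described above.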

Agent Loops

Currently we do not allow the agents to loop themselves or their tasks. The only time it reruns a task is if the JSON validation on the response fails.

What this means is that you would have the agent keep asking itself the question until it has the full answer, and pass it on to another agent if needed. This is very popular in advanced agentic solutions; see for instance OpenAI Swarm. But it is more complex to get right and keep safe.

It's also one of the solutions to the problem that we currently have to use mid-sized, advanced models for the agents. With this you could ask one prompt per variable in the field you are trying to create: "Are they asking to make this translatable?", "Are they asking to make this required?", etc.
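A safe version of such a loop would need a hard iteration cap. Here is a minimal sketch under that assumption (the loop shape and names are illustrative, not the module's design), with a toy step that answers one per-variable sub-question per iteration as in the field example above:

```python
MAX_ITERATIONS = 5  # hard cap, for cost and safety

def run_agent_loop(step):
    """Re-run `step` until it reports the answer complete or the cap is hit."""
    state = None
    for _ in range(MAX_ITERATIONS):
        done, state = step(state)
        if done:
            return state
    raise RuntimeError("agent loop hit the iteration cap")

# Toy step: answers one sub-question per iteration.
sub_questions = ["translatable?", "required?", "cardinality?"]

def step(state):
    state = list(state or [])
    state.append(sub_questions[len(state)])  # a real step would call the LLM here
    return len(state) == len(sub_questions), state

answers = run_agent_loop(step)
```

The cap turns the open-ended "go haywire" risk into a bounded worst case of `MAX_ITERATIONS` LLM calls.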

The problem, and why this was skipped for the Drupal solutions, is that it costs too much when you use a paid service, it takes too long, it might go haywire with hallucinations if we truly give it access to all tools, and we still don't have a nice way of showing progress in the assistant/chatbot. See Add Javascript Orchestration for Chatbot/Assistant (Active).

Planning Agent

So, while the Assistants API is just one pre-query LLM and one answer LLM, and has flaws for more complex orchestration, we can add another agent that helps the assistant plan out the tasks based on what the agents actually can and cannot solve. It could then also advise when a task will take a long time (and cost money).

This means that, as a first step, the assistant would ask the planning agent to set out a plan based on actual context and possibilities, and then, as a second step, forward this plan to the right agents.
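The plan-then-execute flow could be sketched like this. `plan` stands in for the planning agent's LLM call, and all names and data shapes are assumptions for illustration:

```python
def plan(question, agent_names):
    # Stand-in for the planning agent: it would split the question into
    # ordered sub-tasks and assign each to an agent that can solve it.
    return [
        ("list entity reference fields on Article and Default comment", "field_agent"),
        ("compare vocabularies on the matched fields", "field_agent"),
    ]

def execute(steps, agents):
    """Second step: the assistant dispatches each planned sub-task in order."""
    return [agents[name](task) for task, name in steps]

agents = {"field_agent": lambda task: "done: " + task}
results = execute(plan("Compare the taxonomy fields...", list(agents)), agents)
```

Because the plan is produced before any sub-task runs, this is also the natural place to warn the user about long-running or expensive plans.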

This step will be worked on outside of this specific issue. See Add a planning agent (Active).

🌱 Plan

Status: Active
Version: 1.0
Component: Code
Created by: marcus_johansson (🇩🇪 Germany)
