Add artifacts to agents

Created on 6 June 2025, 3 months ago

Problem/Motivation

Currently we expose any output and any parameters as text in the chat history. This works for most agents, but there are use cases where you want to work with bigger data chunks that only matter for specific tools and not for the history of the chat.

Think of the following scenario:

You have a tool that can scrape websites, a tool that can extract links from a website, and a tool that can screenshot a website. Your agent's task is to scrape a webpage and take a screenshot of all the links on it.

In reality the final history after all loops looks something like this (the system prompt being something verbose about the above):

User: Can you scrape https://drupal.org and take screenshots of all the external links
---------------------------------------------------------------
Assistant: I will start by scraping the website
Tool Usage: scrape(https://drupal.org) with tool_id 1
---------------------------------------------------------------
Tool: <html><head><title>Drupal.org</title></head><body>[Loads of HTML with links]</body></html>
Tool_id: 1
---------------------------------------------------------------
Assistant: I will now extract the links
Tool Usage: extract_links(<html><head><title>Drupal.org</title></head><body>[Loads of HTML with links]</body></html>) with tool_id 2
---------------------------------------------------------------
Tool: https://wordpress.org, https://joomla.org, https://dri.es
Tool_id: 2
---------------------------------------------------------------
Assistant: I will now screenshot the links:
Tool Usage: screenshot(https://wordpress.org, https://joomla.org, https://dri.es) with tool_id 3
---------------------------------------------------------------
Tool: file_id 1, 2, 3
Tool_id: 3
---------------------------------------------------------------
Assistant: I have taken the screenshots, here they are: for https://joomla.org <img src="url to 1">.....

This will work; however, the problem is that if a website is around 50k tokens, those 50k tokens end up in the history twice (once in the tool result and once in the next tool call) across 3 loops, roughly 3 × 2 × 50k ≈ 300k tokens - something that can cost close to ~1 USD/EUR depending on your provider, and some providers cannot even handle that context length.

And in this case the LLM does not even need to read the actual HTML.

What if we could instead do:

User: Can you scrape https://drupal.org and take screenshots of all the external links
---------------------------------------------------------------
Assistant: I will start by scraping the website
Tool Usage: scrape(https://drupal.org) with tool_id 1
---------------------------------------------------------------
Tool: !artifact:1
Tool_id: 1
---------------------------------------------------------------
Assistant: I will now extract the links
Tool Usage: extract_links(!artifact:1) with tool_id 2
---------------------------------------------------------------
Tool: https://wordpress.org, https://joomla.org, https://dri.es
Tool_id: 2
---------------------------------------------------------------
Assistant: I will now screenshot the links:
Tool Usage: screenshot(https://wordpress.org, https://joomla.org, https://dri.es) with tool_id 3
---------------------------------------------------------------
Tool: file_id 1, 2, 3
Tool_id: 3
---------------------------------------------------------------
Assistant: I have taken the screenshots, here they are: for https://joomla.org <img src="url to 1">.....

All of a sudden we have the same results, but we saved around 300k tokens.

Proposed resolution

  • First decide whether "artifact" is the correct naming - this is the naming in LangChain, but artifacts in Claude are something else.
  • Create an artifact interface with a get, a set, and an id (see the sketch after this list).
  • In the AiAgentForm for tools, add a setting so that a tool's output can be treated as an artifact.
  • In the AiAgentEntityWrapper make sure to store any output as an artifact after the tool is run.
  • In the AiAgentEntityWrapper make sure to replace any artifact reference with the real value when it is passed to a tool.
  • Make sure to add the artifacts to the events.
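
As a rough illustration of the interface bullet above, it could look something like the sketch below (the interface and method names are assumptions for discussion only, nothing with these names exists in the module yet):

  // Sketch only: the names here are placeholders for discussion.
  interface ArtifactInterface {

    // Unique id of the artifact, e.g. "scrape:1".
    public function getId(): string;

    // Get the raw tool output stored behind the artifact.
    public function getValue(): mixed;

    // Store the raw tool output behind the artifact.
    public function setValue(mixed $value): void;

  }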

Remaining tasks

User interface changes

API changes

Data model changes

✨ Feature request
Status

Active

Version

1.2

Component

Code

Created by

🇩🇪Germany marcus_johansson


Merge Requests

Comments & Activities

  • Issue created by @marcus_johansson
  • 🇬🇧United Kingdom yautja_cetanu
  • 🇺🇸United States Kristen Pol Santa Cruz, CA, USA

    Switching to the correct tag

  • 🇨🇦Canada b_sharpe

    I can likely work on this but a few things:

    • Artifact seems like the wrong term. "Tool Result", "Tool Context", "Function Call Result", etc might be more appropriate. (going to use artifact for the sake of clarity in the next points)
    • How does a tool decide to use an existing artifact or create a new one? Using the example above, let's say you want to take a new screenshot: how do we tell the tool to run again instead of re-using the data?
    • Can there be multiple artifacts per tool per chat? Are artifacts tied to ThreadID?
  • 🇩🇪Germany marcus_johansson

    Awesome @b_sharpe!

    Artifact is a weird word, but looking into it, it seems to be the actual term being used for this. It came from Anthropic originally and, now that I think about it, was kind of used for the same thing, but now it's used elsewhere, see for instance: https://python.langchain.com/docs/how_to/tool_artifacts/ and https://google.github.io/adk-docs/artifacts/.

    CrewAI uses Artefact, but I guess that is just American vs British English. Since we use cspell with American English, we should keep the American version.

    Since we will be exposing tools and results via MCP and A2A to the outside world, we should use the same vocabulary as the rest of them for the data name at least - for the visual name within Drupal we could make up our own.

    Regarding the decision - I think the best course for now is that this always happens, because the human that sets up the agent knows whether it's needed or not. The decision basically comes down to four considerations as far as I can see:

    1. Does the output of this tool only matter for other tools and not for the agent itself? If yes, it can be an artifact.
    2. Will the output of the tool be so large that it affects the context length and starts creating hallucinations? Scraped websites, for instance. If yes, it should be an artifact.
    3. Does the content itself include instructions - like a food recipe for instance - causing the agent to have instruction fatigue or, even worse, change its course of action? If yes, it should be an artifact.
    4. Is the output in such a weird format that the agent has a hard time understanding how to pass it along, even with a good system prompt? A binary, for instance. If yes, it should be an artifact.

    You could in theory have subagents make these decisions, but given that not all agents are reliable, I think it makes sense to make this a human decision.

    This is why the issue is in AI Agents rather than the AI module as well. This should be another option you can set on the tool when you set it up. So you check a checkbox that says "Artifact Output", with a description of when you should enable it.

    We have something called AiDataTypeConverter plugins in the AI module; these are plugins that hook in while the tool parameters are being filled in to do something similar to route argument upcasting, for anyone who knows routing. This makes it possible, for instance, to set a ContextDefinition type to entity, ask the agent to answer "node:1", and then get the actual object back inside the tool.

    We should use this to create a data type converter for artifacts that looks for a magic prefix, something like "!artifact:", and then upcasts it. Since we use "{word}:" for entity upcasting, I think the exclamation mark, or something else, makes sense in case someone for some reason creates an entity called artifact.

    I will add this information to the issue, since this functionality was added after the issue was written.

    A tool can be run multiple times, and you can have artifact output on multiple tools, so it should be possible. I think for contextual understanding just naming them "!artifact:{function_name}:{incremental_number}" makes sense. They will be given in the tool result as well, so the agent should have no problems gluing it together, especially with a good system prompt.
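
    To illustrate the converter idea and naming scheme above (this is not the actual AiDataTypeConverter plugin API, and the artifact storage lookup is just a hypothetical placeholder), the upcasting would roughly do something like this while the tool parameters are being filled in:

      // Rough sketch only: swap an artifact reference for the stored tool
      // output before the tool runs. The function name and $artifacts
      // storage are hypothetical.
      function upcast_artifact_argument(string $value, array $artifacts): mixed {
        // Only values with the magic prefix are treated as artifact references.
        if (!str_starts_with($value, '!artifact:')) {
          return $value;
        }
        // "!artifact:scrape:1" -> "scrape:1".
        $id = substr($value, strlen('!artifact:'));
        // Hand the real stored output to the tool instead of the placeholder.
        return $artifacts[$id] ?? $value;
      }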

  • 🇩🇪Germany marcus_johansson

    Did some changes to the main issue as well, check if it's understandable now, otherwise I'll try to scope it out more.

  • 🇨🇦Canada b_sharpe

    Ok, I think that all makes sense, I'll take a stab at it and report back

  • 🇨🇦Canada b_sharpe

    @marcus_johansson: I just noticed the option for tools:

    Return directly

    Check this box if you want to return the result directly, without the LLM trying to rewrite it or use another tool. This is usually used for tools that are not used in a conversation, or when it is being used in an API where the tool is the structured result.

    Do we see this as separate? Or should we be refactoring this to instead be the use-case for artifacts?

  • Assigned to b_sharpe
  • πŸ‡¨πŸ‡¦Canada b_sharpe

    Also, so far in my tests, it appears the exclamation mark is not a good placeholder, as the AI providers often discard it thinking it's a typo, so I've switched to:

    {{artifact:$this->toolId:$this->index}}

    The problem now, however, is that the AI provider doesn't know how to use this, so sometimes it won't select tool 2, and other times it will just pass random data to it and fail. I'm not sure how the AI provider is going to return tool 2 as one of its tool_calls if it doesn't know it has the proper data? I've tried adding some instruction like:

              $output = 'The tool output has been stored as an artifact placeholder ' . (string) $artifact;
    

    but this doesn't seem to help. Any thoughts?
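
    For reference, the placeholder string above could come from something as simple as a __toString() on the artifact value object - a hypothetical sketch, not necessarily what the MR does:

      // Hypothetical value object, only meant to show where a
      // {{artifact:toolId:index}} placeholder could come from.
      class Artifact {

        public function __construct(
          protected string $toolId,
          protected int $index,
          protected mixed $value,
        ) {}

        // The placeholder that goes into the chat history instead of the raw value.
        public function __toString(): string {
          return '{{artifact:' . $this->toolId . ':' . $this->index . '}}';
        }

      }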

  • 🇨🇦Canada b_sharpe

    Just putting in the MR for visibility. I have not addressed the form item yet, so it currently artifacts EVERY tool, but I wanted to point out the real issue here with unknown output, as in my previous comment (#12).

    I've gotten a little further, in that a single-run, multi-tool response is working and the artifact is getting set/replaced, but only about 40% of the time. I truly believe the only way this is going to work is for tools to specify output context, and then tools needing that context can know it's coming from there regardless of whether the value is an artifact or not.

  • Merge request !170 "Resolve #3528726: Add artifacts" (Open) created by b_sharpe
  • Pipeline finished with Failed
    16 days ago
    Total: 162s
    #568470
  • Pipeline finished with Failed
    16 days ago
    Total: 195s
    #568482
  • 🇩🇪Germany marcus_johansson

    Thanks for the updates @bryan - sorry about the late responses, I'm on vacation at the moment, so things are moving a little slowly on my end.

    Return directly

    I think this is a separate thing. The idea with this is that when you know that any tool's response is good enough to return to whatever consumer is using the agent, you do not need the agent to produce a textual response or run another loop to figure out if it's done. An example of this is if you have a field validation agent using a tool for validating fields and it has used that tool - you 100% know that this is the last thing it should do, so you stop and return there and then.

    Also, so far in my tests, it appears the exclamation mark is not a good placeholder as the AI providers discard it often thinking it's a typo, so I've switched to

    Great!

    I'm not sure how the AI provider is going to return tool 2 as one of its tool_calls if it doesn't know it has the proper data? Are we expecting agent prompting to know this and reference the artifacts in the prompt?

    It can only know it when using artifacts if it's explained in the system prompt what the artifact will be used for and how, or if we add some way for the agent to temporarily read the artifact once per loop. In the end, it is up to the person setting up the agent to write a working prompt, something like (very simplified and will not work every time):

    You will have three tools at your disposal, one to scrape a website, one to summarize and one to put the summary into a card component.
    
    It's important that you do the following.
    
    1. Figure out if there is some website that we can scrape, if not just answer that you can't help unless they provide an actual website.
    2. Use the scraping tool to scrape each website given. The output will be given as an artifact with a unique token, instead of the full html.
    3. Summarize the websites using the summarizing tool. The input for the text to summarize should just be the unique token given for the website.
    4. Use the card component generation tool, to generate the component, create the title from the summary and use the summary as the description.
    

    As far as I understand it, and it makes sense, the information in an artifact can never be important for the agent's decisions - or it can only be important if we have some way of adding it dynamically once per loop (if wanted). So the decision making in that case has to go into the system prompt. But I haven't explored this topic so extensively, so I could be wrong here.

    I truly believe the only way this is going to work is for tools to specify output context and then tools needing that context can know it's coming from there regardless of if the value is an artifact or not.
    

    Just to get an understanding, what does output context refer to here? Could you give an example of how it would work, or, if you have time and believe it's the way to go forward, try to test it, even outside of a Drupal context, to see if it helps?

    I'll start adding some comments into the code so far - it looks great!

  • 🇨🇦Canada b_sharpe

    Ok, I'll go ahead with just assuming the user prompts will take care of it for now as you suggested and we can regroup after.

    RE: Output context, something like what is being done with Tool API, where the tool defines both what it needs and what it provides with Context; that way we wouldn't need to rely so much on the prompting, as the artifact would at least know what it represents.

  • Pipeline finished with Failed
    14 days ago
    Total: 162s
    #570369
  • Pipeline finished with Failed
    14 days ago
    Total: 170s
    #570378
  • 🇨🇦Canada b_sharpe

    Ok, I've added the form option now, along with some instructions on how to use it within the prompt. I tested with a multi-step function call and it is doing the replacement properly and using it in subsequent calls.

  • Pipeline finished with Canceled
    14 days ago
    Total: 76s
    #570384
  • Pipeline finished with Failed
    14 days ago
    Total: 154s
    #570385
  • Pipeline finished with Success
    14 days ago
    Total: 155s
    #570389
  • 🇺🇸United States michaellander

    My understanding is artifacts are generally something that the AI creates. In our case we also want pointers to things that may already exist and that we are modifying. If we ask the AI to create a node, to me it's an artifact; if we ask it to load a node, is it still an artifact? Even if in both cases we intend to modify and save them. I just want to make sure we are using the correct terminology and would love to find some precedent somewhere.
