- Issue created by @marcus_johansson
Currently we expose all tool output and all parameters as plain text in the chat history. This works for most agents, but there are use cases where you want to work with bigger data chunks that only matter to specific tools, not to the history of the chat.
Think of the following scenario:
You have a tool that can scrape websites, a tool that can extract links from a page, and a tool that can screenshot websites. Your agent's task is to scrape a webpage and take a screenshot of all the links on it.
In reality the final history after all loops looks something like this (the system prompt is something verbose describing the setup above):
User: Can you scrape https://drupal.org and take screenshots of all the external links
---------------------------------------------------------------
Assistant: I will start by scraping the website
Tool Usage: scrape(https://drupal.org) with tool_id 1
---------------------------------------------------------------
Tool: <html><head><title>Drupal.org</title></head><body>[Loads of HTML with links]</body></html>
Tool_id: 1
---------------------------------------------------------------
Assistant: I will now extract the links
Tool Usage: extract_links(<html><head><title>Drupal.org</title></head><body>[Loads of HTML with links]</body></html>) with tool_id 2
---------------------------------------------------------------
Tool: https://wordpress.org, https://joomla.org, https://dri.es
Tool_id: 2
---------------------------------------------------------------
Assistant: I will now screenshot the links:
Tool Usage: screenshot(https://wordpress.org, https://joomla.org, https://dri.es) with tool_id 3
---------------------------------------------------------------
Tool: file_id 1, 2, 3
Tool_id: 3
---------------------------------------------------------------
Assistant: I have taken screenshots, here they are for https://joomla.org <img src="url to 1">.....
This will work, but the problem is that if the scraped website is around 50k tokens, that HTML appears twice in the history (once as the tool response and once as the tool-call parameters) and is re-sent on each of the three remaining loops, so you pay roughly 300k extra tokens. Depending on your provider that can cost close to 1 USD/EUR per request, and some providers cannot even handle a context that large.
And in this case the LLM never even needs to read the actual HTML.
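As a rough back-of-the-envelope check of that claim (a sketch only: the token counts are the illustrative ones from this issue, and the per-token price is an assumption, since provider pricing varies):

```php
<?php

// Back-of-the-envelope cost of the naive flow above. The numbers are
// the illustrative ones from this issue, not measurements.
$htmlTokens = 50_000;   // The scraped page.
$copies = 2;            // Appears as tool response and as tool-call parameters.
$loops = 3;             // The history is re-sent on each remaining LLM call.
$wastedTokens = $htmlTokens * $copies * $loops; // 300,000

$usdPerMillionTokens = 3.0; // Assumed; varies widely by provider.
printf("Wasted tokens: %d (~$%.2f)\n", $wastedTokens, $wastedTokens / 1e6 * $usdPerMillionTokens);
```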
What if instead we could do:
User: Can you scrape https://drupal.org and take screenshots of all the external links
---------------------------------------------------------------
Assistant: I will start by scraping the website
Tool Usage: scrape(https://drupal.org) with tool_id 1
---------------------------------------------------------------
Tool: !artifact:1
Tool_id: 1
---------------------------------------------------------------
Assistant: I will now extract the links
Tool Usage: extract_links(!artifact:1) with tool_id 2
---------------------------------------------------------------
Tool: https://wordpress.org, https://joomla.org, https://dri.es
Tool_id: 2
---------------------------------------------------------------
Assistant: I will now screenshot the links:
Tool Usage: screenshot(https://wordpress.org, https://joomla.org, https://dri.es) with tool_id 3
---------------------------------------------------------------
Tool: file_id 1, 2, 3
Tool_id: 3
---------------------------------------------------------------
Assistant: I have taken screenshots, here they are for https://joomla.org <img src="url to 1">.....
All of a sudden we get the same result, but we have saved around 300k tokens.
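To make the idea concrete, here is a minimal sketch of how such an artifact mechanism could work. The class name, the store()/resolve() methods and the !artifact:N placeholder syntax are illustrative assumptions taken from the transcript above, not a final API:

```php
<?php

// A minimal sketch of the proposed artifact mechanism, assuming the
// !artifact:N placeholder syntax from the transcript above.
class ArtifactStore {

  /** @var array<int, string> Stored payloads, keyed by artifact id. */
  private array $artifacts = [];

  private int $nextId = 1;

  /**
   * Stores a large tool output and returns a small placeholder that is
   * written to the chat history instead of the payload itself.
   */
  public function store(string $payload): string {
    $id = $this->nextId++;
    $this->artifacts[$id] = $payload;
    return "!artifact:$id";
  }

  /**
   * Expands any !artifact:N placeholders in a tool argument back into
   * the real payload right before the tool is executed, so the LLM
   * only ever sees the placeholder.
   */
  public function resolve(string $argument): string {
    return preg_replace_callback(
      '/!artifact:(\d+)/',
      fn (array $m): string => $this->artifacts[(int) $m[1]] ?? $m[0],
      $argument
    );
  }

}

// Usage in the scenario above: the scrape tool marks its output as an
// artifact, and the runner resolves it again for extract_links().
$scrapedHtml = '<html><head><title>Drupal.org</title></head><body>...</body></html>';
$store = new ArtifactStore();
$placeholder = $store->store($scrapedHtml);  // "!artifact:1" goes into the history.
$realHtml = $store->resolve($placeholder);   // Full HTML, passed to the tool.
```

The key design choice here is that deciding what becomes an artifact lives with the tool (the scrape tool knows its output is large and only machine-readable), while dereferencing happens in the agent runner, so the model never has to round-trip the payload.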