Create visible testing framework

Created on 15 May 2025

Problem/Motivation

From Slack:

Things we need it to do:
In the new version, we don't need to write functional tests or unit tests that check whether the agents' work actually took effect (for example, that a taxonomy term was really created). We just need to check that the tool was called, and sometimes that it was called with specific parameters. All of this can be created, written and maybe even run in a UI, to help with prompt engineering and with testing different models.
Below are changes to the above features (we should put them in an issue):
Drupal CMS Core Agents Tests - Chat one
Create a Test with a name and description
Write out the Version of Drupal you are testing (Drupal CMS? Module version numbers) - Can be obtained automatically from the environment
Can be enabled or disabled.
For every Assistant chat message, test whether it picks the correct agent.
We need a UI for creating a collection of chats (so there will be a history) and which agent it should find.
UI for selecting the "Agent Test" we want to add to the chat message.
Similar thing to the General Agents tests, but the input is created by an LLM not a model prompt.
Test whether it calls the correct tools.
Tests need to be nested.
In the Report, we need to say what models were used
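The chat test described above could be modelled roughly as follows. This is only a sketch under assumptions: `ChatTest`, `ChatTurn`, `run_chat_test` and the `pick_agent` callback are hypothetical names, not part of any existing module, and the callback stands in for whatever the assistant really does to choose an agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChatTurn:
    """One chat message and the agent we expect the assistant to pick for it."""
    message: str
    expected_agent: str


@dataclass
class ChatTest:
    """A named, describable test that can be enabled or disabled."""
    name: str
    description: str
    turns: List[ChatTurn] = field(default_factory=list)
    enabled: bool = True


def run_chat_test(
    test: ChatTest,
    pick_agent: Callable[[List[str], str], str],  # hypothetical: (history, message) -> agent id
    model: str,
) -> dict:
    """Run every turn in order and report which agent was picked each time."""
    report = {"test": test.name, "model": model, "skipped": False, "results": []}
    if not test.enabled:
        report["skipped"] = True
        return report

    history: List[str] = []  # earlier messages, so later turns see the conversation so far
    for turn in test.turns:
        actual = pick_agent(history, turn.message)
        report["results"].append({
            "message": turn.message,
            "expected": turn.expected_agent,
            "actual": actual,
            "passed": actual == turn.expected_agent,
        })
        history.append(turn.message)

    report["passed"] = all(r["passed"] for r in report["results"])
    return report
```

The report carries the model name so that results from different models can be compared side by side.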
General Agents Tests: - Prompt Engineering Tool
What to do
Create a Test with a name and description.
Write out the Version of Drupal you are testing (Drupal CMS? Module version numbers) - Can be obtained automatically from the environment. The GUI can pick which model and provider to use.
Pick an agent to test.
Insert model prompt (Which contains all the context it needs)
Select the Tools it should pick.
Select the Order the tools are picked. (Optional whether order is tested)
Tools can be picked more than once
Select the parameters it should pass to the tools where appropriate (Optional)
Success means the correct tool is picked with the correct parameters.
Some way of re-running the tests and undoing everything it did
"We then need some concept of 'History' that isn't the same as chat history." - We need its version of "History".
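The success check in the steps above (expected tools, optional order, optional parameters, repeated tools allowed) might look something like the sketch below. Everything here is an assumption for illustration: `ToolCall`, `ExpectedCall` and `check_tool_calls` are hypothetical names, extra tool calls beyond the expected ones are tolerated, and a parameter check only compares the keys that the test lists.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    """A tool call that the agent actually made."""
    tool: str
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ExpectedCall:
    """A tool call the test expects; params=None means 'do not check parameters'."""
    tool: str
    params: Optional[Dict[str, Any]] = None


def _matches(expected: ExpectedCall, actual: ToolCall) -> bool:
    if expected.tool != actual.tool:
        return False
    if expected.params is None:
        return True
    # Only the listed keys are compared; other parameters are ignored (assumption).
    return all(k in actual.params and actual.params[k] == v
               for k, v in expected.params.items())


def _ordered_ok(expected: List[ExpectedCall], actual: List[ToolCall]) -> bool:
    # Expected calls must appear in this order, possibly with other calls in between.
    remaining = iter(actual)
    return all(any(_matches(e, a) for a in remaining) for e in expected)


def _unordered_ok(expected: List[ExpectedCall], actual: List[ToolCall]) -> bool:
    # Each expected call needs its own actual call, so a tool expected twice
    # must really have been called twice (simple bipartite matching).
    owner: Dict[int, int] = {}  # index of actual call -> index of expected call

    def assign(i: int, seen: set) -> bool:
        for j, call in enumerate(actual):
            if j in seen or not _matches(expected[i], call):
                continue
            seen.add(j)
            if j not in owner or assign(owner[j], seen):
                owner[j] = i
                return True
        return False

    return all(assign(i, set()) for i in range(len(expected)))


def check_tool_calls(expected: List[ExpectedCall], actual: List[ToolCall],
                     check_order: bool = False) -> bool:
    """Return True when the agent's calls satisfy the test's expectations."""
    if check_order:
        return _ordered_ok(expected, actual)
    return _unordered_ok(expected, actual)
```

For example, with made-up tool names, a test that expects `create_term` twice fails if the agent only called it once, and an ordered test fails if the calls arrive in a different order than listed.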
Notes:
These have to be tests that run in a real full environment, not unit tests
We also want them to be run as kernel tests
These can be enabled or disabled.
We can create a UI in Agent Explorer, where I can pick a specific test.
We could make a UI in Agent Explorer, so that when I run something, I can click a button and it will create a test with all the parameters filled out.
Also record the time it took.
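Recording the time and the environment alongside each run could be as small as the sketch below. `run_timed` is a hypothetical helper, and the `environment` dictionary is assumed to be filled in by the caller (for instance with version numbers read from the site), since how those are obtained is not specified here.

```python
import time
from typing import Any, Callable, Dict


def run_timed(run: Callable[[], Any], environment: Dict[str, str]) -> Dict[str, Any]:
    """Execute one test run and attach its duration and environment details."""
    started = time.perf_counter()
    result = run()
    elapsed = time.perf_counter() - started
    return {
        "result": result,
        "seconds": elapsed,
        "environment": dict(environment),  # copied so later changes don't alter the record
    }
```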
Complex Agents Test
The same as the General Agents tests, but the agents can call other agents as tools, which can in turn call other agents as tools.
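One way to think about the nested case is as a tree of calls, where a test asserts that a tool was reached through a particular chain of agents. The sketch below assumes such a tree can be recorded; `CallNode`, `walk` and `was_called_via` are hypothetical names, and the agent and tool names in the usage note are made up.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence, Tuple


@dataclass
class CallNode:
    """A tool or agent call; an agent used as a tool has its own child calls."""
    name: str
    children: List["CallNode"] = field(default_factory=list)


def walk(node: CallNode, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the full path (root agent down to the call) for every call in the tree."""
    here = path + (node.name,)
    yield here
    for child in node.children:
        yield from walk(child, here)


def was_called_via(root: CallNode, expected_path: Sequence[str]) -> bool:
    """True if some call was reached through exactly this chain of agents and tools."""
    return tuple(expected_path) in set(walk(root))
```

With a tree such as `CallNode("site_builder", [CallNode("taxonomy_agent", [CallNode("create_term")])])`, the check `was_called_via(root, ["site_builder", "taxonomy_agent", "create_term"])` passes, while a path that skips the intermediate agent does not.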

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Active

Version

1.1

Component

Code

Created by

🇩🇪Germany marcus_johansson
