- Issue created by @yautja_cetanu
- First commit to issue fork.
- 🇨🇦Canada bisonbleu
The current MR preserves the current Run Once operation and route and adds a new Choose Model operation and route, replicating the UX found in Test Group. - 🇬🇧United Kingdom yautja_cetanu
I think this isn't quite what I meant, although it is also good for doing individual tests. What I was thinking was something like:
When doing Prepare Agent Test Group, it shows:

Test Group
Drupal CMS - Content type Agent
Select the test group you want to run.

Model
LiteLLM Proxy - Openai
Select the model you want to use for running the tests.

Use a different model for LLM evaluations
Toggle (Default off)

When turned on:

Evaluator Model
LiteLLM Proxy
Select the model you want to use for evaluating the results of tests.

-----

When the toggle is turned on, the model the tests use to run and the model they use to evaluate the results (where applicable) can be different.
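The mockup above could be sketched with Drupal's Form API. This is a minimal, hypothetical fragment (element keys and the $model_options variable are assumptions, not the MR's actual code), using #states so the evaluator select only appears when the toggle is checked:

```php
<?php

// Hypothetical sketch of the proposed form elements; names are assumptions.
// $model_options is assumed to be an array of available provider/model labels.
$form['model'] = [
  '#type' => 'select',
  '#title' => t('Model'),
  '#description' => t('Select the model you want to use for running the tests.'),
  '#options' => $model_options,
];
$form['use_evaluator_model'] = [
  '#type' => 'checkbox',
  '#title' => t('Use a different model for LLM evaluations'),
  // Toggle defaults to off, per the mockup.
  '#default_value' => FALSE,
];
$form['evaluator_model'] = [
  '#type' => 'select',
  '#title' => t('Evaluator Model'),
  '#description' => t('Select the model you want to use for evaluating the results of tests.'),
  '#options' => $model_options,
  // Only show this select when the toggle above is turned on.
  '#states' => [
    'visible' => [
      ':input[name="use_evaluator_model"]' => ['checked' => TRUE],
    ],
  ],
];
```

This is a declarative form-array fragment only; validation and submit handling would still need to wire the chosen evaluator model into the test run.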
- 🇨🇦Canada bisonbleu
Oh… I was looking in the wrong direction… Let's try to untangle things.
Tests and Test Groups are run using either the default model (as set in the AI Settings) OR a preferred model as selected in the Prepare Agent Test Group Run form.
But this issue is about something else: when creating a test, at the bottom of the form, there is a field labelled Agent Response LLM Test.
As the description reads:
If filled in, this prompt will be tested against the agent's response.
This field makes it possible to evaluate the default/chosen LLM's response. For this it might be useful to select a different LLM, especially when the initial intent is to run tests on a new or unknown model and then evaluate the results using a trusted model. - 🇬🇧United Kingdom yautja_cetanu
Yup! I noticed from the logs that it was using 4o mini to run the LLM evaluation, which I think was also the default model. It would also explain why it kept failing (because I think 4o mini wasn't clever enough).
- 🇨🇦Canada bisonbleu
Looking at AI Logs, I can confirm that the default provider set for Chat with Tools/Function Calling is used for the evaluation; and this makes perfect sense.
Using the current MR, running the new Choose Model action and selecting a different model for a test run clearly illustrates this.
Alright, now I know where I'm going with thisβ¦
- 🇬🇧United Kingdom yautja_cetanu
Hmmm, then instead of the boolean, maybe we should make the second drop-down default to the default provider, as it is now?
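That suggestion could look something like the following sketch. The service name and method are assumptions based on the AI module's provider plugin manager; the element key is hypothetical:

```php
<?php

// Hypothetical: pre-populate the evaluator drop-down with the site-wide
// default provider (instead of hiding it behind a toggle). The operation
// type key 'chat_with_tools' is an assumption matching the "Chat with
// Tools/Function Calling" default mentioned above.
$default = \Drupal::service('ai.provider')
  ->getDefaultProviderForOperationType('chat_with_tools');
$form['evaluator_model']['#default_value'] = $default['provider_id'] ?? NULL;
```

With this, leaving the drop-down untouched keeps today's behaviour, while changing it opts into a separate evaluator model.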
- 🇨🇦Canada bisonbleu
Attaching a Mermaid flowchart of the workflow for clarity.