- Issue created by @yautja_cetanu
- 🇬🇧United Kingdom yautja_cetanu
- Evaluations need to store, in detail, the Agents working behind the scenes of an assistant message, with all the history, prompts and back-and-forth that exists there.
- For each "thing" in the evaluation (such as a message) I should be able to click a button and see the actual POST message sent to the LLM (maybe in the logs).
- We also want to store the "Drupal bit": the logs of the things Drupal has done between the agents working, so we can trace what happened.
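As a rough illustration of what such a stored trace could look like, here is a minimal sketch in Python. It is not the module's actual data model: every class and field name (`LlmCall`, `AgentRun`, `EvaluatedMessage`, `request_payload`, `drupal_log`, and so on) is hypothetical and only mirrors the three requirements above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LlmCall:
    """One request/response pair exchanged with the LLM (hypothetical shape)."""
    request_payload: dict[str, Any]  # the raw POST body sent to the provider
    response_text: str


@dataclass
class AgentRun:
    """What one agent did while handling a single assistant message."""
    agent_id: str
    system_prompt: str
    input_prompt: str
    llm_calls: list[LlmCall] = field(default_factory=list)
    # Things Drupal did between agent steps, kept as plain log lines here.
    drupal_log: list[str] = field(default_factory=list)


@dataclass
class EvaluatedMessage:
    """The assistant message under evaluation plus the agent runs behind it."""
    message: str
    agent_runs: list[AgentRun] = field(default_factory=list)

    def trace(self) -> list[str]:
        """Flatten the runs into readable lines, grouped by agent."""
        lines: list[str] = []
        for run in self.agent_runs:
            lines.append(f"agent:{run.agent_id} prompt:{run.input_prompt}")
            for call in run.llm_calls:
                lines.append(f"  llm -> {call.response_text}")
            for entry in run.drupal_log:
                lines.append(f"  drupal: {entry}")
        return lines
```

Keeping `request_payload` as the unmodified request body is what would let a "show the POST" button display exactly what was sent, rather than a reconstruction.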
-
justanothermark →
committed 3bcd6488 on 1.0.x
Issue #3487485: Add AiEvaluation entity view route and update listing to...
- 🇬🇧United Kingdom yautja_cetanu
There are things I'm finding difficult to understand with the reports:
Mostly UX changes
- I think there should be a clear place for the "System" prompt vs the actual prompt that is happening.
- This is especially true for Agents: the specific question and the system prompt need to be separate, with the system prompt hidden in a collapsed section. I really need to see the prompt given to the Agent and its response.
- Can we do anything about the formatting of the responses? Hard to read what is going on.
- Need to figure out how we can highlight the specific thing being evaluated, the specific prompt to solve. I'm getting a lot of the history but I want to see the specific thing to be evaluated. (Maybe highlight the specific message where it was ticked yes or no, and have that open by default?)
- It seems the thing that says "Prompt" is really the "System Prompt" and the thing that is called "Question" is "Prompt" - It's kind of a User prompt (even if the user is an Agent). Maybe Input Prompt is good?
- I think we need to be able to see the most important things first with everything else in a drop-down. It's the User prompts + Response that matters the most.
- Similarly, configuration such as the model should again be in a drop-down.
- In the screenshot without agents, it doesn't show the Drupal_agent_assistant. Why does it sometimes appear in the message history and other times not?
- I think it might be better to show a UI for the specific user message we're dealing with and its history of agents separately from the general message history. We can't see any of the agents within historical user messages anyway, so having a drop-down for each one doesn't make much sense.
May require a refactor of Agents
- I think "Comments" should be Message history and there should be a consistent method of showing message history for the assistant vs Agent. (This might need a little agent refactor itself). We probably don't need "Task Name" and "Task Description", unless we want to keep using those features so it works better with MiniKanBan, in which case we should make the Assistant come up with a Task Name and Description.
- I think we should always ask the Agent to respond with something and then also offer an explanation for its response. We should have a consistent format for the "Response Message" vs "Explanation".
- I think it makes sense that we can't see the agent history for each previous user message. However, I think we should at least keep a record of the agents called so that we could query the agents called by that user message.
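To make the last two points concrete, here is a small Python sketch of what a consistent reply format and a per-message agent record could look like. It is only an illustration under my own assumptions: the JSON keys `response_message` and `explanation`, the record keys `agent_id` and `user_message_id`, and both function names are hypothetical, not anything the module currently defines.

```python
import json
from dataclasses import dataclass


@dataclass
class AgentReply:
    """A reply split into what the agent says and why it says it."""
    response_message: str
    explanation: str


def parse_agent_reply(raw: str) -> AgentReply:
    """Parse a JSON reply; the explanation is optional and defaults to ''."""
    data = json.loads(raw)
    return AgentReply(
        response_message=data["response_message"],
        explanation=data.get("explanation", ""),
    )


def agents_called(records: list[dict], user_message_id: str) -> list[str]:
    """Return the agent ids recorded against one user message, in stored order."""
    return [
        r["agent_id"] for r in records if r["user_message_id"] == user_message_id
    ]
```

With a record like this, the report could list which agents ran for an older user message even when their full history is not kept.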
Screenshots
Evaluations with no Agents called
Evaluations with sub-agents called
- Merge request !9 Issue #3487485 by justanothermark: Added Notes & Tags fields, updated reports and reorganised path structure. → (Open) created by justanothermark