Changes to AI Agents from Evaluations

Created on 25 November 2024, 27 days ago

Problem/Motivation

I think "Comments" should be "Message history", and there should be a consistent method of showing message history for the Assistant vs. the Agent. (This might need a small refactor of the agents themselves.) We probably don't need "Task Name" and "Task Description", unless we want to keep using those features so it works better with MiniKanBan, in which case we should make the Assistant come up with a Task Name and Description.
I think we should always ask the Agent to respond with something and then also offer an explanation for its response. We should have a consistent format for the "Response Message" vs. the "Explanation".
I think it makes sense that we can't see the agent history for each previous user message. However, I think we should at least leave a record of some of the agents called, so that we could query which agents were called by that user message.
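
To make the last two points concrete, here is a rough sketch of what a consistent "Response Message" / "Explanation" format and a per-message record of agents called could look like. It is written in Python purely for illustration and is not the module's actual code: the class names, the field names (response_message, explanation, agents_called) and the JSON reply shape are all assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentResponse:
    """Hypothetical shape: every agent reply has a message and an explanation."""
    agent_id: str
    response_message: str
    explanation: str


@dataclass
class UserMessageRecord:
    """One user message plus a record of the agents called to handle it."""
    user_message: str
    agents_called: List[str] = field(default_factory=list)
    responses: List[AgentResponse] = field(default_factory=list)

    def add_response(self, response: AgentResponse) -> None:
        # Record each agent once, in the order it was first called.
        if response.agent_id not in self.agents_called:
            self.agents_called.append(response.agent_id)
        self.responses.append(response)


def parse_agent_reply(agent_id: str, raw: str) -> AgentResponse:
    """Parse a JSON reply and reject it if either required field is missing or empty."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"agent {agent_id} reply is not a JSON object")
    missing = [key for key in ("response_message", "explanation") if not data.get(key)]
    if missing:
        raise ValueError(f"agent {agent_id} reply is missing: {', '.join(missing)}")
    return AgentResponse(agent_id, data["response_message"], data["explanation"])


# Example usage with a made-up agent ID and reply.
record = UserMessageRecord("Create an Event content type")
record.add_response(
    parse_agent_reply(
        "content_type_agent",
        '{"response_message": "Created.", "explanation": "The user asked for it."}',
    )
)
```

With something like this, the agents called for a given user message can be queried from agents_called without storing each agent's full internal history.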

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Active

Version

1.0

Component

Code

Created by

🇩🇪Germany marcus_johansson


Comments & Activities

  • Issue created by @marcus_johansson
  • 🇬🇧United Kingdom MrDaleSmith

    How confident are we that if we ask the AI to give us an explanation for why it has done what it has done, that it will actually give us one?

    I'm nowhere near an expert on this, but my understanding is that when you query an LLM, it generates a response word by word, choosing each next word based on how probable it is given similar questions in its context and training data. If you ask the AI for an explanation, wouldn't it need a degree of self-awareness to have actually decided on a course of action and then be able to accurately explain its reasoning for choosing that response? Aren't you more likely to get back the most likely kind of response to the question "Why did you do that?"
