[Meta] Create an alpha version of evaluations used to test Drupal CMS

Created on 12 November 2024, 10 days ago

Problem/Motivation

The goal of the evaluations module is to be an overall framework that lets people produce evaluations quickly and easily. However, the immediate need is to evaluate the AI functionality in Drupal CMS. This module won't go into Drupal CMS, so we can be really relaxed about the initial alpha/pre-alpha: it's just to be used on a small number of early test sites, and we can then build it out into something that will work in more general use cases.

This is part of the AI ecosystem but won't be in the AI module, as people may not want it on a live site. (It will handle ongoing evaluations, though, so it may move into the AI module at some point.)

Proposed resolution

  • We need the chatbot to behave like OpenAI's: a thumbs up and a thumbs down that appear when you hover over an AI response.
  • Behind the scenes, we store the number of times the prompt has been used and the number of thumbs up and thumbs down it got, linked to a specific assistant. We record the date/time stamp of when each rating was made. (This will be anonymous.)
  • We need some kind of GDPR notice, like a button "Save Chat history" and a link to a privacy page about it. Default it to off.
  • Also behind the scenes, we need to store the user prompt used, the whole history of the conversation, the AI output, and whether it's a thumbs up or thumbs down. The user can export the history at any point, and it will create a CSV with a UID (but not username) and some randomly generated ID for the website as an anonymous identifier.
  • Also make it so that after a thumbs up or thumbs down, a button appears to download that specific evaluation for that chat item (and all its history) as a CSV file that can be shared to help with debugging.
  • Allow Prompt Training Data Export - Submodule
  • We will create another site that can store and display multiple evaluations in one place, though this is mostly relevant to Drupal CMS.
  • Get the evaluation page to appear as a tab/link/something that's easy to find.
  • Remove thumbs up and down from the user messages in the chat window.
  • Add some kind of summary to the evaluations (total thumbs up and down, maybe some others).
  • Should we have an "Export Evaluation" button that appears after it's ticked? Or is it there all the time for each individual one?
  • Overall export evaluations.
  • Evaluations need: UP, Prompt Chain, End Result, Yes/No, Notes and Title fields for admin view, settings/parameters for debug, tags.
  • Question separately "Did it work, Would you keep using this?" - How can we find out if people would like it even if it didn't work?
  • Import eval into an explorer?
  • What happens to prompts that are removed from the history? (We need those in the evaluation, but we also need to know what was sent to the LLM vs. not.)
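
To make the storage and CSV export points above more concrete, here is a rough sketch in Python (not Drupal code, purely illustrative). Everything named here is an assumption for discussion: the `Evaluation` record, its field names, the `export_csv` helper, the CSV column order, and the choice to serialize the history as JSON are all hypothetical and not decided anywhere in this issue.

```python
import csv
import io
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Hypothetical: a random per-site identifier. In practice it would be
# generated once and stored, so every export from a site carries the same ID.
SITE_ID = uuid.uuid4().hex


@dataclass
class Evaluation:
    """One rated AI response (field names are illustrative assumptions)."""
    uid: int            # numeric user ID only, never the username
    assistant: str      # the assistant the rating is linked to
    prompt: str         # the user prompt that was rated
    history: List[str]  # the whole conversation up to this response
    output: str         # the AI output
    thumbs_up: bool     # True = thumbs up, False = thumbs down
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def export_csv(evaluations, site_id=SITE_ID):
    """Return the given evaluations as CSV text, one row per evaluation."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["site_id", "uid", "assistant", "created",
                     "prompt", "history", "output", "rating"])
    for e in evaluations:
        writer.writerow([
            site_id,
            e.uid,
            e.assistant,
            e.created.isoformat(),
            e.prompt,
            json.dumps(e.history),  # keep the history in a single cell
            e.output,
            "up" if e.thumbs_up else "down",
        ])
    return buf.getvalue()
```

The per-item download button and the overall export could both call the same helper, passing either a single evaluation or the full list.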
🌱 Plan
Status

Active

Component

Code

Created by

🇬🇧 United Kingdom yautja_cetanu

Comments & Activities

  • Issue created by @yautja_cetanu
  • 🇬🇧 United Kingdom yautja_cetanu

    Possibly in addition to this.

    - Have a simple export that just says "Thumbs up / thumbs down" (maybe including which agents were involved) that can be integrated with telemetry. We can then monitor in aggregate whether it's going well.
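
As a rough illustration of that simple export, the sketch below (Python, illustrative only) counts thumbs up and thumbs down per agent. The `summarize` function name and the `(agent, thumbs_up)` input shape are assumptions for discussion, not an agreed format.

```python
from collections import Counter


def summarize(ratings):
    """Count thumbs up/down per agent.

    `ratings` is an iterable of (agent, thumbs_up) pairs, where thumbs_up
    is True for a thumbs up and False for a thumbs down. No prompts,
    outputs, or user IDs are needed for this aggregate view.
    """
    up = Counter()
    down = Counter()
    for agent, thumbs_up in ratings:
        if thumbs_up:
            up[agent] += 1
        else:
            down[agent] += 1
    # Include every agent that received at least one rating of either kind.
    return {
        agent: {"up": up[agent], "down": down[agent]}
        for agent in sorted(set(up) | set(down))
    }
```

Because the result holds only counts, it could be sent to telemetry without any of the chat content.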
