Handle threading and history with Assistants API and RAG

Created on 26 August 2024
Updated 27 August 2024

Problem/Motivation

Currently the Assistants API Chatbot does not handle history at all. If you type a question into the chatbot, it goes through the standard approach to RAG known as Naive RAG (a rough sketch of this flow in code follows the list below):

  • Takes the user prompt (UP) and runs a vector similarity search of the UP against the index.
  • The vector database returns a list of chunks ordered by relevance. We take the top X chunks (such as 5).
  • The LLM receives a pre-prompt created by the site admin, containing the UP and the context (the list of chunks from the vector search).
  • The user is provided an answer from the LLM.
  • Any future question is treated as a completely new UP. The system doesn't provide any history for context and goes through this whole process from scratch every time.
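
To make the flow concrete, here is a minimal sketch of that Naive RAG loop in plain PHP. The service objects and method names ($vectorStore->search(), $llm->chat()) are illustrative placeholders and not the module's actual API:

<?php

// Hypothetical sketch of the current Naive RAG flow (all names are placeholders).
function answerWithNaiveRag(string $userPrompt, object $vectorStore, object $llm, string $prePrompt, int $topK = 5): string {
  // 1. Vector similarity search of the user prompt against the index.
  $chunks = $vectorStore->search($userPrompt, $topK);

  // 2. Build the context from the top X chunks, ordered by relevance.
  $context = implode("\n---\n", array_column($chunks, 'text'));

  // 3. The LLM gets the site admin's pre-prompt plus the UP and the context.
  $messages = [
    ['role' => 'system', 'content' => $prePrompt . "\n\nContext:\n" . $context],
    ['role' => 'user', 'content' => $userPrompt],
  ];

  // 4. The user gets the answer. Nothing is stored, so the next question
  //    starts this whole process again with no history at all.
  return $llm->chat($messages);
}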

As a result, we cannot support any of these features:

  • A user cannot ask a follow-up question about an answer where the follow-up can be answered from the LLM's own knowledge, with no further RAG needed.
    • For example, let's say I ask a project browser how I could use AI to improve search. It recommends AI Search and mentions the word "embedding". I then ask "What is an embedding?" Currently, the AI would simply run a new search on that question instead of just answering from what it already knows about embeddings.
  • A user cannot ask a follow-up question about a specific answer that relies on the previously retrieved context.
    • For example, if I asked about a specific set of modules I could use to solve a problem, I couldn't then ask "When should I use Webform or the core Contact module?" and have it use the previous information in the context that found those modules to answer the new question. Similarly, I couldn't ask "Why did you pick that module?"
  • I can't ask a question that requires a new RAG search and then ask a question that combines the new results with earlier ones.
    • For example, I ask "What modules are good for having products on my site?" and get Commerce. I then ask what would help me search for those products on the site quickly and get Search API AI. I then say "I've installed Commerce and made a product. How can I make Search API AI know those products exist?" (A weak example, because it perhaps doesn't need the full context of the first question, but it might, since it may need information about Commerce from the Commerce module page and about Search API to answer it.)
  • Then there is a series of more advanced functionality:
    • Queries that use more than one RAG search, on more than one database, for sub-parts of the question (if I say "I want to create Reddit in Drupal", an LLM first says "Well, you need discussions, you need ratings, you need moderation" and then does a RAG search for each of those things). A rough sketch of this follows the list.
    • Choosing which vector database to look at (one might be a list of modules, another documentation across all modules).
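
For the multi-search idea above, here is a rough sketch of how that decomposition could work, again with purely hypothetical names: one LLM call splits the question into sub-questions and picks a vector database for each, then each sub-question gets its own RAG search before a final answer is generated.

<?php

// Hypothetical sketch: decompose a prompt into sub-questions, each with its own RAG search.
function answerWithDecomposition(string $userPrompt, array $vectorStores, object $llm): string {
  // Ask the LLM to split the prompt into sub-questions and pick a database for each,
  // e.g. "modules" vs "documentation".
  $plan = json_decode($llm->chat([
    ['role' => 'system', 'content' => 'Split the question into sub-questions. For each, choose a database from: ' . implode(', ', array_keys($vectorStores)) . '. Respond as JSON: [{"question": "...", "database": "..."}]'],
    ['role' => 'user', 'content' => $userPrompt],
  ]), TRUE);

  // Run one RAG search per sub-question against the chosen database.
  $context = '';
  foreach ($plan as $step) {
    $chunks = $vectorStores[$step['database']]->search($step['question'], 3);
    $context .= "## " . $step['question'] . "\n" . implode("\n", array_column($chunks, 'text')) . "\n";
  }

  // Answer the original question using the combined context.
  return $llm->chat([
    ['role' => 'system', 'content' => "Answer using this context:\n" . $context],
    ['role' => 'user', 'content' => $userPrompt],
  ]);
}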

Proposed resolution

This is all contingent on the parent issue.

I can see three approaches:

  • SIMPLE - Every LLM call always does a RAG search. The threading provides the history of user prompts and assistant responses, plus the most recent context. The pre-prompt tells the LLM to ignore the context if it is not relevant. This is the easiest to do, but it means lots of expensive prompts when you pay per input token. (A sketch of this approach follows the list.)
  • MEDIUM - LLMs speak to other LLMs. Instead of providing a response to the user, the LLM can choose to create a prompt for another LLM to handle. (For example, if it is an agent that has to write some Python and run it, it can send its output to the Python interpreter instead, get a result back, and the next LLM can say "I did it for you and this is the answer".)
  • ADVANCED - There is a "project manager". A pre-prompt, perhaps on a small, cheaper LLM, looks at the UP and decides whether a RAG search is necessary or whether to go straight to an LLM.
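
To illustrate the SIMPLE approach (and the history-depth question below), here is a minimal sketch under the same hypothetical names as before: every call does a RAG search, the thread supplies the last N user/assistant exchanges, and the pre-prompt tells the LLM to ignore irrelevant context.

<?php

// Hypothetical sketch of the SIMPLE approach: always RAG, with thread history included.
function answerWithThreadedRag(string $userPrompt, array &$thread, object $vectorStore, object $llm, string $prePrompt, int $historyTurns = 2): string {
  // Always run the RAG search, even for follow-up questions.
  $chunks = $vectorStore->search($userPrompt, 5);
  $context = implode("\n---\n", array_column($chunks, 'text'));

  // Keep only the most recent N user/assistant exchanges to control token cost.
  $history = array_slice($thread, -2 * $historyTurns);

  $messages = array_merge(
    // The pre-prompt instructs the LLM to ignore the context if it is not relevant.
    [['role' => 'system', 'content' => $prePrompt . "\nIgnore the context below if it is not relevant.\n\nContext:\n" . $context]],
    $history,
    [['role' => 'user', 'content' => $userPrompt]]
  );

  $answer = $llm->chat($messages);

  // Store the exchange in the thread so the next question has history.
  $thread[] = ['role' => 'user', 'content' => $userPrompt];
  $thread[] = ['role' => 'assistant', 'content' => $answer];

  return $answer;
}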

However, in all of these situations, how much of the previous context do we provide? Only the most recent? The most recent two?

We also need to think about how much control we give to the site builder. What are the defaults? How easy is it to change them, and will this confuse most site builders? The OpenAI Assistants API handles all of the above, but it is a black box where it gets to decide.

Any other ideas on how to approach this? What are best practices?

Remaining tasks

  • These approaches mean the front end needs some kind of background processor, as the answers might take a long time and time out.
  • We probably want some user feedback for each step. For example, if it does RAG it could say "I'm just looking up the answer for you", or "I'm just sending things to the code interpreter", with a way to see the exact prompt and response. (These messages could be user editable or provided by the agent/function/RAG.)
  • In all of these situations there needs to be a "pre-pre-prompt" that we as module developers provide and that is added to the site builder's pre-prompt, telling the AI how to pick between the functions/agents/RAG available to it and when to pick one or not. Do we make this user editable? Maybe there is a default that the user can edit by clicking "Advanced pre-prompt settings". (A rough sketch of how this could be assembled follows the list.)
  • The problem with making it user editable is that, to some degree, it needs to be automatically created from the agents available to it.
  • If we go down the route of the LLM agent, or of LLMs speaking to other LLMs, then it can ask for previous contexts if it needs them.
  • It's likely a lot of these approaches are going to be slow, so we need to work on the UX to make them feel faster.
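
As a rough illustration of the pre-pre-prompt and per-step feedback ideas above, here is a sketch of how both could be generated automatically from whatever functions/agents/RAG sources are registered. All names and the tool-definition shape are assumptions, not existing module code.

<?php

// Hypothetical sketch: build the module-provided "pre-pre-prompt" from the available
// tools, then append the site builder's own pre-prompt.
function buildSystemPrompt(array $tools, string $siteBuilderPrePrompt): string {
  $lines = ["You can use the following tools. Pick one only when it is needed:"];
  foreach ($tools as $name => $tool) {
    // Each tool describes itself and its per-step status message, e.g.
    // 'rag_modules' => ['description' => 'Search the module index.',
    //                   'status' => "I'm just looking up the answer for you..."].
    $lines[] = "- {$name}: " . $tool['description'];
  }
  $lines[] = "If no tool is needed, answer directly.";

  return implode("\n", $lines) . "\n\n" . $siteBuilderPrePrompt;
}

// The per-step user feedback could come from the same definitions.
function statusMessageFor(string $toolName, array $tools): string {
  return $tools[$toolName]['status'] ?? 'Working on it...';
}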

User interface changes

API changes

Data model changes

🌱 Plan
Status

Active

Version

1.0

Component

AI Search

Created by

🇬🇧United Kingdom yautja_cetanu


Comments & Activities
