Handle threading and history with Assistants API and RAG

Open on Drupal.org →

Created on 26 August 2024, 11 months ago

Problem/Motivation

Currently the Assistants API Chatbot does not handle history at all. If you type a question into the chatbot it goes through the same standard approach to RAG known as Naive RAG.

Takes the User prompt (UP) and does a vector search of relevancy on the UP to the index.
Vector database comes back with a list of chunks ordered by relevancy. We take the top X amount of chunks (such as 5)
LLM receives a pre-prompt created by the site admin, it contains the UP and Context (The list of chunks from the vector search)
User is provided an answer from the LLM
Any future questions are treated as a completely new UP. The system doesn't provide any history for context and will go through this whole process from scratch every time.

As a result we cannot support any of these features:

A user cannot ask a follow up question about that answer. But the question is from the LLM's own information no longer any RAG needed

For example, Lets say I want to ask a project browser how I could use AI to improve search. It recommends AI Search and says the word "embedding". I then ask "What is an embedding?" The current situation, the AI would simply do a new search on that question instead of just answer the embedding.

A user cannot ask a follow up question about that specific answer with the previous context.

For example, If we asked about a specific set of module I could use to solve a problem. I couldn't ask "When should I use webform or the core contacts module? Where it uses the previous information in the context that found them to answer this new question. Similarly if you ask "Why did you pick that module?"

I can't sometimes ask for questions that require a new RAG search and then ask questions that combine them.

For example. I ask "What modules are good for having products on my Site" I get commerce. I then ask what would help me search for those products on the site quickly "I get search api ai", I then say "I've installed Commerce and made a product, How can I make Search API AI know those products exist?" (Bad example because perhaps it doesn't need the full context of the first one, but it might because it might need information about Commerce from the Commerce module page and search api to answer it)

Then there are a series of advanced functionality

Having queries that use more than one RAG search on more than one DB for a sub-set of the question. (If I say "I want to create reddit in Drupal. An LLM first says, Well you need discussions, you need ratings, you need moderation. It would do a RAG search for each of those things)
Choosing which vector database to look at. (One might be a list of modules, the other documentation across all modules).

Proposed resolution

This is all contingent on the parent issue.

I can see two approaches:

Every LLM always does a RAG Search. The threading provides the history of UP and Assistant responses and the most recent context. - The pre-prompt ignores the context if its not relevant. - This is the easiest to do but it means lots of expensive prompts when you pay per input token.
There is a pre-prompt on maybe a small cheaper LLM that looks at the UP and decide if a RAG search is necessary or go straight to an LLM

However in both situations. How much of the context we do provide? Only the most recent? The most recent 2?

We also need to think about how much control we give to the site builder. What are the defaults? How easy is it to change them, will this confuse most site builders? The Open API Assistants module handles all the above but its a black box where it gets to decide.

Any other ideas of how to approach this? What are best practises?

Remaining tasks

User interface changes

API changes

Data model changes

🌱 Plan

Status

Active

Version

1.0

Component

AI Search

Created by

🇬🇧United Kingdom yautja_cetanu

Live updates comments and jobs are added and updated live.

Sign in to follow issues

Comments & Activities

Issue created by @yautja_cetanu
Comment 11 months ago →
🇬🇧United Kingdom yautja_cetanu
Comment 11 months ago →
🇬🇧United Kingdom yautja_cetanu
Comment 11 months ago →
🇬🇧United Kingdom yautja_cetanu
Comment 11 months ago →
🇬🇧United Kingdom yautja_cetanu
Comment 11 months ago →
🇬🇧United Kingdom yautja_cetanu
Comment 11 months ago →
🇬🇧United Kingdom yautja_cetanu
Ok thinking about it more the "Pre-Pre-System-Prompt" - needs to be put together by our system. If we want the end user to make changes they could write "When to use this service" as configuration on the assistants config page that lets you pick the services.
Comment 11 months ago →
🇬🇧United Kingdom yautja_cetanu
Comment 10 months ago →
🇬🇧United Kingdom yautja_cetanu
Basic messages:

If they choose the RAG service.

- "I'm looking up your answer" - (when the LLM responds saying it wants to do a RAG search)
- "I'm now analysing the results and preparing my response" (When we sent the new prompt with context to the LLM)
Comment 9 months ago →
🇬🇧United Kingdom scott_euser
Status changed to Closed: outdated 6 months ago8:16pm 22 January 2025
Comment 6 months ago →
🇩🇪Germany marcus_johansson

contrib.social Blog FAQ Discussions

Production build 0.71.5 2024