- Issue created by @yautja_cetanu
- 🇬🇧United Kingdom scott_euser
Combine lexical + vector search in one query (Qdrant does this) (Note: BM25 is not the same as Solr, etc.). This is for when we provide context to an LLM, not just the "Boost with AI" functionality
Can you detail this one a bit more? Sorry, I'm so used to the Drupal terminology that I don't know what this means in a practical sense. I also can't see anything in the Qdrant docs that shows how it's different from filtering in Milvus, for example, so it would be good to get pointed in the right direction there.
I think we also need to get someone who uses, for example, Mandarin or Japanese to work on 🐛 Broken Byte-Pair Encoding (BPE) Active in the embedding strategies.
- 🇬🇧United Kingdom scott_euser
Is Seogow still working on the Qdrant side of things? That might make it clearer.
- 🇬🇧United Kingdom scott_euser
I have also been working on ✨ Improve the Search API admin UI for adding/editing fields Active which needs review. It should significantly improve the user interface for adding fields.
- 🇬🇧United Kingdom yautja_cetanu
That UI for Search API is soooooo much nicer!!!
- 🇬🇧United Kingdom yautja_cetanu
So the situation I'd like with AI Search + Chatbot is that I search for something and it runs the traditional search, matching the words as they are, runs the semantic search as well, and then automatically combines the results.
I think it's probably the case that your "Boost With AI", provided as context to a chatbot, would solve this. So it's probably just taking what you've already done and making it so the LLM can use it as context, not just a view.
- 🇬🇧United Kingdom scott_euser
Yeah, the processor is in Search API, not Views, so anything executing Search API can use the boost. And maybe "boost" is the wrong word, as (at least with database) you can get results that traditional search failed to find, so it combines as you say (I haven't thoroughly tested Solr; it might need tweaking).
As long as you are executing Search API in a chatbot with that processor enabled, you'll already get what you're asking for.
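To make "combines" concrete, here is a minimal sketch of one common way to merge a keyword result list with a semantic result list: reciprocal rank fusion. This is an illustration only and an assumption on my part, not what the AI Search processor or Qdrant actually does. The function name, the `k=60` constant, and the `node/…` IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items found by only one search still make it into the output.
    The constant k is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
keyword_hits = ["node/12", "node/7", "node/3"]
semantic_hits = ["node/7", "node/42", "node/12"]

combined = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(combined)  # ['node/7', 'node/12', 'node/42', 'node/3']
```

Note that `node/42` was only found by the semantic search and `node/3` only by the keyword search, yet both appear in the combined list, which is the "finds results that traditional search failed to find" behaviour described above.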
- 🇬🇧United Kingdom scott_euser
Looking at the issue queue, I updated the issue summary with what I think would be good to get sorted.
- 🇬🇧United Kingdom seogow
I went through all the tickets and have done/learned the following:
Tickets mentioned in summary
- 🌱 [Meta] Bring AI modules Tests, Error Handling and Documentation Active has one unfinished child, 📌 Add search api integration test Active, which is reported by @scott_euser. I have asked for it to be either implemented or assigned to me.
- 📌 Check that SOLR 'boost' of results finds results that are not found by traditional SOLR search Active is reported by @scott_euser. I have asked for it to be either implemented or assigned to me.
- ✨ Pass metric type to vector search function Active is reported by @joshhytr. I have asked for either the implementation to be finished or the maintainers to be allowed to finish it.
- 📌 Rename 'Boost' plugins to 'Combine' plugins Active is reported and possibly also finished by @scott_euser. I have asked for an MR to be opened.
- 📌 Improve AI Search Module Indexing to Handle Long-Running Chunk Embedding Processes Needs work is reported by and assigned to me, with PR comments by @scott_euser. I am going to implement these.
- [Meta] Features to focus on for v1.1 → - I have added a comment listing the things that have been done. Unless @yautja_cetanu wants to create child tickets, I suggest treating the remaining Search module features mentioned in the ticket as not required for moving out of experimental.
Other open bug reports
- 🐛 Failure to insert into collection in Milvus 2.4+ with float32 instead of floatvector data passed Postponed: needs info - asked to close or follow up.
- 🐛 Incorrect View Mode Configuration in RagAction::fullEntityCheck and hardcoded default value in Form Active - merge request exists.
- 🐛 Milvus-based server silently failing to insert to collection when indexing Active - stale ticket, warning issued for it to be closed on Tuesday, 4 February 2025.
- 🐛 Error while indexing Active - stale ticket, warning issued for it to be closed on Tuesday, 4 February 2025.
- 🐛 Flush index if Embedding Strategy is set Needs review - possibly stale ticket, warning issued for it to be closed on Tuesday, 4 February 2025.
- 🐛 Non-UTF8 encoding results in the exception "Failed to insert into collection" Active - possibly stale ticket, warning issued for it to be closed on Tuesday, 4 February 2025.
- VDB Provider only visible in Search API create Index form after cache clear (🐛 Pinecone not visible in seach_api only after cache clear. Active) - asked @mrdalesmith & @scott_euser for a heads-up and offered to investigate on Wednesday, 22 January 2025.
Suggested way ahead
I shall either create the fixes for PR of MRs, or close the tickets due to inactivity on the dates provided above.
Hence
Assuming the work on stale tickets (where the reporter or assigned developer does not communicate, does not follow the recommendations, or does not ask us for help with finishing the ticket) starts on Tuesday, 4 February 2025, it can be ready for PR on Friday, 7 February 2025.