Decide where to manage configuration for Search API

Created on 21 June 2024, 3 months ago
Updated 26 August 2024, about 1 month ago

Problem/Motivation

In search_api_ai, the embeddings are added as a data type, while the rest of the content is indexed as usual via a backend such as the database or Solr.

Allowing this combination is probably still right, since it lets users combine exposed filters with vector database search, e.g. asking a question while only caring about results from the blog post content type. Thoughts?

I can imagine a situation where both the selected Vector Storage Plugin and the search index are Solr. The Vector Storage Plugin for Solr could then require selecting a Search API Solr index and simply reuse that index's configuration.

Similarly, the Vector Storage Plugin could be Pinecone while the standard indexing is done into Solr. The user then queries some combination of the Solr index and the embeddings. In this scenario, however, it gets complicated:

  1. Do we retrieve the topK embeddings first, then apply them as a condition to e.g. the Solr query?
  2. Do we run the Solr query first, then apply its results as a metadata filter in the vector storage database?

I think it must be (1); otherwise each exposed filter option would need to be added as metadata, and with Pinecone, for example, filtering by metadata is quite limited (https://docs.pinecone.io/guides/data/filter-with-metadata#querying-an-in...)
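
To make option (1) concrete, here is a minimal sketch in plain Python. It is not Drupal, Search API, Solr or Pinecone code: `Doc`, `vector_top_k` and `filtered_search` are hypothetical stand-ins, with an in-memory list playing the role of both the vector store and the keyword index. It only illustrates the order of operations: fetch the topK IDs by similarity, then pass those IDs as an extra condition alongside the exposed filter.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Doc:
    # Hypothetical indexed item: an ID, one filterable field, one embedding.
    id: str
    content_type: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_top_k(docs: list[Doc], query_embedding: list[float], k: int) -> list[str]:
    """Stand-in for the vector store: IDs of the k most similar documents."""
    ranked = sorted(docs, key=lambda d: cosine(d.embedding, query_embedding), reverse=True)
    return [d.id for d in ranked[:k]]


def filtered_search(docs: list[Doc], candidate_ids: list[str], content_type: str) -> list[str]:
    """Stand-in for the keyword backend: exposed filter AND 'ID in candidates'.

    Keeps the similarity order of candidate_ids.
    """
    by_id = {d.id: d for d in docs}
    return [i for i in candidate_ids if i in by_id and by_id[i].content_type == content_type]


docs = [
    Doc("node/1", "blog", [1.0, 0.0]),
    Doc("node/2", "page", [0.9, 0.1]),
    Doc("node/3", "blog", [0.0, 1.0]),
    Doc("node/4", "blog", [0.7, 0.7]),
]

# Step 1: topK from the vector store. Step 2: apply the IDs as a query condition.
candidates = vector_top_k(docs, [1.0, 0.0], k=3)
results = filtered_search(docs, candidates, "blog")
print(results)
```

One consequence of this ordering is that the exposed filter can only narrow the topK set, so the final result may contain fewer than K items (here the "page" item is dropped from the three candidates).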

Proposed resolution

?

📌 Task
Status

Closed: outdated

Version

1.0

Component

AI Search

Created by

🇬🇧United Kingdom scott_euser


Comments & Activities

  • Issue created by @scott_euser
  • Assigned to seogow
  • 🇬🇧United Kingdom seogow

    Combining indexes should IMHO be a separate module. There is a Drupal 7 module, Search API Multi-Index Searches, and a Drupal 10 version is certainly called for.

    However, I believe combining search indexes is out of scope for the AI module?

  • 🇬🇧United Kingdom scott_euser

    So what do you think is in scope? The current search_api_ai module does the mix as a data type.

  • 🇬🇧United Kingdom seogow

    The scope is to conduct a vector search on indexed embeddings and provide it as a Search API plugin, so that the developer can:

    1. Create a Vector Search Index.
    2. Conduct a search on the index via Views.

    If there are limitations on what the Vector Index can do, as with the Search API Solr module, these should be reflected in the rendered View.

    Reasoning: This module aims to work out of the box and be a plug-and-play replacement for any other Search API index technology. However, there is no limit on how Vector search can be implemented:

    • It can be combined with keyword search to provide even more accurate results.
    • It can work as a pre-selection/ordering tool for entities, whose output is then fed to reranking.
    • ...

    But those are advanced implementations, which might (or might not) be implemented here after 1.0 is out.

    I hope the above makes sense.

  • Status changed to Closed: outdated about 1 month ago