Add search VDB index lifecycle management

Created on 17 February 2025, 5 days ago

Problem/Motivation

While working on an `AiVdbProvider` implementation, I struggled with index management. In my case I wanted to create a collection for each index, not one per backend.

Currently, the plugin offers no way to react when an index is added, edited or deleted.

Proposed resolution

I looked at how the Search API database backend handles this, which is close to what I need. But I didn't want to write my own backend server implementation, because I want to keep all the existing embedding functionality.

So I added a trait which provides the methods to act on index creation, update and deletion.

Remaining tasks

Discuss. Review.

User interface changes

None.

API changes

There are three new methods available which you can implement: `addIndex`, `updateIndex` and `deleteIndex`.
These are already added to the base class using a trait/interface.
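
As a rough illustration of how a provider could use these methods to keep one collection per index, here is a minimal, self-contained sketch. It does not load Drupal or Search API. Only the method names `addIndex`, `updateIndex` and `deleteIndex` come from this issue; the trait name, the class name, the string `$index_id` parameter and the in-memory storage are assumptions made for the example, and the signatures in the merge request may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the trait proposed in this issue.
 *
 * Only the three method names come from the issue; the signatures are assumed.
 */
trait IndexLifecycleSketchTrait {

  // Default no-op implementations, so providers that do not care about
  // index lifecycle events keep working unchanged.
  public function addIndex(string $index_id): void {}

  public function updateIndex(string $index_id): void {}

  public function deleteIndex(string $index_id): void {}

}

/**
 * Hypothetical provider that keeps one in-memory collection per index.
 */
final class InMemoryVdbProviderSketch {

  use IndexLifecycleSketchTrait;

  /** @var array<string, array> Collections keyed by index ID. */
  private array $collections = [];

  // Create a collection for the index if it does not exist yet.
  public function addIndex(string $index_id): void {
    $this->collections[$index_id] ??= [];
  }

  // Drop the collection that belongs to the deleted index.
  public function deleteIndex(string $index_id): void {
    unset($this->collections[$index_id]);
  }

  // Helper for the example only: list the existing collection names.
  public function getCollectionNames(): array {
    return array_keys($this->collections);
  }

}
```

The provider overrides only the methods it needs; `updateIndex` falls through to the no-op default from the trait.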

Data model changes

None.

Feature request
Status

Active

Version

1.0

Component

AI Search

Created by

🇳🇱Netherlands askibinski


Merge Requests

Comments & Activities

  • Issue created by @askibinski
  • Pipeline finished with Failed
    4 days ago
    Total: 367s
    #427293
  • Pipeline finished with Success
    4 days ago
    Total: 258s
    #427297
  • 🇬🇧United Kingdom MrDaleSmith

    Is there anything in this that is particular to providing an AI search? I'm just wondering if it can be achieved in custom code using the existing SAPI hooks or events, or if it's something that might be better suited to the main SAPI module.

  • 🇳🇱Netherlands askibinski

    Well, I wanted more fine-grained control over collections. Currently the collection name is tied to the backend server configuration, so all indexes use the same collection name. In my case, I wanted to be able to create a separate collection for each index belonging to the vector DB server.

  • 🇬🇧United Kingdom scott_euser

    Thanks, have just been coordinating with the other maintainers to take a longer term view on this given the challenges aligning some VDB providers with the 2 levels of Search API (server + index) compared to sometimes 3 levels in VDB providers.

    We've generally agreed this is fine to add for 1.x branch as it gives more flexibility to how a VDB provider is lined up with Search API having the 3rd interface.

    I'll give this a more detailed play as soon as I can, but we should be able to get it in in its current form (or similar).

    Thanks!

  • 🇳🇱Netherlands askibinski

    @scott Thanks for the update, that makes sense. I'm perfectly fine with postponing this to a later version, because there are other areas that need attention too. For example, doSearch uses the collection name from the backend configuration and passes it to vectorSearch without any index context.

      /**
       * Run the search until enough items are found.
       */
      protected function doSearch(QueryInterface $query, $params, $bypass_access, &$results, $start_limit, $start_offset, $iteration = 0) {
        $params['database'] = $this->configuration['database_settings']['database_name'];
        $params['collection_name'] = $this->configuration['database_settings']['collection'];
    
  • 🇳🇱Netherlands askibinski

    Just adding a patch file here for reference so I can link to it from the MySQL VDB provider, which requires it. It's similar to the MR, but it also patches the backend server to fall back to the index name for the collection (if not given in config) and to the server name for the database name (if not given).
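
The fallback described in this comment could look roughly like the sketch below. It is not taken from the patch. The function name `resolveVdbTarget` and its parameters are hypothetical, and the configuration keys (`database_settings`, `database_name`, `collection`) are copied from the `doSearch` excerpt quoted earlier in this thread.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: prefer explicit config, else fall back to names.
 *
 * Falls back to the server name for the database and to the index name
 * for the collection when the configured value is missing or empty.
 */
function resolveVdbTarget(array $configuration, string $server_name, string $index_name): array {
  $settings = $configuration['database_settings'] ?? [];
  $database = $settings['database_name'] ?? '';
  $collection = $settings['collection'] ?? '';

  return [
    'database' => $database !== '' ? $database : $server_name,
    'collection_name' => $collection !== '' ? $collection : $index_name,
  ];
}
```

With this shape, two indexes on the same server that have no explicit collection configured resolve to two different collections, which is the per-index behaviour requested in this issue.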

  • 🇬🇧United Kingdom scott_euser

    Actually, we were thinking we could merge this into 1.x. Right now AI Search is marked experimental, so it's still a good time to make smaller tweaks before we mark it as stable. Do you want to first consider what other changes you'd need, and leave this as needs work?

  • 🇳🇱Netherlands askibinski

    Sure!

    The other main issue I had (besides the PR) is that in SearchApiAiSearchBackend the search methods tie the $collection_name (and database) params to the server config, as for example in #7 above. For more flexibility, it would be nice to be able to override these in the plugin, which isn't possible now. Only the filter param can be set in the plugin.
    But I think that could be a separate issue?

  • 🇬🇧United Kingdom scott_euser

    Sure, it's up to you if you prefer a separate issue. If it's going to cause merge conflicts, I don't want to make extra work for you. We will only become more strict about changes once it is out of experimental :)
