Flush index if Embedding Strategy is set

Created on 17 July 2024, 2 months ago
Updated 14 September 2024, 6 days ago

Problem/Motivation

If the Embedding Strategy is changed after the index already contains items, the index should be emptied so that indexing has to be rerun.

🐛 Bug report
Status

Needs review

Version

1.0

Component

AI Search

Created by

🇩🇪Germany Marcus_Johansson


Comments & Activities

  • Issue created by @Marcus_Johansson
  • 🇬🇧United Kingdom seogow

    I believe an index should not allow its strategy to be changed once it has been created. An index is a data structure defined by both the strategy and the embedding vectors. If something needs to change, a new index should be created.

  • 🇩🇪Germany Marcus_Johansson

    But you have the same issue in Solr for instance (and will have here) when you add or remove a field.

    We will probably never have an index with 20 fields like you might have in Solr, but it sucks to have to recreate everything instead of just having to re-index everything.

  • 🇬🇧United Kingdom seogow

    The difference between a VDB and Solr (when only keyword search is allowed in Solr) is that Solr will still work the same if you add or remove a field; the field will just be returned as NULL if it is not populated. With a VDB, on the other hand, the vectors might not only have a different number of dimensions (hard fail), but the returned items might also need different handling (soft fail in the strategy's postprocessing).
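To make the hard-fail case concrete, here is a minimal Python sketch (not the module's code). The `cosine_similarity` helper and the sample vectors are assumptions made up for illustration; the point is only that vectors with different numbers of dimensions cannot be compared at all.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; raises if the vectors have different dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical data: the stored vector and the query vector were produced
# with setups that emit different numbers of dimensions.
stored = [0.1, 0.2, 0.3]
query = [0.1, 0.2, 0.3, 0.4]

try:
    cosine_similarity(stored, query)
    mismatch_detected = False
except ValueError:
    # This is the "hard fail": the comparison cannot be carried out.
    mismatch_detected = True
```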

    However, I see your point about being able to choose a different strategy for the same setup (e.g. during development, to test the strategies' outcomes). In that case, the index can be emptied (deleted and recreated) in the VDB backend while the index configuration stays; the deleting and recreating happens when the changed index is saved.

    This is a destructive operation, and we must show a pop-up warning before allowing the changed index to be saved.

    Adding/removing fields in that case doesn't cause any issues - they will be taken care of during new indexing.
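The save flow described in this comment could look roughly like the Python sketch below. It is a sketch under stated assumptions, not the module's implementation: `InMemoryBackend`, `IndexConfig`, and `save_index` are hypothetical names, and the `confirmed` flag stands in for the pop-up warning.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for a vector database backend."""
    collections: dict[str, list] = field(default_factory=dict)

    def recreate(self, name: str) -> None:
        # Deleting and recreating leaves an empty collection under the same name.
        self.collections[name] = []


@dataclass
class IndexConfig:
    """Hypothetical index configuration; it survives the flush."""
    name: str
    strategy: str


def save_index(config: IndexConfig, new_strategy: str,
               backend: InMemoryBackend, confirmed: bool) -> bool:
    """Save a changed index; returns True if the backend was flushed."""
    if new_strategy == config.strategy:
        return False  # nothing changed, keep the indexed items
    if not confirmed:
        # Destructive operation: refuse unless the user accepted the warning.
        raise PermissionError("changing the strategy empties the index")
    backend.recreate(config.name)
    config.strategy = new_strategy
    return True
```

After a confirmed save, the configuration still exists with the new strategy, and the backend collection is empty, so every item has to be indexed again.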

  • Status changed to Needs review 14 days ago
  • 🇩🇪Germany Marcus_Johansson

    Just going through the bug backlog - I agree with you here, this will be disabled on edit. A pull request exists here: https://www.drupal.org/project/ai/issues/3462030

  • 🇬🇧United Kingdom scott_euser

    Marcus, I think your link points to this issue. I could work on this, but it sounds like we already have a solution somewhere?

  • 🇩🇪Germany Marcus_Johansson

    @scott_euser - sorry, I can't link correctly: https://git.drupalcode.org/project/ai/-/merge_requests/62. It's essentially just disabling the options if you are editing instead of creating.

  • 🇬🇧United Kingdom scott_euser

    Hmmm, I wonder if it's a valid use case to want to change the embedding strategy though? Perhaps instead we should show a warning message (Drupal messenger) saying something like 'You have changed your embedding strategy. If you intend to keep this change, you should requeue all items for re-indexing or you will likely get unexpected results.'

    The approach in your MR is, I suppose, also okay, but then we force the site builder to change the embedding strategy via config, import it, and hopefully know that they should then re-index.

  • 🇬🇧United Kingdom scott_euser

    We could also actually queue all items for reindexing on change I suppose, but it could be that they change it and want to change it back immediately after seeing the warning...
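The two options discussed above (warn only, or warn and also requeue everything) can be sketched in a few lines of Python. This is an illustration under assumptions, not Drupal or module code: `handle_strategy_change`, the `indexed` tracker dict, and the `requeue` flag are all hypothetical names.

```python
WARNING = (
    "You have changed your embedding strategy. If you intend to keep this "
    "change, you should requeue all items for re-indexing."
)


def handle_strategy_change(old: str, new: str, indexed: dict[str, bool],
                           requeue: bool = False) -> list[str]:
    """Return warnings to show; optionally mark every item as not indexed."""
    messages: list[str] = []
    if old == new:
        return messages  # strategy untouched, nothing to warn about
    messages.append(WARNING)
    if requeue:
        # Mark every tracked item as needing re-indexing.
        for item_id in indexed:
            indexed[item_id] = False
    return messages
```

With `requeue=False` the tracker is left alone, so a site builder who switches the strategy back right after seeing the warning loses nothing; with `requeue=True` every item is marked for re-indexing immediately.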
