The whole index gets cleared/deleted when any change in the search index configuration is imported/synced

Created on 13 June 2022
Updated 17 June 2024

Problem/Motivation

After setting up an index with lots of content, any minor change to the search index configuration deletes the whole index. At most, I would expect the indexed content to be marked for reindexing. Deleting everything requires manually triggering indexing, and search is effectively down until the content is fully reindexed.

Steps to reproduce

- Create an index
- Index content
- Export the search index configuration and make a minor change, such as changing a label
- Import the search index configuration; all your indexed content is cleared

Proposed resolution

Mark the content for reindexing instead of deleting/clearing it completely.

Remaining tasks

Check and create a patch.

User interface changes

-

API changes

-

Data model changes

-

🐛 Bug report
Status

Fixed

Version

3.0

Component

Code

Created by

varshith (India)

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • kim.pepper (Sydney, Australia)

    Just coming back to this: we can't change or remove field mappings on existing indexes; we can only add new field mappings.

  • longwave (UK)

    Also running into this: even a minor change to an index config, one that could simply be reindexed over the existing content, means the index is cleared during deployment.

    Index::postSave() already has some logic to handle this problem; can we just rely on that? Or at least detect the situation where fields are changed or removed, and only clear in that case?

            // React on possible changes that would require re-indexing, etc.
            $this->reactToServerSwitch($original);
            $this->reactToDatasourceSwitch($original);
            $this->reactToTrackerSwitch($original);
            $this->reactToProcessorChanges($original);
    
  • longwave (UK)

    One workaround we have tried is blue/green deployments. We keep two identical indexes, and when we need to make a change we write to index 2 before switching reads over to it, then swap back the next time around. It would be nice if this were supported natively, but I'm not sure whether it belongs in Search API itself or in this module.

  • First commit to issue fork.
  • Pipeline finished with Failed
    4 months ago
    Total: 191s
    #90081
  • Pipeline finished with Failed
    4 months ago
    Total: 161s
    #90083
  • Pipeline finished with Failed
    4 months ago
    #90097
  • Status changed to Needs review 4 months ago
  • robphillips (United States)

    Took a crack at the problem, following a similar solution in Search API Solr: a re-index is only triggered when an indexed field has changed. Tested with both in-site configuration changes and configuration sync. There might be some edge cases; it definitely needs more eyes on it.

    https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin...

  • kim.pepper (Sydney, Australia)

    Hiding the patch because we are using MRs and GitLab CI now.

  • achap (Australia)

    Thanks for the MR. Checking whether fields have changed does seem like a good idea, but I think a clear rather than a re-index will be necessary, as alluded to in previous comments. From the OpenSearch docs:

    If you want to create or add mappings and fields to an index, you can use the put mapping API operation. For an existing mapping, this operation updates the mapping.

    You can't use this operation to update mappings that already map to existing data in the index. You must first create a new index with your desired mappings, and then use the reindex API operation to map all the documents from your old index to the new index. If you don't want any downtime while you re-index your indexes, you can use aliases.

    See docs: https://opensearch.org/docs/latest/api-reference/index-apis/put-mapping/

    So if we could detect that a field doesn't exist yet, we could use the re-index flag, and if it already exists we would need to clear. However, that could get complicated if one field is added and another changed, etc.

    So just comparing the fields in the new index against the original, and only clearing when needed, already seems like an improvement. However, I think the clear would need to come before the calls to updateSettings and updateFieldMapping.

  • Pipeline finished with Success
    4 days ago
    Total: 198s
    #197761
  • Merge request !52 Resolve #3285438 "The whole index" (Merged), created by achap
  • Pipeline finished with Success
    4 days ago
    Total: 196s
    #197766
  • Status changed to Needs review 4 days ago
  • achap (Australia)

    I took a stab at this in MR !52.

    It retrieves the mappings from the OpenSearch server and compares them with the local Drupal mappings after event subscribers have fired.

    I don't think comparing the original index fields to the new ones will work. The mappings as modified by event subscribers aren't stored anywhere, so there is nothing to compare against.

    The other thing to be aware of is dynamic mapping in OpenSearch. If you don't explicitly provide a type, OpenSearch guesses one based on the data that is indexed. That means every time you click save, the index will be cleared even if nothing has changed, because there are no local mappings to compare against. This is mostly a problem if you create fields in processors etc., and the fix is to use the IndexParamsEvent to explicitly give your custom fields the correct types. I have also found that OS overrides an incorrectly assigned type once data has been indexed. For example, if you give a Drupal field that holds floats a string type in Search API, the mappings will end up with the float type. That will also cause clearing rather than re-indexing, so you need to be careful to assign the correct type.

    This doesn't tackle settings (analysers etc.), only field mappings. It also doesn't remove the need for blue/green deployments, but it should definitely reduce the number of times you need to switch indexes.

  • kim.pepper (Sydney, Australia)

    This MR will need to be rebased onto the 3.x branch. It's currently against 1.x, which receives security fixes only.

  • Pipeline finished with Failed
    3 days ago
    Total: 203s
    #198440
  • Pipeline finished with Failed
    3 days ago
    Total: 205s
    #198441
  • Pipeline finished with Failed
    3 days ago
    Total: 247s
    #198472
  • Pipeline finished with Failed
    3 days ago
    Total: 248s
    #198473
  • Pipeline finished with Failed
    3 days ago
    Total: 186s
    #198480
  • Pipeline finished with Failed
    3 days ago
    Total: 311s
    #198510
  • Pipeline finished with Failed
    3 days ago
    Total: 339s
    #198509
  • Pipeline finished with Success
    3 days ago
    Total: 178s
    #198527
  • Status changed to Needs work 3 days ago
  • kim.pepper (Sydney, Australia)

    Needs work for feedback and test failures.

  • Pipeline finished with Failed
    3 days ago
    Total: 251s
    #198548
  • Pipeline finished with Failed
    3 days ago
    Total: 254s
    #198549
  • Pipeline finished with Canceled
    3 days ago
    Total: 184s
    #198554
  • Pipeline finished with Success
    3 days ago
    Total: 1188s
    #198555
  • Status changed to Needs review 3 days ago
  • achap (Australia)

    I've updated the code based on feedback. The test error looks unrelated to my code, but I traced it to the search_api_item table not being installed in SpellCheckTest, so I added it there.

  • Status changed to RTBC 3 days ago
  • kim.pepper (Sydney, Australia)

    This looks good to me. I'll leave it as RTBC in case anyone else has feedback over the next 24 hours.

  • Status changed to Needs review 3 days ago
  • achap (Australia)

    Setting back to Needs review: after the type-hinting changes I found a small edge case when the OS mappings are not set.

  • Pipeline finished with Success
    3 days ago
    Total: 253s
    #198590
  • Pipeline finished with Success
    3 days ago
    Total: 309s
    #198591
  • Pipeline finished with Skipped
    about 9 hours ago
    #200592
  • Status changed to Fixed about 9 hours ago
  • larowlan (Australia, GMT+10)
  • kim.pepper (Sydney, Australia)

    Committed to 3.x. Thanks!
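kim.pepper's point in the thread is that existing field mappings can only be extended, not changed or removed. This matches the OpenSearch put mapping documentation that achap quoted. A minimal PHP sketch can illustrate it. `additiveMappingBody()` is a hypothetical helper, not code from this module or from MR !52. It assumes mappings are plain `field => definition` arrays and only shows which part of a change could be sent to the put mapping API without clearing.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds a put mapping request body that contains only
 * the fields missing from the existing mapping.
 *
 * Fields that already exist are skipped even if their definition changed,
 * because put mapping cannot update mappings that already map data.
 *
 * @param array<string, array<string, mixed>> $existing Current properties.
 * @param array<string, array<string, mixed>> $desired  Wanted properties.
 *
 * @return array<string, mixed> The request body, or [] if nothing is new.
 */
function additiveMappingBody(array $existing, array $desired): array {
  // array_diff_key() keeps the entries of $desired whose keys are absent
  // from $existing, preserving their order.
  $new = array_diff_key($desired, $existing);
  return $new === [] ? [] : ['properties' => $new];
}

// "title" changes type (not applicable in place); "created" is new.
$body = additiveMappingBody(
  ['title' => ['type' => 'text']],
  ['title' => ['type' => 'keyword'], 'created' => ['type' => 'date']],
);
echo json_encode($body), PHP_EOL;
```

This prints `{"properties":{"created":{"type":"date"}}}`. The changed `title` type is deliberately left out of the body, because that part of the change still forces a clear or a new index.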
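longwave's blue/green workaround is commonly paired with an index alias, as the quoted OpenSearch docs suggest. The sketch below rests on several assumptions: a hypothetical read alias `content` that points at one of two indexes, `content_blue` and `content_green`, and illustrative helper names. It only builds the request body for OpenSearch's `_aliases` endpoint, which applies the `remove` and `add` actions together. Searches therefore keep hitting the old index until the new one is fully indexed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns whichever of the two indexes is not live.
 */
function inactiveIndex(string $live, string $blue, string $green): string {
  return $live === $blue ? $green : $blue;
}

/**
 * Hypothetical helper: builds a body for POST /_aliases that moves $alias
 * from the $current index to the $target index in one request.
 *
 * @return array<string, mixed>
 */
function aliasSwapBody(string $alias, string $current, string $target): array {
  return [
    'actions' => [
      ['remove' => ['index' => $current, 'alias' => $alias]],
      ['add' => ['index' => $target, 'alias' => $alias]],
    ],
  ];
}

$live = 'content_blue';
$next = inactiveIndex($live, 'content_blue', 'content_green');
// 1. Apply the new mapping to $next and index all content into it.
// 2. Then switch reads over by sending this body to POST /_aliases.
echo json_encode(aliasSwapBody('content', $live, $next)), PHP_EOL;
```

On the next configuration change the roles flip, and `content_blue` becomes the index that is rebuilt.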
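achap's suggestion has three parts: re-index when only new fields appear, clear when an existing field would change, and run the clear before the mapping updates. It can be sketched as a small decision function. This is a hedged illustration, not the logic merged in MR !52. `planMappingUpdate()` is hypothetical, fields are reduced to `name => type` strings, and settings such as analysers are ignored. The returned labels simply borrow the `updateSettings`/`updateFieldMapping` names from the comment to show ordering; they are not calls into the module.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decides which steps a mapping change needs.
 *
 * @param array<string, string> $existing Field name => type on the server.
 * @param array<string, string> $desired  Field name => type from Drupal.
 *
 * @return list<string> Ordered step labels (illustrative, not real methods).
 */
function planMappingUpdate(array $existing, array $desired): array {
  foreach ($existing as $field => $type) {
    // A removed field or a changed type cannot be applied to the existing
    // index, so the clear must happen before the new mapping is sent.
    if (!array_key_exists($field, $desired) || $desired[$field] !== $type) {
      return ['clear', 'updateSettings', 'updateFieldMapping'];
    }
  }
  // Only additions: keep the documents and re-index over them.
  if (array_diff_key($desired, $existing) !== []) {
    return ['updateSettings', 'updateFieldMapping', 'reindex'];
  }
  // Mapping unchanged: nothing to do for the mapping.
  return [];
}

print_r(planMappingUpdate(['title' => 'text'], ['title' => 'text', 'created' => 'date']));
```

The mixed case the comment worries about (one field added, another changed) falls into the clear branch, because any conflicting existing field wins.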
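achap described MR !52 as comparing the mappings stored on the OpenSearch server with the local ones. Two hypothetical helpers can illustrate that comparison together with the dynamic-mapping pitfall. They are not the module's code. They assume the standard shape of a `GET /<index>/_mapping` response decoded with `json_decode(..., true)`. `dynamicallyMappedFields()` lists fields the server knows about but that have no explicit local type. A naive comparison treats these as changed on every save.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: extracts field => type from a decoded
 * GET /<index>/_mapping response.
 *
 * Returns [] when the index has no mappings yet, so callers do not have to
 * special-case a freshly created index.
 *
 * @return array<string, string>
 */
function serverFieldTypes(array $response, string $index): array {
  $properties = $response[$index]['mappings']['properties'] ?? [];
  $types = [];
  foreach ($properties as $field => $definition) {
    // Object fields carry nested "properties" instead of a "type"; skip them.
    if (isset($definition['type'])) {
      $types[$field] = $definition['type'];
    }
  }
  return $types;
}

/**
 * Hypothetical helper: lists fields that exist on the server but have no
 * explicit local type, i.e. fields OpenSearch mapped dynamically.
 *
 * @return list<string>
 */
function dynamicallyMappedFields(array $serverTypes, array $localTypes): array {
  return array_keys(array_diff_key($serverTypes, $localTypes));
}

$json = '{"content":{"mappings":{"properties":{'
  . '"title":{"type":"text"},"score":{"type":"float"}}}}}';
$serverTypes = serverFieldTypes(json_decode($json, true), 'content');
// "score" stands in for a processor-added field with no explicit type.
print_r(dynamicallyMappedFields($serverTypes, ['title' => 'text']));
```

Here `score` is reported because OpenSearch chose `float` on its own. Giving it an explicit local type, for example through the IndexParamsEvent as achap suggests, would remove it from this list.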
