Reindexing problems after removing config mapping

Created on 6 September 2023, over 1 year ago
Updated 8 September 2023, over 1 year ago

Problem/Motivation

As reported by @tgoeg after introducing 📌 Remove entity id mapping config Fixed we now run into an issue that existing indexes do not reindex properly. This is due to the fact that we changed the primary key from "id" to "search_api_document_id" but Meilisearch does not allow changing primary key of the index/document if the documents already exist in the index.

The users would have to delete the index entirely then reindex all items but deleting the index might delete associated configurations such as views that are based on that index which is not acceptable.

Steps to reproduce

  • Checkout revision 1b072d482e7e9c7bf4114166b09447a020438e74
  • Revert module schema to 9001 drush ev "\Drupal::keyValue('system.schema')->set('search_api_meilisearch', (int) 9001)";
  • Create a new index with Meilisearch backend
  • Index few items
  • Checkout latest 1.x-dev branch
  • Run drush updb
  • The index is scheduled to reindex
  • Run reindex and an error message appears
  • No documents are updated in Meilisearch
  • Recent log messages show error such as Document doesn't have a `id` attribute: `@"search_api_id":"entity:node/100:en","search_api_datasource":"entity:node","search_api_language":"en","search_api_document_id":"entity_node_100_en"`.

Proposed resolution

I think we will have to revert to using "id" field as primary key. Anything else would require deleting every document on the index, change the primary key and reindex all items which is not something I'd be willing to mess with - specially indexes that run in production environments.

This means we might have to go the hard way to prevent users from adding "id" - as was the initial plan in 🐛 Remove the possibility to add field with machine name id Fixed - to the index fields or make it so that our internal "id" field always takes priority and we should notify the user with a warning that the "id" field will be overwritten with the internal id.

Remaining tasks

🐛 Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

🇸🇮Slovenia bcizej

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @bcizej
  • @bcizej opened merge request.
  • Status changed to Needs work over 1 year ago
  • 🇸🇮Slovenia bcizej

    Updated the UpdateHook9002Test.php test that reproduces the issue.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    82 pass, 15 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    108 pass, 1 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    108 pass, 1 fail
  • 🇦🇹Austria tgoeg

    I have a feeling we need to drop and recreate the index when upgrading from 1.0.0 to dev-1.0.x anyway. Not 100% sure about it, but that's what I did whenever I got strange results (as e.g. in 🐛 No results from file_extractor extracted data in Drupal 10.1 Closed: works as designed and some other scenarios not documented in the issue queue) and it mostly magically solved it. I could be wrong and the fixed issue 📌 Remove entity id mapping config Fixed (and my multi-version setup) was also to blame and things work out with the current codebase.

    It might even be easier for end users to automatically have them being recreated, as otherwise they would need to migrate the data over when switching Meilisearch versions (<1.0 to >1.0, if that even works?) but I have to look into this sometime. This module basically does everything now that's needed in Meilisearch anyway (no need to manually set off HTTP requests for sorts and filterable attributes, etc. anymore, yay!)
    The only thing remaining (and I think that's only me at the moment) is the dropped privilege API key setup I'd want to move over to a new Meilisearch server. Other than that, unconditionally recreating everything might be the safest approach that should guarantee a smooth upgrade.

    I'd consider 1.0.0 to only be something for the daring, so anyone really using it in production (is this anyone else but me?) should be OK to have a more elaborate update to what I'd probably call 2.0.0, if this eases future development and user friendliness. The above restrictions sound like too much of a hassle only for making the upgrade process easier for the few people that might use it in production already.
    Deleting the index and re-importing the config with drush is a matter of 2 minutes. I had upgrade paths that were a lot harder than this!

    Don't want to prevent you from making this bullet-proof, as this is certainly also legit from a quality PoV, but don't just do this because you know I (or a few others, but I can only speak for myself) have a production setup.

    (I am definitely grateful for the quality-oriented approach of all of you, not only in this regard. I don't want to water that down and I'd like to say a big thank you for this right here!)

  • 🇸🇮Slovenia bcizej

    I have a feeling we need to drop and recreate the index when upgrading from 1.0.0 to dev-1.0.x anyway. Not 100% sure about it, but that's what I did whenever I got strange results (as e.g. in #3385410: No results from file_extractor extracted data in Drupal 10.1 and some other scenarios not documented in the issue queue) and it mostly magically solved it. I could be wrong and the fixed issue #3384347: Remove entity id mapping config (and my multi-version setup) was also to blame and things work out with the current codebase.

    Yes the fixed issue was the culprit as we changed the primary key there, but Meilisearch does not allow that when documents already exist. Hence this issue was opened.

    It might even be easier for end users to automatically have them being recreated, as otherwise they would need to migrate the data over when switching Meilisearch versions (<1.0 to >1.0, if that even works?) but I have to look into this sometime.

    The above restrictions sound like too much of a hassle only for making the upgrade process easier for the few people that might use it in production already.
    Deleting the index and re-importing the config with drush is a matter of 2 minutes. I had upgrade paths that were a lot harder than this!

    This would be fine if Meilisearch indexes contain only data from Drupal and is most likely the case for 99% of the users, but if some custom documents were added from other sources (not in Drupal) that data would be lost and that is why I'm hesitant to go this route. As for migrating the data when upgrading Meilisearch - this is completely on Meilisearch side, there are upgrade guides on their official page but our module should work just fine. When I was testing 🐛 Invalid characters in filter values returns an error Fixed I had to switch versions and did data upgrades, this module still worked just fine with it.

    I mean for me as a developer the easiest approach would be to do nothing and just say "Hey users just delete everything and recreate" but that sets a bad precedence (imo) for the quality of the code, product etc.

    I'd consider 1.0.0 to only be something for the daring, so anyone really using it in production (is this anyone else but me?) should be OK to have a more elaborate update to what I'd probably call 2.0.0, if this eases future development and user friendliness.

    My thinking here is that once we fix all the critical issues (without upgrading meilisearch php and support for never versions) I would make a 1.1.0 release supporting the Meilisearch v0.28.1 then create a new 2.x branch where we start upgrading meilisearch version support and make a 2.0.0 release for the Meilisearch v1.3 version.

    I do have a few issues to add (from the top of my head I know at least one major issue exists with synonyms processor) that are blocking a new stable release, I will get to them when I find the time.

  • 🇦🇹Austria tgoeg

    Alright, understand.
    Adding in data from other sources might indeed come into play later here, as well. Maybe someone indeed did that already. Though that would definitely be something I'd only start to tackle after 1.1.0 or more likely 2.x for me.

    Note however that 🐛 No results from file_extractor extracted data in Drupal 10.1 Closed: works as designed occurred already weeks ago, before the new primary key/"id" column issue (only hit this on the forward escape route as I thought an update might help). I will try to look into this soonish, probably when this issue here is fixed. (i.e., try an update of a cloned production instance from my patched-but-mostly-1.0.0 version to the latest dev-1.0.x)

    I think dropping and recreating the index might still be necessary because of other aspects/bugs I cannot currently describe/reproduce any better. I'll let you know!

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    109 pass
  • Status changed to Needs review over 1 year ago
  • 🇸🇮Slovenia deaom

    Changed the primary key back to id and added the validation also for index save. When you for example have no server selected or some other server that is not meili and the field is changed to id, then you decide to change the server to meili ti should throw an error, and not allow to save it. The validation for index field was added from the 🐛 Remove the possibility to add field with machine name id Fixed . Ready for testing.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    109 pass
  • 🇸🇮Slovenia bcizej

    Added presave hook to throw an exception if id field is defined on an index and the index uses Meilisearch. Also reworded the error messages to be more user friendly.

    Tests revealed more issues connected that I had to fix:

    1. Special fields were not added to filterable and sortable attributes
    2. Setting backend configuration programatically would not update url or master key in the meilisearch api service
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    109 pass
  • 🇸🇮Slovenia deaom

    The one issue I also encountered is, if you already have a meili server set up with an index that already has a field with the "id" name and you want to reindex it, it will reindex it, which is not correct. So added an event subscriber that checks if there is a field with the id set up, to prevent the indexing of that index.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    109 pass
    • bcizej committed eafdd5ca on 1.x
      Issue #3385708 by bcizej, DeaOm, tgoeg: Reindexing problems after...
  • Status changed to Fixed over 1 year ago
  • 🇸🇮Slovenia bcizej

    I think we got everything covered. Thanks everyone.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024