Do not delete items when loading fails

Created on 8 May 2024, about 2 months ago
Updated 19 May 2024, about 1 month ago

Problem/Motivation

We're using search api together with WissKI and a triple store as a data backend. The data backend might fail to answer requests from time to time e.g. due to restarting, overload etc. This however is a normal procedure and afterwards it will be fully available again. If indexing takes place in such a timeslot typically the entities that failed to load get removed from the tracking queue which leads us to have to retrack all the time. With many entities with a lot of data this takes a lot of time and is not very precise, so we end up having missing entities all the time for the index.

Steps to reproduce

Install WissKI, have a big database and do indexing :)

Proposed resolution

Make the deletion of entities in case of failed indexing optional by a setting. We don't want to change the default behaviour, just make it switchable for us.

Remaining tasks

Nothing - Patch attached :)

✨ Feature request
Status

Needs review

Version

1.0

Component

General code

Created by

🇩🇪Germany Knurg

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @Knurg
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.4 + Environment: PHP 8.1 & MySQL 5.7
    last update about 2 months ago
    548 pass
  • 🇦🇹Austria drunken monkey Vienna, Austria

    drunken monkey → made their first commit to this issue’s fork.

  • Status changed to Needs work about 2 months ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for reporting this issue and already providing a very nice patch!
    Since we now switched to using MRs instead of patches, I created an MR with your patch. (I now see that we still had the old advice to not use MRs in the new issue notice, so my fault.)

    Some remarks on the patch itself:

    1. First off, I prefer to avoid negations in setting names wherever possible, so I changed the setting name to delete_on_fail, defaulting to TRUE.
    2. Second, your description doesn’t square completely with what your patch does. Are you sure it works correctly on your site? From your description, it would seem like the problem is not failed indexing, but failed loading of items. In this case, yes, Search API thinks the item has been deleted without us noticing it, so also deletes it from the index on the search server. I changed the patch so that the setting now controls that behavior, not the other one during indexing. (As far as I can see, the code you changed only affects items deliberately removed from indexing, e.g., by the “Entity status” processor.)
      Please tell me if that was the solution you were looking for, or whether indeed your old patch did what you wanted it to. In the latter case, I think I’d need a more detailed description of the problem and what exactly happens.
    3. Architecturally, the fact that a dysfunctional data store cannot be distinguished from missing entities is unfortunate. I’d really like the datasource to throw an exception in this case, instead of just returning an empty array. Unfortunately, this is currently not even allowed by the contract of \Drupal\search_api\Datasource\DatasourceInterface::loadMultiple(), so really just my own fault. Then I guess this new setting is probably really the best solution (even though it might mean that some actually deleted items stay in the index on your site).
    4. This is a small enough change to not need much approval, I think, so I’ll just merge this after hearing back from you and waiting for a week or so for any additional feedback. However, I’d guess this will not be a massively popular option. Therefore, and because it’s really only something for advanced users, I think I’m removing it from the index config form. You, and any others in need of it, will have to manually enable it via a config export/import. I hope that is acceptable, but I really don’t want to push yet more options on already overwhelmed new users.
    5. Finally, this will need automated test coverage before it can be merged. Would you be able to provide that? There is already a test for the affected functionality in \Drupal\Tests\search_api\Kernel\Index\IndexLoadItemsTest::testMissingItems(), we’d just need to add a copy of that method that verifies that the deletions do not happen when the new setting is disabled. (Doing it via a @dataProvider would also work, not sure what makes more sense.)
  • 🇩🇪Germany Knurg

    Thanks for working on that. Unfortunately you were right, I forgot one more spot where the tracking deletion was called.

    However I seem not to be able to push to the merge request and I don't know how to push on it otherwise :/

    I thought the negation would be more convenient because it does not touch the existing behaviour and I don't have to change any defaults in the database etc. I would really be more happy with an interface for the users to simply check the option. Our typical target group is art historians who hardly are able to install Drupal at all and can't really handle the installation itself - so having no option here does simply give our project team even more load in infrastructural parts, while it could be avoided :/

    I've attached a patch for IndexLoadItemsTest.php.

  • Status changed to Needs review about 1 month ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for the testing code!
    It’s of course ridiculous that you cannot work on that MR now. Not sure how this is supposed to work … In any case, I couldn’t find an option to grant you access, so I just applied your IndexLoadItemsTest.patch for you, and then committed my style changes separately.

    Regarding your other change, to Index::trackItemsDeleted(), however, I don’t think this makes sense. This would not just keep items that cannot be loaded from being removed from the tracker, but also those where, e.g., an entity hook informed us that the entity has been deleted. This is definitely not a place where this option should have an effect – in my opinion, it’s correct and enough where it is. Does it work as intended now for you, even without your change?

    I would really be more happy with an interface for the users to simply check the option. Our typical target group is art historians who hardly are able to install Drupal at all and can't really handle the installation itself - so having no option here does simply give our project team even more load in infrastructural parts, while it could be avoided :/

    I don’t know about your business model, but if you are supporting multiple sites that are primarily maintained by others, I assume you have a distribution or at least some custom helper module(s)? Then for the moment, maybe just add the UI for the option there (a simple form alter should suffice), or just permanently activate this option for all search indexes (e.g., via hook_ENTITY_TYPE_presave()) if it makes sense.
    For the general “population” of Search API users I’m pretty certain that this is a niche feature, so really don’t want to overload the UI with that.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    PS: Oh, regarding push access to the issue fork: Did you try the “Get push access” button at the bottom of the issue summary?

Production build 0.69.0 2024