Reindexing: Solr service instance crashing traced to deleteByQuery

Created on 23 June 2023, about 1 year ago
Updated 26 June 2023, about 1 year ago

Setup

  • Solr version: 7
  • Drupal Core version: 9.5.8
  • Search API version: 8.x-1.27
  • Search API Solr version: 4.2.8
  • Configured Solr Connector: Acquia Search 3.0.7

Hi team - we've got a customer using the above configuration, with about 40,000 items in the index, who is running into problems when deleting all items from an index and then re-indexing. Long story short, we've found that re-indexing in batches goes fine until a problem is encountered. At that point the indexer keeps re-indexing in batches and also re-indexes single items, which results in a flood of requests to Solr. With a large enough number of items, that traffic can be enough to make the hosted Solr service fail.

Our team has traced the issue down to this post, which suggests avoiding deleteByQuery(): https://www.zisistach.org/posts/solr-performance/#avoid-using-deletebyquery

Since their production sites are working fairly well, and they usually run into this issue during development work where they have to delete everything indexed in Solr and re-index it all, it suggests to me that there may be a couple of different paths forward:

  • optimize or find alternatives to the current implementation of deleteByQuery()
  • add a tool - and forgive me if it's already there - to the admin interface that makes deleting and re-indexing easier on the Solr servers

If this is already solved, and there's a better set of tools to recommend to the customer for re-indexing, happy to follow up on that. I looked for similar tickets, but didn't find the answer to this particular question.

Thank you for any help!

💬 Support request
Status

Active

Version

4.2

Component

Code

Created by

🇺🇸 United States erin.rasmussen

Comments & Activities

  • Issue created by @erin.rasmussen
  • 🇩🇪 Germany mkalkbrenner

    when a problem is encountered, they continue to reindex in batches and re-index single items

    That's the intended behaviour. If there's an erroneous item somewhere in the batch, indexing that batch will never succeed, and you won't find out why. So the indexer reduces the number of items to 1 and steps forward until the erroneous item is reached.
    Now you can get useful error messages and fix the erroneous item or your custom code.
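
To illustrate the fallback described here, a minimal Python sketch follows. It is not the module's actual code: `index_with_fallback` and the `index_batch` callable are hypothetical names, and continuing past a failed item (rather than stopping at it) is an assumption made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def index_with_fallback(
    items: Sequence[str],
    index_batch: Callable[[Sequence[str]], None],
    batch_size: int = 50,
) -> Tuple[int, List[Tuple[str, Exception]]]:
    """Index items in batches; when a batch fails, retry its items one by one.

    `index_batch` is a hypothetical callable that sends one request to the
    search server and raises an exception if any item in it is rejected.
    Returns the number of indexed items and a list of (item, error) pairs.
    """
    indexed = 0
    failures: List[Tuple[str, Exception]] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        try:
            index_batch(batch)
            indexed += len(batch)
            continue
        except Exception:
            # The batch error does not say which item is at fault,
            # so fall back to a batch size of 1 for this batch.
            pass
        for item in batch:
            try:
                index_batch([item])
                indexed += 1
            except Exception as exc:
                # The failing item is now isolated together with its error.
                failures.append((item, exc))
    return indexed, failures
```

Note that one bad item in a batch of N costs 1 + N requests instead of 1 in this sketch, which is consistent with the burst of requests described in the issue summary.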

    Our team has traced the issue down to this post, which suggests avoiding deleteByQuery(): https://www.zisistach.org/posts/solr-performance/#avoid-using-deletebyquery

    I'm aware of this "issue". We use DeleteById(), but we can't do so for nested documents.
    Unfortunately, Search API itself doesn't know anything about nested documents.
    But the DeleteByQuery() part changed recently, so you should upgrade. And I would be happy if your team would contribute a patch for the "search for IDs and DeleteById()" approach.
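
For anyone picking this up, the "search for IDs and DeleteById()" idea could look roughly like the Python sketch below. It is only an outline under stated assumptions: `search` and `post_update` are hypothetical callables standing in for a Solr select request and a JSON update request, the unique key field is assumed to be `id`, and nested (child) documents are not handled. It relies on Solr's cursorMark paging and on the JSON update form `{"delete": [ids]}`.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_ids(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    query: str,
    rows: int = 500,
) -> Iterator[str]:
    """Yield the IDs of all documents matching `query` using cursor paging."""
    cursor = "*"
    while True:
        response = search({
            "q": query,
            "fl": "id",          # assumption: the unique key field is "id"
            "sort": "id asc",    # cursorMark requires a sort on the unique key
            "rows": rows,
            "cursorMark": cursor,
        })
        for doc in response["response"]["docs"]:
            yield doc["id"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            return  # an unchanged cursor means there are no more results
        cursor = next_cursor


def delete_by_ids(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    post_update: Callable[[Dict[str, List[str]]], None],
    query: str,
    chunk_size: int = 500,
) -> int:
    """Collect matching IDs first, then delete them by ID in chunks."""
    ids = list(iter_ids(search, query, rows=chunk_size))
    for start in range(0, len(ids), chunk_size):
        # Each payload asks Solr to delete an explicit list of IDs.
        post_update({"delete": ids[start:start + chunk_size]})
    return len(ids)
```

The IDs are collected before any delete is sent so that paging never runs against an index that is being modified; for an index of about 40,000 items the ID list is small enough to hold in memory.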

    In general I would avoid such deletes altogether, but search_api_solr has to implement search_api's backend interface.
    I would prefer something like https://www.drupal.org/project/search_api_solr/issues/3150654 (Swap cores when reindexing), but this isn't supported by most Solr hosting providers.

    Atomic updates work very well with Solarium, but again, Search API doesn't know that concept.