The "Clear all indexed data" action with "site_hash: false" doesn't delete all stored data from the index

Created on 9 June 2023, almost 2 years ago

Setup

  • Solr version: 8.11.2
  • Drupal Core version: 9.5.7
  • Search API version: 8.x-1.28
  • Search API Solr version: 4.2.10
  • Configured Solr Connector: Standard

Issue

The Search API Solr server configuration has a "site_hash" flag ("Retrieve results for this site only"). With this flag disabled, searches can return data from other sites that were added to the same Solr index.

As a result, we can get errors like this:

The datasource with ID "3szabr--nodes:node" could not be retrieved for index "Nodes".

That behavior is expected and understandable.

Such data can appear in the Solr core if data was imported and reindexed from different sites that reuse the same configuration but have different state storage (the "key_value" table).

But the problem is that we don't have any way to clean up this data from the Solr core!

The "Clear all indexed data" action with "site_hash: false" doesn't delete all stored data from the index; it deletes only the data with the current site hash.

As a result, the error persists after clearing all indexed data, and users have to clear the data in Solr manually, via direct queries like this:
https://stackoverflow.com/questions/23228727/deleting-solr-documents-fro...
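
For illustration, here is a minimal sketch of such a direct query, using Solr's standard JSON update handler with a delete-by-query command. The base URL, the core name "drupal", and the helper names are assumptions made for this example and are not taken from the module. Deleting with "*:*" removes every document in the core, including documents from other sites, so treat this as a sketch for a disposable or backed-up core rather than a recommended procedure.

```python
import json
import urllib.request


def build_delete_request(solr_base_url, core, query="*:*"):
    """Build (but do not send) a Solr delete-by-query request.

    solr_base_url and core are placeholders; adjust them to your setup.
    The default query "*:*" matches every document in the core.
    """
    url = f"{solr_base_url.rstrip('/')}/{core}/update?commit=true"
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_documents(solr_base_url, core, query="*:*"):
    """Send the delete request and return Solr's parsed JSON response."""
    request = build_delete_request(solr_base_url, core, query)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name.
    request = build_delete_request("http://localhost:8983/solr", "drupal")
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL and payload before anything is deleted. A narrower query, for example one that filters on whatever field stores the index ID in your schema, should limit the deletion to a single index; check the field names in your own Solr documents first, since they are not confirmed here.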

To resolve this issue, it would be good to implement a separate action: "Clear all indexed data, including data from all sites".

🐛 Bug report
Status

Active

Version

4.2

Component

Code

Created by

🇦🇲Armenia murz Yerevan, Armenia


Comments & Activities

  • Issue created by @murz
  • 🇩🇪Germany mkalkbrenner 🇩🇪

    The action already exists. But because it is a dangerous action, it is part of the search_api_solr_admin module!
    It also belongs to the server, not the index: "Delete all indexed data on this server".

  • Status changed to Fixed almost 2 years ago
  • 🇦🇲Armenia murz Yerevan, Armenia

    Thank you for pointing this out. It's good that we already have it:

    drush search-api-solr:devel-delete-all [server_id]
    

    But it's part of the search_api_solr_devel module, not search_api_solr_admin, and "search_api_solr_devel" is too technical to keep enabled just for the ability to fully clean the index. It even has a dependency on the "drupal/devel" module!

    So, could you please move this feature to the search_api_solr_admin module?

    Also, it can still only delete all indexed data from the whole server, without the ability to "fully" clean a specific index, so it would be good to extend it.

  • 🇩🇪Germany mkalkbrenner 🇩🇪

    Sorry, you're right, it is part of search_api_solr_devel.

    Nevertheless, I considered the feature too dangerous to turn into a regular feature. You can easily corrupt your multisite setup or a foreign index (Solr Document datasource). That's why I put it into the devel context.

    You can convert this issue into a feature request and provide the corresponding patch, but we would need to add explicit documentation and additional checks about the nature of the Search API server (read-only, multisite, etc.).

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed about 19 hours ago
  • 🇩🇰Denmark ressa Copenhagen

    Thanks @mkalkbrenner for maintaining Solr in Drupal. @murz: Your suggestion about adding support for indexes as well is great. I added a documentation page a while ago, was reminded that the command works on the Solr server, not on an index, and added a note about this.

    https://www.drupal.org/docs/8/modules/search-api-solr/search-api-solr-ho...
