Add setting for reindexing every revision on entity update

Created on 8 February 2024, 10 months ago
Updated 12 March 2024, 9 months ago

Problem/Motivation

Currently when an entity is updated, our search_api_revisions_entity_presave hook gathers a list of ALL entity revisions and adds them to a property which is then read in search_api_revisions_entity_update and marks all previous revisions to be updated.

I'm not sure why exactly this is done, but it doesn't seem right. Old revisions shouldn't be re-indexed based on new revisions. This could easily lead to performance issues for entities that contain many edits.

I propose we remove the pre-save hook entirely (and the associated queue since it's only used there).

Once the queue is gone we can also remove the config and form as well!

Steps to reproduce

Proposed resolution

Either:
Remove search_api_revisions_entity_presave and the associated queue
Put the functionality behind a setting, although I still can't think of a good use case for it.

Remaining tasks

Remove config
Remove config form

User interface changes

API changes

Data model changes

✨ Feature request
Status

Fixed

Version

2.0

Component

Code

Created by

πŸ‡¦πŸ‡ΊAustralia acbramley

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @acbramley
  • πŸ‡¦πŸ‡ΊAustralia acbramley
  • πŸ‡¦πŸ‡ΊAustralia acbramley

    Turns out that the fields reindexing was due to a separate unrelated bug with the entity type I was testing.

  • πŸ‡¦πŸ‡ΊAustralia acbramley
  • πŸ‡¨πŸ‡ΏCzech Republic mkolar

    Hi. We have to test this. Im not 100% sure now but I think it’s due flags like is default revision and is last revision. Especially in case you have published revision which is not last (published article with new unpublished drafts for example). We have few millions of revisions with no performance problem (module needs to reindex in the queue).

  • Status changed to Needs review 10 months ago
  • πŸ‡¨πŸ‡ΏCzech Republic mkolar
  • Status changed to Needs work 10 months ago
  • πŸ‡¦πŸ‡ΊAustralia acbramley

    @mkolar ok that does make sense.

    I don't think we need to do this logic in a presave though? We can do it directly in the entity_update hook and do away with the magic search_api_revisions_update_ids property. In entity_update we can just select all revisions that aren't the current revision. We can also add a setting for this logic alongside the queue setting.

  • πŸ‡¨πŸ‡ΏCzech Republic mkolar

    TBH not rly sure why we used search_api_revisions_update_ids and not doing it in entity_udpate hook (since we made that hook too) but i bet there was some good reason. What might be one of the reasons is that like this, its alterable by another module (but we are not doing this). But since its not improving anything why to change this, whats the reason.

  • πŸ‡¨πŸ‡ΏCzech Republic mkolar

    Why you would like to add setting not to index all revisions since module itself doing those flags which needs to be updated? Why you need this module since you only care about default revision then? We can of course add this settings thats easy but not sure how to deactivate those flags if this is turned off. I dont have problem with the settings.

  • πŸ‡¦πŸ‡ΊAustralia acbramley

    But since its not improving anything why to change this, whats the reason.

    Because using magic properties is not best practice. There's no reason to keep this logic in a presave hook when we can combine it with the update hook and make it much cleaner.

    Why you would like to add setting not to index all revisions since module itself doing those flags which needs to be updated

    In our use case, we are not indexing any of these flags. We simply need to index basic information across many different entity types. This information does not change once a revision is saved, therefore reindexing every revision is redundant.

    e can of course add this settings thats easy but not sure how to deactivate those flags if this is turned off, i dont want setting that causes wrong flags

    It would just be another setting like the queue setting. We can have it enabled by default to retain the current behaviour, but allow users (like me) to disable it :)

  • πŸ‡¨πŸ‡ΏCzech Republic mkolar

    If you want, you can proceed with the settings thing + moving it from magic property to entity save completely. It looks better to me now to move it, its already pretty old so cant remember why its in presave. When you have a patch, we will try the patch on our project if its ok. It might take a while. Thank you!

  • Status changed to Needs review 10 months ago
  • πŸ‡¦πŸ‡ΊAustralia acbramley

    I've pushed a fair chunk of work, what I've done so far:

    1. Moved all logic from .module files into a new PHP class for easier reuse + dependency injection
    2. Heavily refactored the above functions to be more reusable
    3. Moved presave hook logic into entity_update using an entity query
    4. Added new index_previous_revisions setting which is used in hook_entity_update

    You can see that the full test suite still passes that I committed in πŸ“Œ Write tests Fixed - and we can remove the line that emptied search_api_revisions_update_ids.

    All that's left is additional test coverage for when the new setting is off.

  • Pipeline finished with Skipped
    9 months ago
    #104582
  • Status changed to Fixed 9 months ago
  • πŸ‡¦πŸ‡ΊAustralia acbramley

    2 weeks with no feedback, I'm going to merge this and cut a new alpha. Feel free to open a new issue for feedback.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024