Optimize for sites with a large number of entities

Created on 14 November 2024, 5 months ago

Problem/Motivation

I opened another ticket asking for support for our custom entities, but I don't think this module will work for us because of the sheer number of entities we have. We have roughly 150k entities of the type whose revisions need pruning, so looping over every one of them during cron to look for revisions is infeasible. We would be all but guaranteed to hit a timeout.

Steps to reproduce

Create a large volume of entities with revisions and try to clean them up.

Proposed resolution

A more efficient way to handle the cleanup would be to queue the entities that need to be cleaned, instead of queueing the revisions to be deleted.

At the beginning you could do a one-time cleanup of all entities by adding them all to the queue. After that, instead of adding revisions to the queue during cron, you could use something like hook_entity_update() (or another hook if there's something for revisions) to see when a revision is created, and then add the entity into the queue for its revisions to be processed.
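
A minimal sketch of that flow, written in Python purely for illustration rather than as Drupal code: a save hook enqueues the entity ID (deduplicated, so a frequently saved entity is queued only once), and cron drains the queue within a time budget. `EntityPruneQueue`, `on_entity_saved`, and the `prune` callback are hypothetical names, not part of this module or of Drupal's Queue API.

```python
import time
from collections import deque


class EntityPruneQueue:
    """Queue of entity IDs whose revisions still need pruning (hypothetical)."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()  # IDs currently queued, used for deduplication

    def enqueue(self, entity_id):
        """Queue an entity once; return False if it was already waiting."""
        if entity_id in self._pending:
            return False
        self._pending.add(entity_id)
        self._queue.append(entity_id)
        return True

    def process(self, prune, time_budget_seconds, clock=time.monotonic):
        """Prune queued entities until the queue is empty or time runs out.

        Returns the number of entities handled in this run. Anything left
        over stays queued for the next run.
        """
        deadline = clock() + time_budget_seconds
        processed = 0
        while self._queue and clock() < deadline:
            entity_id = self._queue.popleft()
            self._pending.discard(entity_id)
            prune(entity_id)  # assumed callback that deletes old revisions
            processed += 1
        return processed

    def __len__(self):
        return len(self._queue)


def on_entity_saved(queue, entity_id):
    """Stand-in for a save hook: only the saved entity is queued."""
    queue.enqueue(entity_id)
```

Because each run stops at its deadline and leaves the rest queued, a large backlog from the one-time cleanup would be worked off over many cron runs instead of in a single one.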

Remaining tasks

Update the queue architecture.

User interface changes

None

API changes

None

Data model changes

Change the queue to process entities instead of arrays of revisions.

✨ Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇺🇸 United States mrweiner

Comments & Activities

  • Issue created by @mrweiner
  • 🇺🇸 United States mrweiner

    I'm trying to decide whether we need to roll our own solution to this, or if we can contribute a patch here. Will post an update if I have one.

  • 🇺🇸 United States mrweiner

    It turns out that revisions were on for our entity, but we never actually used them. As such we have no revisions to actually clean up and don't need this module. That said, I still think this would be a good approach to optimizing the performance if anybody is interested in tackling that.

  • 🇨🇭 Switzerland berdir

    Just a drive-by note while reviewing similar modules.

    node_revision_delete 2.x works roughly like that: it puts entities in the queue, not revisions, and multiple plugins then decide which revisions to delete. It also only queues entities as they are saved, not on cron, so it scales quite nicely. One downside is that age-based logic only runs when an entity is saved again. For example, if you keep 50 revisions for at least 2 months, and an entity gets 100 revisions within a short time and is then never updated again, it is never cleaned up. The exception is a full queue processing through the UI, but that currently has the same limitation of loading all IDs at once, which should not be necessary.
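
To make the age-based limitation concrete, here is a small Python sketch of one plausible reading of a "keep 50 revisions for at least 2 months" rule: a revision is deleted only if it falls outside the newest `keep` revisions and is also at least `min_age` old. This is an assumption for illustration, not node_revision_delete's actual plugin logic, and `revisions_to_delete` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def revisions_to_delete(revision_times, keep, min_age, now):
    """Return timestamps of revisions that may be deleted.

    Assumed rule: always keep the newest `keep` revisions, and delete an
    older one only once it is at least `min_age` old at time `now`.
    """
    newest_first = sorted(revision_times, reverse=True)
    candidates = newest_first[keep:]  # everything beyond the newest `keep`
    return [t for t in candidates if now - t >= min_age]


# 100 revisions created one hour apart, then no further saves.
base = datetime(2024, 1, 1)
revision_times = [base + timedelta(hours=i) for i in range(100)]
min_age = timedelta(days=60)

at_last_save = revisions_to_delete(
    revision_times, 50, min_age, now=revision_times[-1]
)
ten_weeks_later = revisions_to_delete(
    revision_times, 50, min_age, now=base + timedelta(days=70)
)
print(len(at_last_save), len(ten_weeks_later))  # prints "0 50"
```

When the rule is evaluated only at save time, nothing is old enough to delete on the last save, and the 50 surplus revisions become eligible only later, when nothing triggers another check.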
