- Issue created by @djdevin
The orphan purger processes one entity per queue item. During a mass cleanup of parent entities this makes the cleanup process extremely slow.
I encountered a situation where the orphan purge queue had grown upwards of 10M items after deleting millions of entities. That is expected, but processing the queue normally would take about 115 days, because each item takes about 1 second (load the entity, evaluate it, delete it, count the items in the queue).
Even running 8 parallel `drush qp` jobs, it still looks like it will take months to complete (understandably, since the queue also keeps growing as other paragraphs are orphaned during deletion).
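The back-of-the-envelope numbers above check out. A quick sketch (the 1 s/item cost is the rough figure from this issue, and the 8-worker estimate assumes ideal scaling, which the growing queue prevents in practice):

```python
QUEUE_SIZE = 10_000_000    # items in the orphan purge queue
SECONDS_PER_ITEM = 1       # rough per-item cost reported above

single_worker_days = QUEUE_SIZE * SECONDS_PER_ITEM / 86_400
# Ideal scaling across 8 parallel `drush qp` jobs; in reality the queue
# keeps growing during deletion, so the real figure is months.
eight_worker_days = single_worker_days / 8

print(f"{single_worker_days:.0f} days")  # ~116 days with one worker
print(f"{eight_worker_days:.0f} days")   # ~14 days lower bound with 8 workers
```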
The evaluation is done per item in processItem(), which slows everything down. It also deletes individual revisions rather than the entire entity, which in this case isn't needed, since the parent (of all the revisions) is gone.
1. Create 10M entities with paragraphs that have paragraphs on them, then delete the parent entities.
2. Run the queue.
3. See how long it takes :)
I've been trying to figure out the best way to do this, and this is the solution I currently have that seems safe. On my first try I just took the queue, rechunked it, and deleted all the paragraphs, and all my content went poof, since I wasn't checking whether each paragraph was still in use.
Now I use logic similar to isUsed(), but I run it before the queue is processed, so anything in the queue is certainly ready for deletion. However, isUsed() was also extremely slow to run on all paragraphs, so I used a query instead:
1. Query for all unique parent_type values on paragraphs_item_field_data
2. Run separate queries, each filtered by parent_type, left joining the base table to parent_type and parent_id
3. Filter on the base table's ID being NULL (ensure paragraph's parent definitely does not exist)
4. Chunk all the paragraph IDs by 100 and put in a new queue
5. Run queue, delete paragraphs in chunks
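Steps 1-4 above can be sketched with a toy SQLite database. This is only an illustration of the LEFT JOIN orphan check, not the module's code: the table names mimic the real `paragraphs_item_field_data` table, but the parent schema (a `node` table keyed by `nid`) and the type-to-base-table mapping are simplified assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified stand-ins for the real Drupal tables (hypothetical schema).
cur.execute("CREATE TABLE paragraphs_item_field_data "
            "(id INTEGER, parent_type TEXT, parent_id INTEGER)")
cur.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY)")

# One surviving parent node (1) and one deleted parent (2).
cur.execute("INSERT INTO node VALUES (1)")
cur.executemany("INSERT INTO paragraphs_item_field_data VALUES (?, ?, ?)",
                [(10, "node", 1),    # parent exists -> keep
                 (11, "node", 2),    # parent deleted -> orphan
                 (12, "node", 2)])   # parent deleted -> orphan

# Step 1: all unique parent_type values.
parent_types = [r[0] for r in cur.execute(
    "SELECT DISTINCT parent_type FROM paragraphs_item_field_data")]

# Steps 2-3: per parent type, LEFT JOIN the parent base table and keep only
# rows where the join found nothing (base table ID is NULL).
orphan_ids = []
for ptype in parent_types:
    base_table = {"node": "node"}[ptype]  # real code maps entity type -> base table
    orphan_ids += [r[0] for r in cur.execute(
        f"SELECT p.id FROM paragraphs_item_field_data p "
        f"LEFT JOIN {base_table} b ON p.parent_id = b.nid "
        f"WHERE p.parent_type = ? AND b.nid IS NULL", (ptype,))]

# Step 4: chunk the orphan IDs by 100 for the new queue.
chunks = [orphan_ids[i:i + 100] for i in range(0, len(orphan_ids), 100)]
print(orphan_ids, chunks)  # [11, 12] [[11, 12]]
```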
This brought the runtime down from a few months to < 6 hours.
Maybe there is a better solution than a direct DB query, but I think the idea is still valid: move the checks before the queue, then load/delete all the entities in batches. Not using loadMultiple() or deleteMultiple() is a major slowdown at these quantities.
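Step 5 (the chunked queue worker) might look like the sketch below. The names here (`FakeStorage`, `process_chunk`) are purely illustrative stand-ins; in Drupal this would be the entity storage handler's loadMultiple() and a bulk delete:

```python
class FakeStorage:
    """Toy stand-in for a Drupal entity storage handler (hypothetical)."""
    def __init__(self, ids):
        self.entities = {i: f"paragraph:{i}" for i in ids}

    def load_multiple(self, ids):
        # One bulk load (analogous to loadMultiple()) instead of N single loads.
        return {i: self.entities[i] for i in ids if i in self.entities}

    def delete_multiple(self, loaded):
        # One bulk delete instead of N single deletes.
        for i in loaded:
            self.entities.pop(i, None)

def process_chunk(storage, paragraph_ids):
    """Queue worker: each queue item carries a chunk of up to 100 IDs."""
    loaded = storage.load_multiple(paragraph_ids)  # silently skips already-gone IDs
    storage.delete_multiple(loaded)
    return len(loaded)

# 250 orphaned paragraphs -> 3 queue items instead of 250.
storage = FakeStorage(range(250))
ids = list(range(250))
chunks = [ids[i:i + 100] for i in range(0, len(ids), 100)]
deleted = sum(process_chunk(storage, chunk) for chunk in chunks)
print(deleted)  # 250
```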
Status: Active
Version: 1.0
Component: Code