Batch fix references command

Created on 16 March 2023, over 1 year ago
Updated 17 January 2024, 10 months ago

Problem/Motivation

The drush radioactivity:fix-references command needs to be batched. It can easily run out of memory when there are hundreds or thousands of missing references, which is not uncommon when installing the module on an established site or after a migration.

Proposed resolution

Use the BatchBuilder to process entities in a customizable batch size.
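
As a rough illustration of the idea (not the code from the merge request), the sketch below splits a list of entity IDs into chunks and hands each chunk to a static operation callback, which is the shape Drupal's Batch API works with. The class name FixReferencesBatch, the methods buildOperations() and processChunk(), and the batch size of 100 are all hypothetical. The loop at the end is a plain-PHP stand-in for the batch runner so the example runs without Drupal; in a real command the operations would presumably be registered with BatchBuilder::addOperation() instead.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of chunked processing; names are illustrative only.
 */
final class FixReferencesBatch {

  /**
   * Builds one [callable, arguments] operation per chunk of entity IDs.
   */
  public static function buildOperations(array $entityIds, int $batchSize): array {
    $operations = [];
    foreach (array_chunk($entityIds, max(1, $batchSize)) as $chunk) {
      $operations[] = [[self::class, 'processChunk'], [$chunk]];
    }
    return $operations;
  }

  /**
   * Static operation callback: handles one chunk and records progress.
   *
   * Assumption: in Drupal, the reference fix for each entity would happen
   * here, with services fetched statically because the callback is static.
   */
  public static function processChunk(array $ids, array &$context): void {
    foreach ($ids as $id) {
      // Placeholder for the real per-entity work.
      $context['results'][] = $id;
    }
    $context['message'] = sprintf('Processed %d entities so far.', count($context['results']));
  }

}

// Stand-in for the batch runner: call each operation with a shared context.
$context = ['results' => []];
foreach (FixReferencesBatch::buildOperations(range(1, 250), 100) as [$callback, $args]) {
  $args[] = &$context;
  call_user_func_array($callback, $args);
}
echo $context['message'], PHP_EOL;
```

Because each operation only holds its own chunk of IDs, the amount of work done per callback invocation stays bounded by the batch size, and the message set in the context gives the runner something to report as progress.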

🐛 Bug report
Status

Needs review

Version

4.0

Component

Code

Created by

🇺🇸 United States robphillips


Comments & Activities

  • Issue created by @robphillips
  • @robphillips opened merge request.
  • Status changed to Needs review over 1 year ago
  • 🇺🇸 United States robphillips
  • Issue was unassigned.
  • 🇺🇸 United States robphillips
  • First commit to issue fork.
  • Core: 10.0.5 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    75 pass
  • 🇳🇱 Netherlands Tr4nzNRG

    I'm currently having the same issue: the command needs to update about 12,000 nodes. The process seems to take 20 to 50 minutes, with no feedback. I would love to see this run as a batch process so we can run it in the background on a production server without interrupting the service or slowing the website to a halt.

  • 🇳🇱 Netherlands thomasdegraaff

    Looking at the code, it seems that the dependency injection for the entity type manager service has been removed, and that the existing dependency injection for the logger service and the radioactivity reference updater service is no longer used.

    Dependency injection is the preferred method for accessing and using services in Drupal 8 and should be used whenever possible. Is there a reason not to use dependency injection in this case?

  • 🇺🇸 United States robphillips

    DI is preferred when not using static methods. Batch API requires operation callbacks to be static methods or functions.

  • 🇺🇸 United States tr Cascadia

    @Tr4nzNRG
    @thomasdegraaff

    Please apply the patch and test it, then report your results here. The patch looks good to me, but I don't have a site with a lot of radioactivity nodes I can test it on. I would prefer that at least one person tests this patch on real data before I commit it.

  • 🇳🇱 Netherlands Tr4nzNRG

    I tested this in my dev environment and noticed that it processes around 50 nodes per minute. So it seems that the batch process works?

    To the user it might seem that nothing is happening, as the command line doesn't give any feedback on the running process. I only noticed progress by looking in the database, where the radioactivity table grew by around 50 rows per minute.

    For my website I need to process around 22,000 nodes in total. I didn't notice any slowness on the dev environment while the process was running. At that rate (22,000 nodes at about 50 per minute is roughly 440 minutes) it takes around 7 to 8 hours before all the nodes are processed.

    I can also give feedback if it ever gets applied on a production environment. But first I need to test and apply this patch: https://www.drupal.org/project/drupal/issues/2329253#comment-14830297 📌 Allow ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (Fixed)

    It solves another issue, where the 'changed' date gets updated. This is unwanted and is already resolved for the radioactivity module, just not in Drupal core: https://www.drupal.org/project/radioactivity/issues/3348337 🐛 Set syncing when updating reference fields (Needs review)

  • 🇳🇱 Netherlands Tr4nzNRG

    I tested this on real data in my DDEV environment and it seems to work as intended. For now, could this be merged into dev? @thomasdegraaff

    However, I think it still needs a minor improvement. I noticed that the batch process seems to slow down after 2,000 to 5,000 nodes. I don't know the reason yet, as the website I'm using has some complexity from other modules and content (direct indexing with Search API/Solr, cache invalidation?).

    When I stop the process and restart the command, it's back at its original speed. So maybe this could be changed in the future? For now this is already a good improvement and allows us to use this module in production. So thanks for all the effort so far ;)

  • 🇳🇱 Netherlands Tr4nzNRG

    Short update: on our TEST environment we had an incident where the web server ran out of memory. However, we aren't sure whether this was due to this fix or because we had too many other tasks running (Search API indexing, Node Revision Delete). Our TEST server also has less memory than our PROD server.

  • 🇳🇱 Netherlands Tr4nzNRG

    We used this batch process and noticed that it slows down after processing roughly 1,000 to 2,000 nodes. Manually stopping the process and restarting it 'fixes' the issue and prevents an out-of-memory error (depending on the memory available on the web server).

    Another note, which may be good to add to the documentation: when processing large numbers of nodes on a high-traffic website, this process in combination with other modules could cause a cascade. For example, running it alongside Node Revision Delete, or while Search API is re-indexing, could cause an out-of-memory error, depending on the memory available on the web server.

    So it's wise to run this batch command when other processes aren't running, for example after a build process (release), and to monitor it while it runs.

    In short: it might be good to adjust this solution so it 'restarts' after about 1,000 nodes, or to find a solution for the OOM issue. For us, restarting manually was the only method we have found so far to handle a large number of nodes on a medium/high-traffic website.
