- Issue created by @robphillips
- Status changed to Needs review
almost 2 years ago 2:43am 16 March 2023 - Issue was unassigned.
- First commit to issue fork.
- 🇳🇱Netherlands Tr4nzNRG
I'm currently having the same issue: the update needs to process about 12,000 nodes. The process seems to take 20-50 minutes, with no feedback. I would love to see this run as a batch process so we can run it in the background on a production server without interrupting the service or slowing the website to a halt.
- 🇳🇱Netherlands thomasdegraaff
Looking at the code, it seems that the dependency injection for the entity type manager service has been removed, and that the existing injected logger service and radioactivity reference updater service are no longer used.
Dependency injection is the preferred method for accessing and using services in Drupal 8 and should be used whenever possible. Is there a reason not to use dependency injection in this case?
- 🇺🇸United States robphillips
DI is preferred when not using static methods. Batch API requires operation callbacks to be static methods or functions.
- 🇺🇸United States tr Cascadia
@Tr4nzNRG
@thomasdegraaff
Please apply the patch and test it, then report your results here. The patch looks good to me, but I don't have a site with a lot of radioactivity nodes to test it on. I would prefer that at least one person tests this patch on real data before I commit it.
- 🇳🇱Netherlands Tr4nzNRG
I tested this in my dev environment and noticed that it processes around 50 nodes / min. So it seems that the batch process works?
To the user it might seem that nothing happens, as the command line doesn't give any feedback about the running process. I only noticed the progress by looking in the database, where the radioactivity table kept growing by around 50 rows / min.
For my website I need to process around 22,000 nodes in total. I didn't notice any slowness on the dev environment while this process was running. At that rate it takes roughly 7 to 8 hours before all the nodes are processed.
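The runtime estimate above can be checked with a quick calculation. This is only a sketch: the function name `estimated_hours` is made up for illustration, and it assumes the rate of about 50 nodes / min stays constant for the whole run.

```python
def estimated_hours(total_nodes: int, nodes_per_minute: float) -> float:
    """Estimate the total runtime in hours, assuming a constant processing rate."""
    minutes = total_nodes / nodes_per_minute
    return minutes / 60


# Figures reported in this comment: about 22,000 nodes at about 50 nodes / min.
print(round(estimated_hours(22_000, 50), 1))
```

If the process slows down over time, the real runtime will be longer than this estimate.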
I can also give feedback if it ever gets applied on a production environment. But first I need to test and apply this patch: https://www.drupal.org/project/drupal/issues/2329253#comment-14830297 (Allow ChangedItem to skip updating the entity's "changed" timestamp when synchronizing, status: Fixed)
That patch solves another issue, where the 'changed' date gets updated. This is unwanted, and it is already resolved for the radioactivity module, just not in Drupal core: https://www.drupal.org/project/radioactivity/issues/3348337 (Set syncing when updating reference fields, status: Needs review)
- 🇳🇱Netherlands Tr4nzNRG
I tested this on real data in my DDEV environment and it seems to work as intended. For now, could this be merged into dev? @thomasdegraaff
However, I think it still needs a minor improvement. I noticed that the batch process seems to slow down after 2000-5000 nodes. I don't know the reason yet, as the website I'm using has some complexity from other modules and content (direct indexing with Search API/Solr, cache invalidation?).
When I stop the process and restart the command, it's back at its original speed. So maybe this could be changed in the future? For now this is already a good improvement and allows us to use this module in production. So thanks for all the effort so far ;)
- 🇳🇱Netherlands Tr4nzNRG
Short update: On our TEST environment we had an incident where the webserver ran out of memory. However, we aren't sure whether this is due to this fix or because we have too many other tasks running (Search API indexing, Node Revision Delete). Also, our TEST server has less memory than our PROD server.
- 🇳🇱Netherlands Tr4nzNRG
We used this batch process and noticed that it slows down after processing ~1000-2000 nodes. Manually stopping the process and restarting it 'fixes' the issue and prevents an out-of-memory error (depending on the memory available on the webserver).
Another note, which may be good to add to the documentation: when processing large amounts of nodes on a high-traffic website, this process in combination with other modules could cause a cascade. For example, running it together with Node Revision Delete, or executing the command during a Search API re-index, could cause an out-of-memory error, depending on the memory available on the webserver.
So it's wise to execute this batch/command when other processes aren't running, and to monitor the process, for example right after a build process (release).
In short...
It might be good to adjust this solution so it 'restarts' after ~1000 nodes, or to find a solution for the OOM issue. For us, stopping and restarting was the only method (so far?) to handle a large amount of nodes on a medium/high-traffic website.
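The 'restart after ~1000 nodes' idea can be sketched in a language-agnostic way as follows. This is not the module's code and does not use the Drupal Batch API: `run_in_chunks`, `process`, `reset` and `report` are hypothetical names, and `reset` stands in for whatever frees memory between chunks (clearing caches, or restarting the command). The sketch also reports progress after each chunk, since the lack of feedback was mentioned earlier in this thread.

```python
from typing import Callable, Iterator, List


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def run_in_chunks(
    ids: List[int],
    process: Callable[[int], None],
    reset: Callable[[], None],
    chunk_size: int = 1000,
    report: Callable[[str], None] = print,
) -> int:
    """Process every id, calling `reset` and reporting progress after each chunk."""
    done = 0
    for chunk in chunked(ids, chunk_size):
        for node_id in chunk:
            process(node_id)  # hypothetical per-node update
        done += len(chunk)
        reset()  # hypothetical hook: free memory before the next chunk
        report(f"{done}/{len(ids)} nodes processed")
    return done
```

Whether a reset between chunks actually removes the slowdown and the OOM risk depends on where the memory is being held, which has not been established in this issue.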