Batch to add all nodes to the queue loads all nodes at once.

Created on 21 February 2025

Problem/Motivation

The batch operation queries all nodes and then creates an operation for every single ID. That produces a massive batch array that has to be stored and loaded on every batch step; it might also time out or run into memory issues.

Steps to reproduce

A better approach would be to use the batch sandbox: calculate the count, then do a query to get N IDs, create queue items for them, and increment the index. For example, sort by nid, store the largest nid seen, and continue from there on the next query.
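The sandbox idea above could be sketched roughly as follows. This is only an illustration of the approach, not the module's actual code: the callback name, the 'node_revision_delete' queue name, and the queue item shape are all assumptions.

```php
<?php

/**
 * Sketch of a sandbox-based batch step using keyset pagination on nid.
 */
function node_revision_delete_queue_batch_step(array &$context): void {
  $limit = 100;

  if (!isset($context['sandbox']['max_nid'])) {
    // First run: initialise the sandbox with the total count.
    $context['sandbox']['max_nid'] = 0;
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['total'] = \Drupal::entityQuery('node')
      ->accessCheck(FALSE)
      ->count()
      ->execute();
  }

  // Keyset pagination: only fetch IDs larger than the last one seen,
  // instead of building one batch operation per node up front.
  $nids = \Drupal::entityQuery('node')
    ->accessCheck(FALSE)
    ->condition('nid', $context['sandbox']['max_nid'], '>')
    ->sort('nid')
    ->range(0, $limit)
    ->execute();

  $queue = \Drupal::queue('node_revision_delete');
  foreach ($nids as $nid) {
    $queue->createItem(['nid' => $nid]);
    $context['sandbox']['max_nid'] = $nid;
    $context['sandbox']['progress']++;
  }

  $context['finished'] = empty($nids)
    ? 1
    : $context['sandbox']['progress'] / max(1, $context['sandbox']['total']);
}
```

The batch array then holds a single operation regardless of node count; progress lives in the sandbox rather than in thousands of pre-built operations.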

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Active

Version

2.0

Component

Code

Created by

🇨🇭 berdir (Switzerland)


Merge Requests

Comments & Activities

  • Issue created by @berdir
  • 🇨🇦 adriancid (Montreal, Canada)

    Hi @berdir, thanks for this issue. I don't have much time these days, and I see you're very active in the issue queue. Do you want to become a maintainer?

  • 🇨🇭 berdir (Switzerland)

    I was actually considering that recently, but then saw that you pushed a new release. I did a bit of a review and testing around updating 1.x to 2.x and just wanted to write down my findings.

    Feel free to add me, so there's another person around if a release is necessary, but I can't really promise much beyond that. I might or might not work on this issue (and others), depending on how much of a problem it is for us; I haven't yet tested where the limits are.

  • First commit to issue fork.
  • Pipeline finished with Success
    2 months ago
    Total: 154s
    #441067
  • 🇫🇮 onnia (Finland)

    Hi,
    My merge request does two things. First, the nodeExistsInQueue check is replaced with an array of all node IDs and an isset() check; this cuts the run time from tens of minutes to seconds. Second, batch creation now processes chunks of node IDs, with the chunk size configurable via the queue_chunk_size config. I also looked into queueing via Drush with 520,000 nodes: adding them (time drush node-revision-delete:queue) takes 1 min. I still have to commit my fix for the Drush command.
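    In plain PHP, the lookup change described above amounts to the following. This is a minimal sketch, not the MR's code; the variable names are illustrative, and the local $queue_chunk_size stands in for the queue_chunk_size config value.

    ```php
    <?php

    // Checking membership per node with in_array() (or a query per node,
    // as nodeExistsInQueue() effectively did) costs O(n) per check.
    // Flipping the ID list once and using isset() makes each check O(1).
    $nids_in_queue = array_flip([12, 57, 203]);   // nid => index

    $candidates = [12, 99, 203, 500];
    $to_add = [];
    foreach ($candidates as $nid) {
      if (!isset($nids_in_queue[$nid])) {
        $to_add[] = $nid;
      }
    }
    // $to_add now holds only the IDs not yet queued: 99 and 500.

    // Batch creation then works on chunks of IDs instead of one
    // batch operation per node. Chunk size of 2 is illustrative only.
    $queue_chunk_size = 2;
    $chunks = array_chunk($to_add, $queue_chunk_size);
    // Each entry in $chunks becomes a single batch operation.
    ```

    The isset() trick trades a one-time O(n) array_flip() for constant-time lookups, which is where the tens-of-minutes-to-seconds improvement comes from.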

  • Pipeline finished with Success
    2 months ago
    Total: 200s
    #441636
  • Pipeline finished with Success
    2 months ago
    Total: 152s
    #441767
  • 🇨🇭 berdir (Switzerland)

    What's the memory usage with that amount of content? drush -vvv should report that.

    My idea was something like https://git.drupalcode.org/project/tmgmt/-/blob/8.x-1.x/sources/content/..., with a sandbox and a query that progresses through it, but that would of course be a lot slower.

    It's a bit awkward that the batch methods are on an interface; that technically makes this an API break, and my approach would be one too. Something like that should arguably be an internal implementation detail, but it's tricky to step back from that now that the module is stable.

  • 🇫🇮 onnia (Finland)

    I did some memory debugging with this helper: https://blog.riff.org/2016_08_04_how_to_display_time_and_memory_use_for_drush_commands
    When the 587k node IDs are already in the queue and $nids_in_queue is at its largest:

    Memory:
      Initial: malloc = 21.60M           real = 22.25M
      Final:   malloc = 22.58M (+ 0.99M) real = 49.25M (+27.00M)
      Peak:    malloc = 80.03M (+58.43M) real = 96.25M (+74.00M)
    

    So yes, memory usage does peak. About the API changes: I did not consider the breaking changes when editing the BatchInterface.

  • 🇫🇮 onnia (Finland)

    If the edits to src/NodeRevisionDeleteBatchInterface.php are an issue, then the MR could be altered to add a custom method for the batch queue that processes the $nids_chunk. This new queueChunk($nids_chunk) method could be used when the count of processed nids_to_add gets larger than, e.g., 1000. The new method could be added here -> https://git.drupalcode.org/project/node_revision_delete/-/merge_requests...
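    The proposed queueChunk() method might look something like this. To be clear, this is hypothetical: neither the method nor the trait below exists in the module, and the 'node_revision_delete' queue name and item shape are assumptions for illustration.

    ```php
    <?php

    /**
     * Hypothetical sketch: chunked queueing added alongside the existing
     * batch service instead of changing NodeRevisionDeleteBatchInterface.
     */
    trait NodeRevisionDeleteQueueChunkTrait {

      /**
       * Queues a chunk of node IDs as a single batch operation.
       *
       * @param int[] $nids_chunk
       *   Node IDs to enqueue, e.g. a slice of at most 1000 IDs.
       */
      public function queueChunk(array $nids_chunk): void {
        $queue = \Drupal::queue('node_revision_delete');
        foreach ($nids_chunk as $nid) {
          $queue->createItem(['nid' => $nid]);
        }
      }

    }
    ```

    Keeping it off the interface (in a trait or the concrete class) would avoid the API break berdir raised, at the cost of the method not being part of the public contract.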

  • Pipeline finished with Success
    about 2 months ago
    Total: 212s
    #445700