Add the ability to process the queue without triggering a rebuild of the queue, even if the queue is empty

Created on 4 April 2023, over 1 year ago
Updated 19 April 2023, over 1 year ago

Problem/Motivation

When processing a large number of items (on the order of a few thousand or more) using the drush ssg command, the process can appear non-deterministic. Specifically, it's uncertain whether processing or rebuilding the queue will happen next. This unpredictability is a significant problem because users do not know what the command will do until they run it.

In cases where processing takes an extended period of time, the current behavior becomes an even more significant issue. With the current setup, it is not possible to "run the command once per night" and be sure that it'll run as expected.

Steps to reproduce

Proposed resolution

To address this issue, we (NyMedia) are implementing the following changes and plan to submit a patch/merge request soon:

  1. Create two new commands: one for rebuilding the queue and another for processing items in the queue. This approach completely separates the two actions, and users can choose to run either command as needed.
  2. Update the existing commands to maintain backward compatibility.

We are posting here to get feedback on the idea and welcome any suggestions or criticism.

Remaining tasks

User interface changes

API changes

Data model changes

💬 Support request
Status

Active

Version

4.0

Component

Code

Created by

🇵🇱Poland Yuraul

Comments & Activities

  • Issue created by @Yuraul
  • 🇩🇪Germany gbyte Berlin

    There are and have been for a long time two drush commands:
    simple-sitemap:rebuild-queue or ssr which only ever rebuilds the queue
    and
    simple-sitemap:generate or ssg which rebuilds the queue if it is empty and generates the results for the duration of the Sitemap generation max duration setting.

    Specifically, it's uncertain whether processing or rebuilding the queue will happen next.

    When using ssg, processing happens each time, with a rebuild taken place if the queue is empty. When using ssr, only rebuilding takes place. That means, when using both commands, you can be certain both events will take place (assuming a correct configuration). You normally wouldn't want to do that though, as rebuilding the queue destroys the current queue, so with a huge queue and tight time limits the generation of all elements might never finish.

    In cases where processing takes an extended period of time, the current behavior becomes an even more significant issue. With the current setup, it is not possible to "run the command once per night" and be sure that it'll run as expected.

    I'm afraid I don't follow. Rebuilding the queue should only be required after the generation of all elements, as generating items is not concurrent (each sitemap gets refreshed when all of its elements are regenerated and sitemaps are processed one after another). (There is a way of rebuilding the queue of specific sitemaps and generating specific sitemaps via the module's API, but not via the UI nor via the direct drush commands.)
    If you want to deterministically regenerate all sitemaps each day, increase the Sitemap generation max duration setting to something very high (30 seconds?), and run ssg once a day. Nothing more to it.

    Other than that there is a vast amount of API functions available to help you build a drush command in a custom module. Let me know if you need any pointers.
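
    The behaviour of the two commands as described in this comment can be modelled with a small Python sketch. This is a toy model, not the module's code: the SitemapQueue class, its method names, and the use of an item budget in place of the Sitemap generation max duration time limit are all assumptions made for illustration.

```python
from collections import deque


class SitemapQueue:
    """Toy model of the queue semantics described above (not the module's code)."""

    def __init__(self, all_items):
        self.all_items = list(all_items)  # everything that could be queued
        self.queue = deque()              # items still waiting to be generated
        self.rebuild_count = 0            # how many times the queue was rebuilt

    def rebuild_queue(self):
        """Models `ssr`: discard whatever is queued and queue everything again."""
        self.queue = deque(self.all_items)
        self.rebuild_count += 1

    def generate(self, budget):
        """Models `ssg`: rebuild only if the queue is empty, then process.

        `budget` (items per run) is a stand-in for the max duration setting.
        Returns the items processed in this run.
        """
        if not self.queue:
            self.rebuild_queue()
        processed = []
        while self.queue and len(processed) < budget:
            processed.append(self.queue.popleft())
        return processed


q = SitemapQueue(range(5))
first = q.generate(budget=3)   # queue empty: rebuild, then process 3 items
second = q.generate(budget=3)  # queue not empty: process the remaining 2
third = q.generate(budget=3)   # queue empty again: rebuild and start over
print(first, second, third, q.rebuild_count)
```

    In this model the third run starts a new cycle only because the second run emptied the queue, which is the behaviour being discussed in this issue.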

  • 🇳🇴Norway esolitos Trondheim

    Hi, thanks for taking the time with the feedback.

    When using ssg, processing happens each time, with a rebuild taken place if the queue is empty.

    The highlighted part is the core problem we wanted to highlight, this command cannot simply run every few minutes, because at some point the queue will be empty and it will start the process all over again.

    Regarding your suggestion:

    If you want to deterministically regenerate all sitemaps each day, increase the Sitemap generation max duration setting to something very high (30 seconds?), and run ssg once a day.

    Our generation takes well above 30 seconds as we are dealing with millions of content entities. ^_^

    Other than that there is a vast amount of API functions available to help you build a drush command in a custom module. Let me know if you need any pointers.

    We are aware that we can build our custom implementation, but we wanted to contribute back to this module as we use it quite extensively.
    Our plan would be to implement an additional command (for example simple-sitemap:process) which only processes the queue and if it is empty simply exits, much like the drush queue:run command.
    And of course we plan to maintain the current functionality of simple-sitemap:generate.

    The main point we wanted to make here is that we are already implementing this for our own use. However, we think it would be a valuable addition to this already great module, and we wanted to know if this is something you would be interested in reviewing and adding.
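
    To illustrate the difference between the existing generate behaviour and the process-only command proposed in this comment, here is a minimal Python sketch. The function names and the item budget (standing in for a time limit) are hypothetical, and simple-sitemap:process is only a proposed command name at this point in the thread.

```python
from collections import deque


def generate(queue, all_items, budget):
    """Behaviour described for ssg: refill an empty queue, then process."""
    if not queue:
        queue.extend(all_items)
    # range() is evaluated once, before any item is popped.
    return [queue.popleft() for _ in range(min(budget, len(queue)))]


def process_only(queue, budget):
    """Proposed behaviour: process what is queued; do nothing if it is empty."""
    return [queue.popleft() for _ in range(min(budget, len(queue)))]


items = ["a", "b", "c"]
queue = deque(items)  # as if the queue had been rebuilt once

# Three scheduled runs of the proposed command: the last one finds
# an empty queue and simply returns without requeueing anything.
runs = [process_only(queue, budget=2) for _ in range(3)]

# The same situation with the generate-style behaviour starts a new cycle.
regenerated = generate(queue, items, budget=2)
print(runs, regenerated)
```

    With the process-only variant, a scheduler can call the command as often as needed without ever triggering a new cycle; starting a cycle stays an explicit, separate step.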

  • 🇩🇪Germany gbyte Berlin

    No worries, please excuse me asking further questions, but I still need more clarity on this issue. :)

    The highlighted part is the core problem we wanted to highlight, this command cannot simply run every few minutes, because at some point the queue will be empty and it will start the process all over again.

    "Starting the process all over again" after it has finished is its intended purpose. Please explain why this seems to bother you - is it because of it taking too many resources? It's not like the sitemaps are offline during their regeneration.

    The problem I explained in my previous comment is this: say you created the command simple-sitemap:process and used it in combination with simple-sitemap:rebuild-queue to deterministically regenerate the sitemaps once a day. You would have to check that all sitemaps have been generated before running simple-sitemap:rebuild-queue; otherwise the generation of the sitemaps may never finish (as the queue gets rebuilt each day). So basically you would also need a new rebuild-queue command as well.

  • 🇵🇱Poland Yuraul

    "Starting the process all over again" after it has finished is its intended purpose. Please explain why this seems to bother you - is it because of it taking too many resources?

    Sorry, I failed to explain this in the motivation section. We need the ability to run the process with a time limit as many times as necessary to complete the current generation according to a schedule.

    So basically you would also need a new rebuild-queue command as well.

    Probably yes, good point, thank you.

    Thank you for your feedback. I also want to note that the module is working just fine in general usage. The issues we've faced are only related to the huge amount of content, and we're working on solving them and want to share the solution.

  • 🇵🇱Poland Yuraul

    I'd like to add an example to clarify the point about determinism.
    Let's say I have three sitemaps: index, monthly, and incremental.
    I want to regenerate incremental daily, but I want to generate monthly only once a month. The reasons for this are:

    1. Generation takes a very long time
    2. The content doesn't change very often, so regenerating it more often is pointless
    3. It uses extra resources
    4. It makes the generation of the incremental sitemap much slower

    I can run simple-sitemap:rebuild-queue --variants=incremental, but once simple-sitemap:generate has processed the queued items, the next run will also start processing index and monthly.
    Therefore, if I attempt to run simple-sitemap:generate separately, I don't know what it will do next:

    1. Will it continue regenerating the incremental sitemap that was started earlier?
    2. Will it requeue all sitemaps and start the generation from scratch?
  • Test run: Core 9.5.x, PHP 8.1 & MySQL 8; 32 pass (last update over 1 year ago)
  • @yuraul opened merge request.