Your queue exceeded 100 000 items! Purge shut down

Created on 30 April 2020, almost 5 years ago
Updated 9 May 2023, almost 2 years ago

Hi,

I have the latest Drupal 8, with Purge module & Cloudflare module.

Suddenly I get this error. I tried everything I could find online to fix it, but with no result:

Purge: Queue size
157998
Your queue exceeded 100 000 items! This volume is extremely high and not sustainable at all, so Purge has shut down cache invalidation to prevent your servers from actually crashing. This can happen when no processors are clearing your queue, or when queueing outpaces processing. Please first solve the structural nature of the issue by adding processing power or reducing your queue loads. Empty the queue to unblock your system.

How can I fix this, please?

Thanks

Support request
Status

Postponed: needs info

Version

3.0

Component

Code

Created by

Belgium ananda

Comments & Activities

  • Czech Republic kyberman

    Hi everybody!

    For a very specific use case (e.g. a lot of nodes being updated at once), the queue processing shutdown can cause more trouble than leaving the queue workable and letting it process at least partially. In my case, the CloudFlare API allows purging 30 items at once, multiplied by 2 000 possible requests daily. If the queue quickly grows to over 100 000 items, the purging process is stopped immediately, even though roughly 60 000 items could still have been processed before the CloudFlare API limit is exhausted.

    The idea this patch brings is to never stop the queue processing; instead, an error is logged once the queue grows past 30 000 items, so there is time to recognize and fix the underlying issue (a minimal sketch of that early-warning check follows at the end of this comment). Could you please review and comment on this?

    There could also be a settings/config/state/hook override for the default 100 000 item limit.
    Another idea is to enqueue an item only if it doesn't already exist in the queue. Any thoughts?

    Thank you
    Vit
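
    A minimal sketch of the early-warning idea above, using only core APIs; the 30 000 threshold and the place it would run from (e.g. hook_cron()) are placeholders, not the actual patch:

    // Hypothetical early-warning check: log an error once the purge queue
    // passes a soft threshold, well before the hard 100 000 item shutdown.
    $threshold = 30000;
    $size = (int) \Drupal::database()
      ->select('purge_queue', 'q')
      ->countQuery()
      ->execute()
      ->fetchField();
    if ($size > $threshold) {
      \Drupal::logger('purge')->error('Purge queue has grown to @size items.', ['@size' => $size]);
    }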

  • Status changed to Active over 1 year ago
  • Australia achap

    Just want to chime in and say this issue has affected me too when running migrations. We often run migrations that take a few hours or more. I stepped through the code and think I discovered what's happening: in Drupal\purge\Plugin\Purge\Queue\QueueService::add, invalidation tags are not added to the queue straight away but rather to an internal buffer. Then, at the end of the request (for example a long-running cron job or Drush script), Drupal\purge\Plugin\Purge\Queue\QueueService::destruct finally commits the items from the buffer to the queue.

    The problem is that during the whole time the migration is running, none of the invalidation tags it generates can be processed by any of the purge processors. They are all dumped at once at the end of the migration, which usually results in being over the 100k limit.

    Not sure what the fix is, but that seems to be the root cause of the issue, at least for us.
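
    An untested sketch of how a long-running script might work around that buffering, assuming the queue service id is purge.queue and that it is safe to call its public destruct() method mid-run (per the analysis above, that is what commits the buffer); $rows_processed is a hypothetical counter in your own loop:

    // Flush purge's internal buffer every 500 rows so invalidations become
    // visible to processors during the migration, not only at the very end.
    if ($rows_processed % 500 === 0) {
      \Drupal::service('purge.queue')->destruct();
    }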

  • Australia achap

    Our workaround for the above was to re-architect our migration using the Queue API to process one item at a time, and to give our queue worker a cron lease time of one hour (the same as our cron run interval). This way the buffer is emptied at least once per hour and doesn't overwhelm the purge queue. Hope that helps someone.
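
    For anyone wanting to copy that shape, here is a minimal QueueWorker sketch; the module name, plugin id and per-run time budget are made up, and the actual import logic is site-specific:

    <?php

    namespace Drupal\my_module\Plugin\QueueWorker;

    use Drupal\Core\Queue\QueueWorkerBase;

    /**
     * Imports one source row per queue item.
     *
     * @QueueWorker(
     *   id = "my_module_import_row",
     *   title = @Translation("Import one row at a time"),
     *   cron = {"time" = 60}
     * )
     */
    class ImportRowWorker extends QueueWorkerBase {

      /**
       * {@inheritdoc}
       */
      public function processItem($data) {
        // Import a single source row here (site-specific). Because each cron
        // run is a normal request, purge's internal buffer is committed at the
        // end of every run instead of once at the end of a multi-hour migration.
      }

    }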

  • Canada rbrownell Ottawa, Ontario, Canada

    This error baffles me. I understand that re-architecting is the normal solution, but that can't be the answer when the business requirements of the project demand timely, rapid updating of a large volume of nodes/pages.

    Please correct me if I am wrong, but my understanding is that queues are supposed to help prevent server crashes by regulating the volume of data being sent to whatever system is receiving it, presumably in smaller batches instead of all at once. The fact that the queue stops processing after reaching a certain threshold suggests that it is not really a proper queue that processes things in smaller batches, but rather a dumping ground whose contents are then sent all at once. There has to be a better way of handling this than stopping everything; there are known mechanisms for keeping servers from crashing under data volume.

  • Status changed to Needs review over 1 year ago
  • United States japerry KVUO

    Typically this error occurs when cron is misconfigured (or not configured), or during a migration or another process where lots of invalidations happen at once.

    To counter this edge case, I added a new flag to the state system called purge.dangerous. If you set it in settings or with drush sset purge.dangerous TRUE, you should be able to have the purger run with over 100,000 items in the queue.
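
    The PHP equivalent, for scripts that want to set the flag around a bulk operation and then clear it again (the key name is taken from the comment above; check that your installed purge version supports it):

    // Allow the purger to keep running above 100 000 queued items while the
    // migration runs, then restore normal behaviour afterwards.
    \Drupal::state()->set('purge.dangerous', TRUE);
    // ... run the migration / bulk update ...
    \Drupal::state()->delete('purge.dangerous');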

  • To clear the purge queue:
    drush p-queue-empty

    To add a processor:
    drush p:processor-add drush_purge_queue_work

  • New Zealand xurizaemon Ōtepoti, Aotearoa

    The change mentioned in #19 should be available as of 8.x-3.5. The commit doesn't show in this issue because the commit message omits the "Issue #3132524" prefix; the fix looks to be in 70b34944.

  • New Zealand xurizaemon Ōtepoti, Aotearoa

    We have a site that is periodically affected by this issue. When we investigate, we observe that a single entry in the purge_queue table has spiked beyond the 100K limit, which blocks everything except manual queue flushes from then on.

    select distinct
      max(item_id) sample_id,
      count(*) as count,
      from_unixtime(min(created)) as min_created,
      from_unixtime(max(created)) as max_created,
      data
    from purge_queue
    group by data
    order by count desc
    limit 10;
    
    +-----------+---------+---------------------+---------------------+---------------------------------------------------------------------------------------------------------------------------+
    | sample_id | count   | min_created         | max_created         | data                                                                                                                      |
    +-----------+---------+---------------------+---------------------+---------------------------------------------------------------------------------------------------------------------------+
    |   9417901 | 1305685 | 2024-05-24 14:15:16 | 2024-06-16 21:46:18 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"config:views.view.media_library";i:3;a:0:{}}                                       |
    |   9417786 |    4944 | 2024-05-24 14:15:32 | 2024-06-16 21:45:09 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-org-sitemap";i:3;a:0:{}}                                  |
    |   9417791 |    4944 | 2024-05-24 14:15:32 | 2024-06-16 21:45:09 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-net-sitemap";i:3;a:0:{}}                                  |
    |   9417326 |    4686 | 2024-05-24 14:15:32 | 2024-06-16 21:40:22 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:34:"simple_sitemap:example-com-sitemap";i:3;a:0:{}}                                    |
    |   9417321 |    4381 | 2024-05-24 14:15:32 | 2024-06-16 21:40:22 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:26:"simple_sitemap:example-sitemap";i:3;a:0:{}}                                            |
    |   9416566 |    4275 | 2024-05-24 14:15:32 | 2024-06-16 21:35:17 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"simple_sitemap:example2-sitemap";i:3;a:0:{}}                                       |
    |   9417031 |     188 | 2024-05-26 20:36:28 | 2024-06-16 21:38:15 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:9:"node_list";i:3;a:0:{}}                                                              |
    |   9404071 |     176 | 2024-05-26 02:01:34 | 2024-06-16 19:45:47 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:19:"config:webform_list";i:3;a:0:{}}                                                   |
    |   9404076 |     176 | 2024-05-26 02:01:34 | 2024-06-16 19:45:47 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:23:"webform_submission_list";i:3;a:0:{}}                                               |
    

    If others are observing this issue, I'm interested to know whether executing the query above on their site reveals a similar profile, i.e. that when grouped by the data column, the entries in purge_queue are heavily dominated by a single value.

  • Italy apaderno Brescia
  • France o'briat Nantes

    Have a look at the patch in the "Deduplicate Queued Items" issue.

  • United States mlncn Minneapolis, MN, USA

    As far as the error message goes, clearer problems could possibly be surfaced directly, presuming that an unconfigured or misconfigured connection to the external CDN can be the cause of this?

    A recent example: spelunking down to the "Purge queue browser" (by following one of the "Database" options at the bottom of /admin/config/development/performance/purge) showed seven pages of "Failed" URLs and 115,820 pages of "New" URLs. A situation with zero successes (if that is a correct reading) probably warrants a clear message about it.

  • France o'briat Nantes

    The problem can also legitimately occur when massive import/update batches are executed regularly.

    Could the module provide an option that purges all caches (drush p:invalidate everything -y) and empties the queue (drush p:queue-empty)?
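
    A rough sketch of what that option could do, assuming the purge.queue service's emptyQueue() method (the same call drush p:queue-empty makes) and leaving the "purge everything" step to the existing drush command or the CDN's own control, since wiping the queue on its own would leave stale content behind:

    // Drop every queued invalidation in one go. Only do this together with a
    // full CDN purge (e.g. `drush p:invalidate everything -y`), otherwise the
    // discarded invalidations simply become stale pages.
    \Drupal::service('purge.queue')->emptyQueue();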

  • New Zealand xurizaemon Ōtepoti, Aotearoa
  • France o'briat Nantes

    The current "processing rate" could also be displayed to add a clear message, ex:

    "This can happen when no processors are clearing your queue, or when queueing outpaces processing. Please first solve the structural nature of the issue by adding processing power or reducing your queue loads. Empty the queue to unblock your system. The current processing rate (@current_rate clearing requests/s) is lower than the queue growth one (@growth_rate new item to clear/s)."
