Your queue exceeded 100 000 items! Purge shut down

Created on 30 April 2020, almost 5 years ago
Updated 9 May 2023, almost 2 years ago

Hi,

I have the latest Drupal 8, with Purge module & Cloudflare module.

Suddenly I get this error. I tried everything I could find online to fix it, but with no result:

Purge: Queue size
157998
Your queue exceeded 100 000 items! This volume is extremely high and not sustainable at all, so Purge has shut down cache invalidation to prevent your servers from actually crashing. This can happen when no processors are clearing your queue, or when queueing outpaces processing. Please first solve the structural nature of the issue by adding processing power or reducing your queue loads. Empty the queue to unblock your system.

How can I fix this, please?

Thanks

Support request
Status

Postponed: needs info

Version

3.0

Component

Code

Created by

Belgium ananda

Comments & Activities

  • Czech Republic kyberman

    Hi everybody!

    For a very specific use case (e.g. a lot of nodes being updated at once), the queue processing shutdown can cause more trouble than leaving the queue workable and letting it process at least partially. In my case, the CloudFlare API allows purging 30 items at once, multiplied by 2 000 possible requests daily. If the queue quickly grows to over 100 000 items, the purging process is stopped immediately, even though roughly 60 000 items could still have been processed before the CloudFlare API limit is exhausted.

    The idea this patch brings is to never stop the queue processing; instead, an error is logged once the queue grows past 30 000 items, so there is time to recognize and fix the underlying issue (a minimal sketch of that early-warning check follows at the end of this comment). Could you please review and comment on this?

    There could also be a settings/config/state/hook override for the default 100 000 item limit.
    Another idea is to enqueue an item only if it doesn't already exist in the queue. Any thoughts?

    Thank you
    Vit
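
    A minimal sketch of the early-warning idea above, using only core APIs; the 30 000 threshold and the place it would run from (e.g. hook_cron()) are placeholders, not the actual patch:

    // Hypothetical early-warning check: log an error once the purge queue
    // passes a soft threshold, well before the hard 100 000 item shutdown.
    $threshold = 30000;
    $size = (int) \Drupal::database()
      ->select('purge_queue', 'q')
      ->countQuery()
      ->execute()
      ->fetchField();
    if ($size > $threshold) {
      \Drupal::logger('purge')->error('Purge queue has grown to @size items.', ['@size' => $size]);
    }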

  • Status changed to Active over 1 year ago
  • Australia achap

    Just want to chime in and say this issue has affected me too when running migrations. We often run migrations that take a few hours or more. I stepped through the code and think I discovered what's happening: in Drupal\purge\Plugin\Purge\Queue\QueueService::add, invalidation tags are not added to the queue straight away but rather to an internal buffer. Then, at the end of the request (for example a long-running cron job or Drush script), Drupal\purge\Plugin\Purge\Queue\QueueService::destruct finally commits the items from the buffer to the queue.

    The problem is that during the whole time the migration is running, none of the invalidation tags it generates can be processed by any of the purge processors. They are all dumped at once at the end of the migration, which usually results in being over the 100k limit.

    Not sure what the fix is, but that seems to be the root cause of the issue, at least for us.
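
    An untested sketch of how a long-running script might work around that buffering, assuming the queue service id is purge.queue and that it is safe to call its public destruct() method mid-run (per the analysis above, that is what commits the buffer); $rows_processed is a hypothetical counter in your own loop:

    // Flush purge's internal buffer every 500 rows so invalidations become
    // visible to processors during the migration, not only at the very end.
    if ($rows_processed % 500 === 0) {
      \Drupal::service('purge.queue')->destruct();
    }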

  • Australia achap

    Our workaround for the above was to re-architect our migration using the Queue API to process one item at a time, and to give our queue worker a cron lease time of one hour (the same as our cron run interval). This way the buffer is emptied at least once per hour and doesn't overwhelm the purge queue. Hope that helps someone.
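
    For anyone wanting to copy that shape, here is a minimal QueueWorker sketch; the module name, plugin id and per-run time budget are made up, and the actual import logic is site-specific:

    <?php

    namespace Drupal\my_module\Plugin\QueueWorker;

    use Drupal\Core\Queue\QueueWorkerBase;

    /**
     * Imports one source row per queue item.
     *
     * @QueueWorker(
     *   id = "my_module_import_row",
     *   title = @Translation("Import one row at a time"),
     *   cron = {"time" = 60}
     * )
     */
    class ImportRowWorker extends QueueWorkerBase {

      /**
       * {@inheritdoc}
       */
      public function processItem($data) {
        // Import a single source row here (site-specific). Because each cron
        // run is a normal request, purge's internal buffer is committed at the
        // end of every run instead of once at the end of a multi-hour migration.
      }

    }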

  • Canada rbrownell Ottawa, Ontario, Canada

    This error baffles me. I understand that re-architecting is the normal solution, but that can't be the answer when the business requirements of the project demand timely, rapid updating of a large volume of nodes/pages.

    Please correct me if I am wrong, but my understanding is that queues are supposed to help prevent server crashes by regulating the volume of data being sent to whatever system is receiving it, presumably in smaller batches instead of all at once. The fact that the queue stops processing after reaching a certain threshold suggests that it is not really a proper queue that processes things in smaller batches, but rather a dumping ground whose contents are then sent all at once. There has to be a better way of handling this than stopping everything; there are known mechanisms for keeping servers from crashing under data volume.

  • Status changed to Needs review over 1 year ago
  • United States japerry KVUO

    Typically this error occurs when cron is misconfigured (or not configured), or during a migration or another process where lots of invalidations happen at once.

    To counter this edge case, I added a new flag to the state system called purge.dangerous. If you set it in settings or with drush sset purge.dangerous TRUE, you should be able to have the purger run with over 100,000 items in the queue.
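
    The PHP equivalent, for scripts that want to set the flag around a bulk operation and then clear it again (the key name is taken from the comment above; check that your installed purge version supports it):

    // Allow the purger to keep running above 100 000 queued items while the
    // migration runs, then restore normal behaviour afterwards.
    \Drupal::state()->set('purge.dangerous', TRUE);
    // ... run the migration / bulk update ...
    \Drupal::state()->delete('purge.dangerous');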

  • To clear the purge queue:
    drush p-queue-empty

    To add a processor:
    drush p:processor-add drush_purge_queue_work

  • New Zealand xurizaemon Ōtepoti, Aotearoa

    The change mentioned in #19 should be available as of 8.x-3.5. The commit doesn't show in this issue because the commit message omits the "Issue #3132524" prefix; the fix looks to be in 70b34944.

  • New Zealand xurizaemon Ōtepoti, Aotearoa

    We have a site that is periodically affected by this issue. When we investigate, we observe that a single entry in the purge_queue table has spiked beyond the 100K limit, which blocks everything except manual queue flushes from then on.

    select distinct
      max(item_id) sample_id,
      count(*) as count,
      from_unixtime(min(created)) as min_created,
      from_unixtime(max(created)) as max_created,
      data
    from purge_queue
    group by data
    order by count desc
    limit 10;
    
    +-----------+---------+---------------------+---------------------+---------------------------------------------------------------------------------------------------------------------------+
    | sample_id | count   | min_created         | max_created         | data                                                                                                                      |
    +-----------+---------+---------------------+---------------------+---------------------------------------------------------------------------------------------------------------------------+
    |   9417901 | 1305685 | 2024-05-24 14:15:16 | 2024-06-16 21:46:18 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"config:views.view.media_library";i:3;a:0:{}}                                       |
    |   9417786 |    4944 | 2024-05-24 14:15:32 | 2024-06-16 21:45:09 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-org-sitemap";i:3;a:0:{}}                                  |
    |   9417791 |    4944 | 2024-05-24 14:15:32 | 2024-06-16 21:45:09 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-net-sitemap";i:3;a:0:{}}                                  |
    |   9417326 |    4686 | 2024-05-24 14:15:32 | 2024-06-16 21:40:22 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:34:"simple_sitemap:example-com-sitemap";i:3;a:0:{}}                                    |
    |   9417321 |    4381 | 2024-05-24 14:15:32 | 2024-06-16 21:40:22 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:26:"simple_sitemap:example-sitemap";i:3;a:0:{}}                                            |
    |   9416566 |    4275 | 2024-05-24 14:15:32 | 2024-06-16 21:35:17 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"simple_sitemap:example2-sitemap";i:3;a:0:{}}                                       |
    |   9417031 |     188 | 2024-05-26 20:36:28 | 2024-06-16 21:38:15 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:9:"node_list";i:3;a:0:{}}                                                              |
    |   9404071 |     176 | 2024-05-26 02:01:34 | 2024-06-16 19:45:47 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:19:"config:webform_list";i:3;a:0:{}}                                                   |
    |   9404076 |     176 | 2024-05-26 02:01:34 | 2024-06-16 19:45:47 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:23:"webform_submission_list";i:3;a:0:{}}                                               |
    

    If others are observing this issue, I'm interested to know whether executing the query above on their site reveals a similar profile, i.e. that when grouped by the data column, the entries in purge_queue are heavily dominated by a single value.

  • Italy apaderno Brescia
  • France o'briat Nantes

    Have a look at the patch in the "Deduplicate Queued Items" issue.

  • United States mlncn Minneapolis, MN, USA

    As far as the error message goes, clearer problems could possibly be surfaced directly, presuming that an unconfigured or misconfigured connection to the external CDN can be the cause of this?

    A recent example: spelunking down to the "Purge queue browser" (by following one of the "Database" options at the bottom of /admin/config/development/performance/purge) showed seven pages of "Failed" URLs and 115,820 pages of "New" URLs. A situation with zero successes (if that is a correct reading) probably warrants a clear message about it.

  • France o'briat Nantes

    The problem can also legitimately occur when massive import/update batches are executed regularly.

    Could the module provide an option that purges all caches (drush p:invalidate everything -y) and empties the queue (drush p:queue-empty)?
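
    A rough sketch of what that option could do, assuming the purge.queue service's emptyQueue() method (the same call drush p:queue-empty makes) and leaving the "purge everything" step to the existing drush command or the CDN's own control, since wiping the queue on its own would leave stale content behind:

    // Drop every queued invalidation in one go. Only do this together with a
    // full CDN purge (e.g. `drush p:invalidate everything -y`), otherwise the
    // discarded invalidations simply become stale pages.
    \Drupal::service('purge.queue')->emptyQueue();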

  • New Zealand xurizaemon Ōtepoti, Aotearoa
  • France o'briat Nantes

    The current "processing rate" could also be displayed to add a clear message, ex:

    "This can happen when no processors are clearing your queue, or when queueing outpaces processing. Please first solve the structural nature of the issue by adding processing power or reducing your queue loads. Empty the queue to unblock your system. The current processing rate (@current_rate clearing requests/s) is lower than the queue growth one (@growth_rate new item to clear/s)."
