- Czech Republic kyberman Czech Rep.
Hi everybody!
For a very specific use case (e.g. many nodes being updated at once), shutting down queue processing can cause more trouble than leaving it running and letting it process at least part of the queue, I would say. In my case, the Cloudflare API allows purging 30 items per request, with up to 2000 requests per day, so roughly 60 000 items per day. If the queue quickly grows past 100 000 items, purging stops immediately, even though around 60 000 of those items could still have been processed before the Cloudflare API limit was exhausted.
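The arithmetic behind that estimate, as a small sketch. The figures are the ones quoted above; the variable names are made up for illustration and are not Purge or Cloudflare identifiers.

```python
# Figures quoted in the comment above; names are illustrative only.
ITEMS_PER_REQUEST = 30
REQUESTS_PER_DAY = 2000
QUEUE_CUTOFF = 100_000

# How many items one day of API quota can purge.
daily_capacity = ITEMS_PER_REQUEST * REQUESTS_PER_DAY

# Items still waiting after a full day if the queue sat exactly at the cutoff.
left_over = QUEUE_CUTOFF - daily_capacity

print(daily_capacity)  # 60000
print(left_over)       # 40000
```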
The idea behind this patch is to never stop queue processing. Instead, an error is logged once the queue grows past 30 000 items, so there is time to recognize and fix the underlying issue. Could you please review and comment on this?
The default 100 000 item limit could also be made overridable through a setting, config, state, or hook.
Another idea is to enqueue an item only if it is not already in the queue. Any thoughts?
Thank you
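A rough sketch of both ideas together, assuming a plain in-memory queue. `SketchQueue`, its methods, and the logger name are hypothetical and only model the behaviour; this is not the Purge module's QueueService or the attached patch.

```python
import logging

logger = logging.getLogger("purge_sketch")  # hypothetical logger name

WARN_THRESHOLD = 30_000  # threshold proposed in this comment


class SketchQueue:
    """Toy in-memory queue; not the Purge module's QueueService."""

    def __init__(self):
        self._items = []    # preserves insertion order
        self._seen = set()  # items currently waiting in the queue

    def add(self, tag):
        # Enqueue only if the same item is not already waiting.
        if tag in self._seen:
            return False
        self._seen.add(tag)
        self._items.append(tag)
        # Log an error when the threshold is crossed, but keep accepting items.
        if len(self._items) == WARN_THRESHOLD + 1:
            logger.error("Queue grew past %d items", WARN_THRESHOLD)
        return True

    def claim(self, limit):
        """Remove and return up to `limit` items, oldest first."""
        batch, self._items = self._items[:limit], self._items[limit:]
        self._seen.difference_update(batch)
        return batch

    def __len__(self):
        return len(self._items)


queue = SketchQueue()
for tag in ["node:1", "node:2", "node:1", "node_list", "node_list"]:
    queue.add(tag)

print(len(queue))       # 3
print(queue.claim(30))  # ['node:1', 'node:2', 'node_list']
```

Once an item has been claimed it can be queued again, so a later invalidation of the same tag is not lost.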
Vit
- Status changed to Active
over 1 year ago 4:59am 12 July 2023
- achap
Just want to chime in and say this issue has affected me too when running migrations. We often run migrations that can take a few hours or more. I stepped through the code and think I discovered what's happening. In Drupal\purge\Plugin\Purge\Queue\QueueService::add, invalidation tags are not added to the queue straight away but rather to an internal buffer. Then at the end of the request (for example a long-running cron job or drush script), in Drupal\purge\Plugin\Purge\Queue\QueueService::destruct, it looks like the items from the buffer are finally committed to the queue.
The problem is that during the whole time the migration is running, none of the invalidation tags it generates can be processed by any of the purge processors. They are all dumped at once at the end of the migration, which usually puts the queue over the 100k limit.
Not sure what the fix is but that seems to be the root cause of the issue at least for us.
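To illustrate the behaviour described above, here is a toy model of a buffer that is only committed at the end of a run. `BufferedQueue` and the 150 000 item count are assumptions for the sake of the example; this is not the actual QueueService code.

```python
class BufferedQueue:
    """Toy model of buffer-then-commit; not the actual QueueService code."""

    def __init__(self):
        self.buffer = []     # filled while the request is running
        self.committed = []  # what the purge processors can see

    def add(self, tag):
        # Items only reach the buffer, not the visible queue.
        self.buffer.append(tag)

    def commit(self):
        # Stands in for what appears to happen at the end of the request.
        self.committed.extend(self.buffer)
        self.buffer.clear()


queue = BufferedQueue()
for nid in range(150_000):  # hypothetical long-running migration
    queue.add(f"node:{nid}")

visible_during_run = len(queue.committed)
queue.commit()
visible_after_run = len(queue.committed)

print(visible_during_run)  # 0
print(visible_after_run)   # 150000
```

Nothing is visible to processors while the loop runs, and the whole batch arrives in one step afterwards, which is how a single run can jump straight past the limit.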
- achap
Our workaround for the above was to re-architect our migration using the Queue API to process 1 item at a time, and give our queue worker a cron lease time of 1 hour (same as cron run interval). This way the buffer is emptied once per hour at least and it doesn't overwhelm the purge queue. Hope that helps someone.
- Canada rbrownell Ottawa, Ontario, Canada
This error baffles me. I understand that re-architecting is the usual solution, but it can't be if the business requirements of the project call for timely and rapid updating of a large volume of nodes/pages.
Please correct me if I am wrong, but it is my understanding that queues are supposed to help prevent server crashes by regulating the volume of data being sent to whatever system is receiving it. This would presumably occur in smaller batches instead of all at once. The fact that the queue stops processing after reaching a certain threshold suggests to me that the queue is not really a proper queue that processes things in smaller batches, but rather a dumping ground whose contents are then sent all at once. There's got to be a better way of handling this than just stopping everything. There are mechanisms that can be added to keep servers from crashing under high data volume.
- Status changed to Needs review
about 1 year ago 2:03am 29 September 2023
- United States japerry KVUO
Typically this error occurs if cron is misconfigured (or not configured), or during a migration or other process where lots of invalidations are happening at once.
To counter this edge case, I added a new flag to the state system called purge.dangerous -- if you set this in settings or with
drush sset purge.dangerous TRUE
then you should be able to have the purger run with over 100,000 items in the queue.
To clear the purge queue:
drush p-queue-empty
Add processor:
drush p:processor-add drush_purge_queue_work
- New Zealand xurizaemon Ōtepoti, Aotearoa
That change mentioned in #19 should be available as of 8.x-3.5. The commit doesn't show in this issue as the commit omits the "Issue #3132524" subject. Looks like the fix was in 70b34944.
- New Zealand xurizaemon Ōtepoti, Aotearoa
We have a site that is periodically affected by this issue. When investigated, we observe that a single entry in the purge_queue table has spiked beyond the 100K limit, which blocks all but manual queue flushes for future operation.
select distinct max(item_id) sample_id, count(*) as count,
  from_unixtime(min(created)) as min_created,
  from_unixtime(max(created)) as max_created,
  data
from purge_queue
group by data
order by count desc
limit 10
+-----------+---------+---------------------+---------------------+----------------------------------------------------------------------------------------+
| sample_id | count   | min_created         | max_created         | data                                                                                   |
+-----------+---------+---------------------+---------------------+----------------------------------------------------------------------------------------+
|   9417901 | 1305685 | 2024-05-24 14:15:16 | 2024-06-16 21:46:18 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"config:views.view.media_library";i:3;a:0:{}}    |
|   9417786 |    4944 | 2024-05-24 14:15:32 | 2024-06-16 21:45:09 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-org-sitemap";i:3;a:0:{}} |
|   9417791 |    4944 | 2024-05-24 14:15:32 | 2024-06-16 21:45:09 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-net-sitemap";i:3;a:0:{}} |
|   9417326 |    4686 | 2024-05-24 14:15:32 | 2024-06-16 21:40:22 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:34:"simple_sitemap:example-com-sitemap";i:3;a:0:{}} |
|   9417321 |    4381 | 2024-05-24 14:15:32 | 2024-06-16 21:40:22 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:26:"simple_sitemap:example-sitemap";i:3;a:0:{}}     |
|   9416566 |    4275 | 2024-05-24 14:15:32 | 2024-06-16 21:35:17 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"simple_sitemap:example2-sitemap";i:3;a:0:{}}    |
|   9417031 |     188 | 2024-05-26 20:36:28 | 2024-06-16 21:38:15 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:9:"node_list";i:3;a:0:{}}                           |
|   9404071 |     176 | 2024-05-26 02:01:34 | 2024-06-16 19:45:47 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:19:"config:webform_list";i:3;a:0:{}}                |
|   9404076 |     176 | 2024-05-26 02:01:34 | 2024-06-16 19:45:47 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:23:"webform_submission_list";i:3;a:0:{}}            |
+-----------+---------+---------------------+---------------------+----------------------------------------------------------------------------------------+
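To put a number on how lopsided this result is, here is a quick calculation over the rows shown. The tag names and counts are copied from the output above; the share is only relative to these listed rows, not the whole table.

```python
# (tag, count) pairs copied from the query result above.
rows = [
    ("config:views.view.media_library", 1305685),
    ("simple_sitemap:example-org-sitemap", 4944),
    ("simple_sitemap:example-net-sitemap", 4944),
    ("simple_sitemap:example-com-sitemap", 4686),
    ("simple_sitemap:example-sitemap", 4381),
    ("simple_sitemap:example2-sitemap", 4275),
    ("node_list", 188),
    ("config:webform_list", 176),
    ("webform_submission_list", 176),
]

total = sum(count for _, count in rows)
top_tag, top_count = max(rows, key=lambda row: row[1])
share = top_count / total  # fraction of the listed rows taken by the top tag

print(top_tag)         # config:views.view.media_library
print(total)           # 1329455
print(f"{share:.1%}")  # 98.2%
```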
If others are observing this issue, I'm interested to know if executing the query above on their site reveals a similar profile, i.e. that when grouped by the data column, the entries in purge_queue are heavily dominated by a single value of data.
- France O'Briat Nantes
Have a look at the patch in the "Deduplicate Queued Items" issue.