Deduplicate Queued Items

Created on 12 February 2017, almost 8 years ago
Updated 5 June 2024, 6 months ago

If I edit a page multiple times before the purge queue has been processed, I can end up with duplicate invalidations in the queue.

When an item is enqueued it should be checked for duplicates. If a duplicate already exists, it should not be enqueued.

Feature request

Status: Active
Version: 3.0
Component: Code
Created by: 🇺🇸United States adam.weingarten


Comments & Activities


  • 🇳🇿New Zealand RoSk0 Wellington

    Thanks a lot for the purge_queues module, Jonathan!

    That's a real game changer! My queue was growing to millions of items, outpacing the purge cron job that runs every minute. Local tests look great; we'll see how it behaves in production.

    I believe the purge_queues module could be a great addition to purge itself.

  • 🇳🇿New Zealand ericgsmith

    We have been investigating performance issues caused by duplicate items when using purge in combination with the purge_queuer_url module.

    We have encountered issues in two areas: 1. duplicates in the buffer, and 2. duplicates in the queue.

    Duplicate items in the buffer

    I can see that when an invalidation is created in the InvalidationsService, an instanceCounter is used to generate a unique integer ID for the invalidation object. When the invalidation is added to the buffer, the buffer calls has() to check whether that ID has already been added.

    Queuers seem to make some attempt to reduce duplicates, e.g. by filtering out previously requested tags, but certain situations such as config importing can push thousands of duplicates into the buffer, which can lead to high memory consumption.

    While I have only been looking at this in the context of the url/path queuer, I wonder whether the queuers themselves could set either an ID or another property on the invalidation that can be used to dedupe it. E.g. the URL registry maintains a list of URLs, so the URL ID could be considered unique. Individual cache tags could also consider themselves unique. Other plugins may have difficulty determining their uniqueness, but opening up the possibility of setting an ID, with a fallback to the instance counter, could make the plugins where this is problematic (e.g. the URL queuer) more efficient.

    Without having looked through all the code, I would be interested in the maintainers' thoughts, as it appears the use of getId() on the invalidation plugin is (according to my IDE) mainly through the buffer and tests.

    Would there be any reasons against:

    1. changing the return type of getId() in InvalidationInterface to string
    2. introducing a third, optional parameter to InvalidationsService->get() to allow an ID to be provided at creation
    3. introducing fallback behaviour so that a unique ID is generated if none is provided

    That would then allow queuers to provide a unique value when creating an invalidation, and the existing buffer deduplication code might not need to change (a rough sketch follows below).
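
    A minimal, purely illustrative sketch of points 2 and 3 above; the optional $id parameter, the fallback counter and the buffer array are assumptions about the proposal, not existing purge API:

    <?php

    // Sketch: a factory that accepts an optional caller-supplied ID and falls
    // back to an instance counter, plus a buffer keyed by that ID so that
    // deterministic IDs make duplicates detectable.
    class SketchInvalidationFactory {
      private int $instanceCounter = 0;

      public function get(string $type, ?string $expression = NULL, ?string $id = NULL): array {
        if ($id === NULL) {
          // Fallback behaviour: keep generating unique integer IDs as today.
          $id = (string) ++$this->instanceCounter;
        }
        return ['id' => $id, 'type' => $type, 'expression' => $expression];
      }
    }

    $factory = new SketchInvalidationFactory();
    $buffer = [];

    // A URL queuer could pass a deterministic ID (e.g. its registry row ID or
    // the expression itself), so repeated requests collapse into one entry.
    foreach (['node/1', 'node/1', 'node/2'] as $path) {
      $invalidation = $factory->get('url', $path, 'url:' . $path);
      if (!isset($buffer[$invalidation['id']])) {
        $buffer[$invalidation['id']] = $invalidation;
      }
    }
    // $buffer now holds two entries instead of three.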

    Duplicate items in the queue

    We are using the module @jonhattan provided, but its checks for duplicate items can be problematic for repeated large updates (e.g. in our case it was multiple batch calls that each invalidated the media_list tag).

    @RoSk0 raised an idea (offline) of storing a unique identifier for each queued item so that, with a database queue, upsert queries can be used instead of insert queries. We have a proof of concept that does this by hashing the type and expression of the invalidation data, but it would be easier with an enforced / persisted unique ID on the invalidation item. We would be interested in any thoughts on this approach (rough sketch below).
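
    The gist of the proof of concept, as a hedged sketch: it assumes a hypothetical hash column with a unique index on the purge_queue table (not part of purge today), and the stored data format is only a guess.

    <?php

    use Drupal\Core\Database\Database;

    // Derive a deterministic identifier from the invalidation type and
    // expression, e.g. 'tag' and 'media_list'.
    $hash = hash('sha256', $type . ':' . $expression);

    // Upsert instead of insert: a second invalidation with the same hash
    // updates the existing row rather than adding a duplicate item.
    Database::getConnection()->upsert('purge_queue')
      ->key('hash')
      ->fields([
        'hash' => $hash,
        'data' => serialize([$type, $expression]),
        'created' => \Drupal::time()->getRequestTime(),
      ])
      ->execute();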

  • 🇫🇷France O'Briat Nantes

    I can confirm that duplicate invalidations occur when Drupal regularly imports or updates large volumes of content.

    A simple solution could be to delete all items with identical "data" when purging an item?

    Or just add a global duplicate deletion pass at the end of every purger; here is some pseudo-code:

    "SELECT MAX(item_id), data FROM purge_queue  GROUP BY data HAVING COUNT(*) > 1"
    foreach item_id, data
     DELETE FROM purge_queue  WHERE data=$data AND item_id  != item_id
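
    The same idea expressed with Drupal's database API, as a sketch only (where exactly it should run, e.g. after each purger pass, is the open question above):

    <?php

    $connection = \Drupal::database();

    // Find duplicated payloads in the queue, keeping the newest item of each.
    $duplicates = $connection->query(
      'SELECT MAX(item_id) AS keep_id, data FROM {purge_queue} GROUP BY data HAVING COUNT(*) > 1'
    )->fetchAll();

    // Delete every older copy of each duplicated payload.
    foreach ($duplicates as $row) {
      $connection->delete('purge_queue')
        ->condition('data', $row->data)
        ->condition('item_id', $row->keep_id, '<>')
        ->execute();
    }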
    
    
  • 🇮🇳India Santhoshkumar

    We have identified a similar kind of issue when using the purge_queuer_coretags module. There are two issues:

    1. The same cache tags are inserted into the purge_queue table multiple times.
    2. Because of the duplicated tags in the purge_queue table, we frequently hit the "your queue exceeded 100 000 items! Purge shut down" error.

    To fix the issue we have added the patch duplicate_purge_tags.patch. In this patch we do a DB lookup before inserting into purge_queue, and we also keep the already-seen tags in a static array to prevent multiple database calls for the same tag (sketch below).
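
    A minimal sketch of the approach described above; the helper name, the LIKE match against the serialized data column, and the use of drupal_static() are assumptions, not the actual patch:

    <?php

    /**
     * Returns TRUE if the tag was already queued or handled in this request.
     */
    function sketch_tag_already_queued(string $tag): bool {
      // Static cache: repeated checks for the same tag within one request
      // never hit the database a second time.
      $seen = &drupal_static(__FUNCTION__, []);
      if (isset($seen[$tag])) {
        return TRUE;
      }
      $seen[$tag] = TRUE;

      // DB lookup: is there already a queue item whose data contains this tag?
      $database = \Drupal::database();
      return (bool) $database->select('purge_queue', 'q')
        ->fields('q', ['item_id'])
        ->condition('q.data', '%' . $database->escapeLike($tag) . '%', 'LIKE')
        ->range(0, 1)
        ->execute()
        ->fetchField();
    }

    // A queuer would then only enqueue the tag when this returns FALSE.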

  • Test run (Core 10.2.x, PHP 8.1 & MySQL 5.7, last updated 6 months ago): 489 pass, 22 fail