Excessive Tag Hash Collisions

Created on 13 November 2023, about 1 year ago
Updated 29 April 2024, 7 months ago

Problem/Motivation

We are experiencing excessive cache clearing due to hashed cache tag collisions in this module. Each tag sent to Cloudflare appears to be 3 characters long, and each character is a hexadecimal digit (16 possible values). This means there are only 16^3 = 4096 possible hashes that can be sent to Cloudflare, so the probability of collisions is high: a collision between some pair of tags is about as likely as not once a site uses roughly 75 distinct tags.
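To put a number on "high", here is a minimal sketch that computes the birthday-style collision probability for a given hash space. It assumes tags hash uniformly at random, which is an idealisation; the function name and the sample tag counts are illustrative and not taken from the module.

```python
def collision_probability(n_tags: int, space: int) -> float:
    """Probability that at least two of n_tags uniformly hashed tags share a hash."""
    p_all_distinct = 1.0
    for i in range(n_tags):
        # The i-th tag must avoid the i hashes already taken.
        p_all_distinct *= max(space - i, 0) / space
    return 1.0 - p_all_distinct


# Compare the original 3-hex-character space with a 4-character base-36 space.
for label, space in (("3 hex chars", 16 ** 3), ("4 base-36 chars", 36 ** 4)):
    for n in (50, 100, 500):
        p = collision_probability(n, space)
        print(f"{label:>16}  {n:>4} tags  P(collision) = {p:.4f}")
```

Note that this is the chance of any collision at all; every colliding pair means invalidating one tag also purges content carrying the other.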

Steps to reproduce

Example: The tags 'file_list' and 'config:system.menu.language' result in the same hash '144'.

Proposed resolution

We found that widening the character set to 36 (26 letters plus 10 digits) and increasing the hash length to 4 alleviated the problem for us. This gives 36^4 = 1,679,616 (~1.6 million) possible hashes, about 410 times as many as before, which reduces the chance of collisions considerably.
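As an illustration of the idea only, a 4-character base-36 hash could be derived along these lines. This is a sketch, not the patch itself: the use of MD5, the modulo reduction, and the names `ALPHABET` and `short_tag_hash` are all assumptions made for the example, and it will not reproduce the module's actual hash values.

```python
import hashlib

# 10 digits + 26 lowercase letters = 36 symbols (assumed alphabet).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def short_tag_hash(tag: str, length: int = 4) -> str:
    """Map a cache tag to a fixed-length base-36 string (hypothetical helper)."""
    digest = hashlib.md5(tag.encode("utf-8")).digest()
    # Reduce the 128-bit digest into the 36**length hash space.
    value = int.from_bytes(digest, "big") % (len(ALPHABET) ** length)
    chars = []
    for _ in range(length):
        value, remainder = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


for tag in ("file_list", "config:system.menu.language"):
    print(tag, "->", short_tag_hash(tag))
```

Making `length` a parameter keeps the hash space adjustable without touching the encoding logic.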

However, longer hashes increase the size of the header. We have not run into any issues with this yet, but we also added a config setting that removes tags from the list based on their prefix (e.g. config tags), which decreases the header size.
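The prefix-based exclusion can be pictured as a simple filter applied before hashing. The sketch below is hypothetical: the function name and the shape of the setting (a list of prefixes) are assumptions, not the module's actual configuration schema.

```python
def drop_tags_by_prefix(tags, excluded_prefixes):
    """Return the tags that do not start with any excluded prefix."""
    # str.startswith accepts a tuple; an empty tuple matches nothing.
    prefixes = tuple(excluded_prefixes)
    return [tag for tag in tags if not tag.startswith(prefixes)]


tags = ["config:system.menu.language", "file_list", "node:1"]
print(drop_tags_by_prefix(tags, ["config:"]))
```

Dropped tags can no longer trigger a Cloudflare purge, so this only suits tags whose invalidation does not need to reach the edge.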

🐛 Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

🇺🇸 United States kleinmp


Comments & Activities

  • Issue created by @kleinmp
  • 🇺🇸 United States kleinmp

    Attaching patch.

  • Status changed to RTBC 12 months ago
  • Automated test run (Core: 10.1.4, Environment: PHP 8.1 & MySQL 5.7), last update 12 months ago: Composer require failure
  • 🇺🇸 United States Jody Lynn

    Confirmed that we are running this patch in production and our Cloudflare cache hit rate increased significantly.

  • 🇦🇺 Australia almunnings Melbourne, 🇦🇺

    This patch has worked well for us.
    We were seeing excessive collisions across entities, and Cloudflare was invalidating completely unrelated content.

    This patch is excellent.

  • 🇬🇧 United Kingdom altcom_neil

    Hi

    We also ran into this issue and another related one: cache tags in the same Cloudflare account clear all environments, so the UAT site's cache tags will clear the production site's cache if you are using the same account during development. We have added a patch that lets you prefix the cache tag with an environment character so that cache tags are unique per environment.
    See https://www.drupal.org/project/cloudflare/issues/3394651 ✨ Add optional Environment setting (Needs review)

    In that code we increased the length of the hashed cache tag (before adding the environment character) to 6 hexadecimal characters, giving 16.7 million unique codes, as we didn't spot the better improvement of using the larger character set that you have used here. We have been using this code on sites with in excess of 100,000 nodes and haven't run into any header size issues, so 4-character tags should be fine.
    If you use 6 characters with the 36-character set, you get over 2 billion unique hashes!

    Should the length of the hash be a config value, with a minimum of 4, so that it can be configured on a site-by-site basis? Very, very large sites with millions of entities could potentially have more than 1.6 million cache tags.

    Cheers, Neil

  • Status changed to Fixed 7 months ago
  • 🇨🇦 Canada mandclu

    Thanks @kleinmp for identifying this, and for providing a fix. Merged in.

    • mandclu committed bbfb0a7d on 2.0.x
      Issue #3401335 by kleinmp, mandclu: Excessive Tag Hash Collisions
      
  • Automatically closed - issue fixed for 2 weeks with no activity.
