Cache tags collision

Created on 15 February 2023, almost 2 years ago
Updated 27 May 2024, 6 months ago

Problem/Motivation

The cache tags generated/processed by dropsolid_purge are hashed with MD5, and only the first 4 characters of each hash are kept.

As documented in the module, this carries a high collision risk.

We have found cases where different tags produce the same hash, meaning that unrelated cached pages get invalidated.

  • E.g. a file:xxx tag has the same hash as a node:xxx tag
  • Once the file is invalidated, the unrelated node is invalidated as well

Steps to reproduce

  • Log the tags and their hashes in the cacheTags method of dropsolid_purge/src/Hash.php, then look for different tags that share a hash
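
The collisions can also be reproduced outside Drupal. The following is a minimal Python sketch, not the module's code: it assumes the hash is the first 4 characters of the hexadecimal MD5 digest of the tag string, and the tag names (node:1 ... node:2000, file:1 ... file:2000) are hypothetical stand-ins for logged tags.

```python
import hashlib
from collections import defaultdict


def short_hash(tag: str, length: int = 4) -> str:
    """Return the first `length` hex characters of the tag's MD5 digest.

    Assumption: this mirrors the module's hashing as described in the issue.
    """
    return hashlib.md5(tag.encode("utf-8")).hexdigest()[:length]


def find_collisions(tags, length: int = 4) -> dict:
    """Group tags by truncated hash and keep only hashes shared by 2+ tags."""
    buckets = defaultdict(set)
    for tag in tags:
        buckets[short_hash(tag, length)].add(tag)
    return {h: sorted(group) for h, group in buckets.items() if len(group) > 1}


# Hypothetical tag set; replace with the tags logged from a real site.
tags = [f"node:{i}" for i in range(1, 2001)] + [f"file:{i}" for i in range(1, 2001)]

collisions = find_collisions(tags)
print(f"{len(collisions)} hashes are shared by more than one tag")
for hash_value, group in list(collisions.items())[:5]:
    print(hash_value, group)
```

Feeding the real logged tags into find_collisions should show which node and file tags invalidate each other.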

Proposed resolution

There is a trade-off between allowing more characters in the hashed tag (to avoid collisions) and the size of the headers.

  • Adding 2 characters (going from 4 to 6) makes each hashed tag 50% longer, and the tags header grows accordingly
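
To put numbers on the header growth, here is a small Python sketch. It assumes the hashed tags are joined with a single-character separator (for example a space); the actual separator used by the module is not stated in this issue. With a separator the relative growth is somewhat below 50%, since the separators do not get longer.

```python
def header_bytes(tag_count: int, hash_length: int, separator_length: int = 1) -> int:
    """Size of a header value holding `tag_count` hashes joined by a separator.

    Assumption: one separator between consecutive hashes, none at the ends.
    """
    if tag_count == 0:
        return 0
    return tag_count * hash_length + (tag_count - 1) * separator_length


for tag_count in (10, 100, 500):
    before = header_bytes(tag_count, 4)
    after = header_bytes(tag_count, 6)
    growth = (after - before) / before
    print(f"{tag_count} tags: {before} -> {after} bytes (+{growth:.0%})")
```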

Options

  • Do an analysis to check how many chars we need to increase to have less collision vs the header size performance hit.
  • Review hashing algorithm.
  • Prefix the hash with a char by adding some logic to group the tags.
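
As a starting point for the first option, the collision risk per hash length can be estimated with the standard birthday approximation. The Python sketch below assumes the truncated hash behaves like a uniform random value over 16^length hexadecimal strings; the tag counts are hypothetical and should be replaced with figures from a real site.

```python
import math


def collision_probability(tag_count: int, hash_length: int) -> float:
    """Approximate probability that at least two of `tag_count` tags collide.

    Birthday approximation: 1 - exp(-n(n-1) / (2N)), with N = 16**hash_length.
    """
    space = 16 ** hash_length
    return -math.expm1(-tag_count * (tag_count - 1) / (2 * space))


def expected_colliding_pairs(tag_count: int, hash_length: int) -> float:
    """Expected number of tag pairs that share the same truncated hash."""
    return tag_count * (tag_count - 1) / (2 * 16 ** hash_length)


for hash_length in (4, 6, 8):
    for tag_count in (1_000, 10_000, 100_000):
        p = collision_probability(tag_count, hash_length)
        pairs = expected_colliding_pairs(tag_count, hash_length)
        print(f"{hash_length} chars, {tag_count} tags: "
              f"P(collision) = {p:.4f}, expected colliding pairs = {pairs:.1f}")
```

Comparing these figures with the header size for each length gives a basis for picking the number of characters.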
✨ Feature request
Status

Needs review

Version

1.0

Component

Code

Created by

🇵🇹 Portugal fmfpereira
