Transaction support for cache (tags) invalidation

Created on 31 August 2018, almost 6 years ago
Updated 15 February 2024, 5 months ago

Something else that is really important, but potentially can be solved in core itself for memcache's behalf:

#2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock β†’

Cache Tags can be invalidated whenever - even within transactions and that means we make old data valid again permanently - which means we serve outdated data.

e.g. Simple example:

Entity Cache:

[node:1] is written within a transaction, node tag is invalidated, but transaction is not yet committed.

Someone loads [node:1], the old node is loaded from the database and stored within memcache permanently.

Transaction is committed and from now on the new node should be shown. However it is not.

This can already happen with deletes (yes) and even with full bin flushes (yes, too).

But cache tags are everywhere in Drupal 8 and this makes the potential for race conditions much higher. And given that people get deadlocks on the DB cache tags implementation it means the race condition potential is real.

I think an after transaction commit that you can subscribe to would solve that as we could:

Invalidate the tags, and the queued deletions and the queued bin clears all at once.

Because we do not support strong DB isolation anyway (READ_COMMITTED is preferred) we do not even need to invalidate anything before transactions are committed [but we could of course do so as a best-effort for same-process reads of the data - though I am not a fan -- another possibility would be to set a different prefix for the cache during transaction (e.g. a random ID), so that the process has it's own cache key space to work with, invalidation are then propagated at the end of the transaction as usual ].

So unless someone disagrees, I think we should escalate #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock β†’ to a contributed project blocker and block a stable release on some solution getting into core for this.

Yes, Redis has the same problem, but I don't think it is good to ship something as stable, which has known race condition problems.

Especially if we have a solution that is not as complicated as cache_consistent.

πŸ› Bug report
Status

Needs review

Version

2.0

Component

Code

Created by

πŸ‡©πŸ‡ͺGermany Fabianx

Live updates comments and jobs are added and updated live.
  • Needs manual testing

    The change/bugfix cannot be fully demonstrated by automated testing, and thus requires manual testing in a variety of environments.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡·πŸ‡ΊRussia skylord

    Tried #40 with "cache.backend.memcache_transaction_aware" as backend on site with huge traffic and slow DB (so the chance for race conditions is bigger) - no way, still the same behaviour. Node saving process started, cache is invalidated first, someone requests page, cache is populated with old data, then node actually saves in DB and we got stale cache until i manually invalidate appropriate cache tag using drush... With DB or Redis backend all is OK on that site!

  • Status changed to Needs work over 1 year ago
  • πŸ‡¦πŸ‡²Armenia arthur.baghdasar

    We are seeing same behaviour as per #41

  • Status changed to Needs review 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States weekbeforenext Asheville, NC

    The patch in comment #40 worked for me. I had reported an issue with the Diff module ( πŸ› Stale cache when latest revision is published (content moderation) Closed: won't fix ) which ended up being related to Memcached. I'm not sure of the overall implications of this patch and it's affects on performance, but it does resolve what I described in the related issue.

  • Status changed to Needs work 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States weekbeforenext Asheville, NC
  • Status changed to Needs review 5 months ago
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    We have been encountering what I suspect to be this issue. I was able to reproduce the same issue using the patch in 40 locally, and tracked it down to be stale data in the entity cache. When reset cache is called by entity storage, it is calling deleteMultiple which appeared to be happening immediately and not buffering until the transaction was complete.

    Looking through it, it looks like implementation of deleteMultiple was just missing from the transaction aware backend - added it and can no longer reproduce what I was encountering. Still doing some testing but it looks really good - thank you Josh!

    Would be interested if the deleteMultiple / entity cache issue is what was causing the continued issues in 36, 38, 41 and 43.

  • Merge request !22Draft: Transaction aware backend β†’ (Open) created by ericgsmith
  • Status changed to Needs work 5 months ago
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Move the MR and made an attempt at adding some tests.

    Unfortunately the existing MemcacheBackendTest is failing - and this test will have the exact same failures. I posted here https://www.drupal.org/project/memcache/issues/3397864#comment-15438674 πŸ“Œ Add optional DDEV dev environment Needs review but could not find an open issue where the tests are still being worked on.

    Ignoring the pre failing tests - the 2 new test methods I've added are very rough, they pass locally but I'm not sure what the state of CI will be. They need work, so leaving as needs work.

  • Pipeline finished with Failed
    5 months ago
    Total: 190s
    #95432
  • Pipeline finished with Failed
    5 months ago
    Total: 157s
    #96090
  • Pipeline finished with Failed
    5 months ago
    Total: 250s
    #96097
  • Pipeline finished with Failed
    5 months ago
    Total: 189s
    #96112
  • Status changed to Needs review 5 months ago
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Test failures for both MemcacheBackendTest and TransactionAwareMemcacheBackendTest both reduced by changing the tolerance to 0 for the timestamp invalidator.

    Setting back to needs review as I'm drifting into changing existing test failures here. The additional tests cover the get set and delete during transaction and with a rollback. This isn't full test coverage but I think with the existing failures its hard to press through without full confidence in them. I think getting the existing tests all passing in CI in a different issue will make this easier to review.

    I'm still very interested in if the fix in #45 will resolve the errors reported in #36, #38, #41 and #43.

  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Adding πŸ“Œ Test coverage Active as a related issue to track existing issues with test failures.

Production build 0.69.0 2024