Transaction support for cache (tags) invalidation

Created on 31 August 2018, about 6 years ago
Updated 15 February 2023, almost 2 years ago

Something else that is really important, but potentially can be solved in core itself for memcache's behalf:

#2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock β†’

Cache Tags can be invalidated whenever - even within transactions and that means we make old data valid again permanently - which means we serve outdated data.

e.g. Simple example:

Entity Cache:

[node:1] is written within a transaction, node tag is invalidated, but transaction is not yet committed.

Someone loads [node:1], the old node is loaded from the database and stored within memcache permanently.

Transaction is committed and from now on the new node should be shown. However it is not.

This can already happen with deletes (yes) and even with full bin flushes (yes, too).

But cache tags are everywhere in Drupal 8 and this makes the potential for race conditions much higher. And given that people get deadlocks on the DB cache tags implementation it means the race condition potential is real.

I think an after transaction commit that you can subscribe to would solve that as we could:

Invalidate the tags, and the queued deletions and the queued bin clears all at once.

Because we do not support strong DB isolation anyway (READ_COMMITTED is preferred) we do not even need to invalidate anything before transactions are committed [but we could of course do so as a best-effort for same-process reads of the data - though I am not a fan -- another possibility would be to set a different prefix for the cache during transaction (e.g. a random ID), so that the process has it's own cache key space to work with, invalidation are then propagated at the end of the transaction as usual ].

So unless someone disagrees, I think we should escalate #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock β†’ to a contributed project blocker and block a stable release on some solution getting into core for this.

Yes, Redis has the same problem, but I don't think it is good to ship something as stable, which has known race condition problems.

Especially if we have a solution that is not as complicated as cache_consistent.

πŸ› Bug report
Status

Needs review

Version

2.0

Component

Code

Created by

πŸ‡©πŸ‡ͺGermany Fabianx

Live updates comments and jobs are added and updated live.
  • Needs manual testing

    The change/bugfix cannot be fully demonstrated by automated testing, and thus requires manual testing in a variety of environments.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡·πŸ‡ΊRussia skylord

    Tried #40 with "cache.backend.memcache_transaction_aware" as backend on site with huge traffic and slow DB (so the chance for race conditions is bigger) - no way, still the same behaviour. Node saving process started, cache is invalidated first, someone requests page, cache is populated with old data, then node actually saves in DB and we got stale cache until i manually invalidate appropriate cache tag using drush... With DB or Redis backend all is OK on that site!

  • Status changed to Needs work over 1 year ago
  • πŸ‡¦πŸ‡²Armenia arthur.baghdasar

    We are seeing same behaviour as per #41

  • Status changed to Needs review over 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States weekbeforenext Asheville, NC

    The patch in comment #40 worked for me. I had reported an issue with the Diff module ( πŸ› Stale cache when latest revision is published (content moderation) Closed: won't fix ) which ended up being related to Memcached. I'm not sure of the overall implications of this patch and it's affects on performance, but it does resolve what I described in the related issue.

  • Status changed to Needs work over 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States weekbeforenext Asheville, NC
  • Status changed to Needs review 10 months ago
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    We have been encountering what I suspect to be this issue. I was able to reproduce the same issue using the patch in 40 locally, and tracked it down to be stale data in the entity cache. When reset cache is called by entity storage, it is calling deleteMultiple which appeared to be happening immediately and not buffering until the transaction was complete.

    Looking through it, it looks like implementation of deleteMultiple was just missing from the transaction aware backend - added it and can no longer reproduce what I was encountering. Still doing some testing but it looks really good - thank you Josh!

    Would be interested if the deleteMultiple / entity cache issue is what was causing the continued issues in 36, 38, 41 and 43.

  • Merge request !22Draft: Transaction aware backend β†’ (Open) created by ericgsmith
  • Status changed to Needs work 9 months ago
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Move the MR and made an attempt at adding some tests.

    Unfortunately the existing MemcacheBackendTest is failing - and this test will have the exact same failures. I posted here https://www.drupal.org/project/memcache/issues/3397864#comment-15438674 πŸ“Œ Add optional DDEV dev environment Needs review but could not find an open issue where the tests are still being worked on.

    Ignoring the pre failing tests - the 2 new test methods I've added are very rough, they pass locally but I'm not sure what the state of CI will be. They need work, so leaving as needs work.

  • Pipeline finished with Failed
    9 months ago
    Total: 190s
    #95432
  • Pipeline finished with Failed
    9 months ago
    Total: 157s
    #96090
  • Pipeline finished with Failed
    9 months ago
    Total: 250s
    #96097
  • Pipeline finished with Failed
    9 months ago
    Total: 189s
    #96112
  • Status changed to Needs review 9 months ago
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Test failures for both MemcacheBackendTest and TransactionAwareMemcacheBackendTest both reduced by changing the tolerance to 0 for the timestamp invalidator.

    Setting back to needs review as I'm drifting into changing existing test failures here. The additional tests cover the get set and delete during transaction and with a rollback. This isn't full test coverage but I think with the existing failures its hard to press through without full confidence in them. I think getting the existing tests all passing in CI in a different issue will make this easier to review.

    I'm still very interested in if the fix in #45 will resolve the errors reported in #36, #38, #41 and #43.

  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Adding πŸ“Œ Test coverage Active as a related issue to track existing issues with test failures.

  • πŸ‡ΊπŸ‡ΈUnited States btully

    I can confirm that patch #45 appears to have solved the issue of stale content for a site using considerable nested entities (paragraphs), using cache.backend.memcache as their default backend and where a node edit form might take 10 seconds or more to submit while the node being edited is receiving traffic.

    To reproduce the issue I developed a simple JMeter test to request the test page as an anonymous user with a cachebusing query string. The test runs with 10 concurrent users.

    While the load test is running, I manually edit the test page, which contains paragraphs. I edit one of the paragraph components and update the title and kicker fields. I submit the node edit form.

    Prior to the patch, the node edit form would submit, but the embedded paragraph form fields that were updated do not appear on page load AND/OR when reloading the node edit page. It seems as though the race condition of a simultaneous anonymouse user request for the page has memcache get the old/stale values before they can be deleted and populated with the new values.

    After applying the patch and changing the default cache backend from cache.backend.memcache to cache.backend.memcache_transaction_aware, and clearing Drupal and Varnish cache, I repeated my reproducible steps.

    With the patch applied, I immediately saw the new values for both fields as both an authenticated and anonymous user. Reloading the node edit form also showed the updated values.

    I went through the reproducible steps 3 times, and all 3 times the new values were displaying, which leads me to believe this patch (#45) is working as expected!

  • πŸ‡ΊπŸ‡ΈUnited States btully

    I take my last comment back. While the patch appeared to be working for about a week or so, I am suddenly seeing the following error even though the service and FactoryInterface are present in the codebase:

    You have requested a non-existent service "cache.backend.memcache_transaction_aware".

    Is there a specific place the `cache.backend.memcache_transaction_aware` needs to be defined? I have the following in my in docroot/sites/settings/global.settings.php:

    $settings['cache']['default'] = 'cache.backend.memcache_transaction_aware';

    Like I said, it was working for about a week, but with a recent deployment that changed the code to a branch without the patch, now when we switch the code back to the branch with the patch (and settings) we are getting the above error.

    Is there someplace else the new service needs to be explicitly defined/injected?

    Memcache module is enabled and I can see the new cache.backend.memcache_transaction_aware service in memcache.services.yml

    But for some reason, even after a cache rebuild, Drupal is saying the service isn't defined.

    Could opcache be to blame?

Production build 0.71.5 2024