Transaction support for cache (tags) invalidation

Created on 31 August 2018, over 6 years ago
Updated 15 February 2023, almost 2 years ago

Something else that is really important, but potentially can be solved in core itself for memcache's behalf:

#2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock

Cache Tags can be invalidated whenever - even within transactions and that means we make old data valid again permanently - which means we serve outdated data.

e.g. Simple example:

Entity Cache:

[node:1] is written within a transaction, node tag is invalidated, but transaction is not yet committed.

Someone loads [node:1], the old node is loaded from the database and stored within memcache permanently.

Transaction is committed and from now on the new node should be shown. However it is not.

This can already happen with deletes (yes) and even with full bin flushes (yes, too).

But cache tags are everywhere in Drupal 8 and this makes the potential for race conditions much higher. And given that people get deadlocks on the DB cache tags implementation it means the race condition potential is real.

I think an after transaction commit that you can subscribe to would solve that as we could:

Invalidate the tags, and the queued deletions and the queued bin clears all at once.

Because we do not support strong DB isolation anyway (READ_COMMITTED is preferred) we do not even need to invalidate anything before transactions are committed [but we could of course do so as a best-effort for same-process reads of the data - though I am not a fan -- another possibility would be to set a different prefix for the cache during transaction (e.g. a random ID), so that the process has it's own cache key space to work with, invalidation are then propagated at the end of the transaction as usual ].

So unless someone disagrees, I think we should escalate #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock to a contributed project blocker and block a stable release on some solution getting into core for this.

Yes, Redis has the same problem, but I don't think it is good to ship something as stable, which has known race condition problems.

Especially if we have a solution that is not as complicated as cache_consistent.

🐛 Bug report
Status

Needs review

Version

2.0

Component

Code

Created by

🇩🇪Germany Fabianx

Live updates comments and jobs are added and updated live.
  • Needs manual testing

    The change/bugfix cannot be fully demonstrated by automated testing, and thus requires manual testing in a variety of environments.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇷🇺Russia skylord

    Tried #40 with "cache.backend.memcache_transaction_aware" as backend on site with huge traffic and slow DB (so the chance for race conditions is bigger) - no way, still the same behaviour. Node saving process started, cache is invalidated first, someone requests page, cache is populated with old data, then node actually saves in DB and we got stale cache until i manually invalidate appropriate cache tag using drush... With DB or Redis backend all is OK on that site!

  • Status changed to Needs work almost 2 years ago
  • 🇦🇲Armenia arthur.baghdasar

    We are seeing same behaviour as per #41

  • Status changed to Needs review over 1 year ago
  • 🇺🇸United States weekbeforenext Asheville, NC

    The patch in comment #40 worked for me. I had reported an issue with the Diff module ( 🐛 Stale cache when latest revision is published (content moderation) Closed: won't fix ) which ended up being related to Memcached. I'm not sure of the overall implications of this patch and it's affects on performance, but it does resolve what I described in the related issue.

  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States weekbeforenext Asheville, NC
  • Status changed to Needs review 12 months ago
  • 🇳🇿New Zealand ericgsmith

    We have been encountering what I suspect to be this issue. I was able to reproduce the same issue using the patch in 40 locally, and tracked it down to be stale data in the entity cache. When reset cache is called by entity storage, it is calling deleteMultiple which appeared to be happening immediately and not buffering until the transaction was complete.

    Looking through it, it looks like implementation of deleteMultiple was just missing from the transaction aware backend - added it and can no longer reproduce what I was encountering. Still doing some testing but it looks really good - thank you Josh!

    Would be interested if the deleteMultiple / entity cache issue is what was causing the continued issues in 36, 38, 41 and 43.

  • Merge request !22Draft: Transaction aware backend → (Open) created by ericgsmith
  • Status changed to Needs work 11 months ago
  • 🇳🇿New Zealand ericgsmith

    Move the MR and made an attempt at adding some tests.

    Unfortunately the existing MemcacheBackendTest is failing - and this test will have the exact same failures. I posted here https://www.drupal.org/project/memcache/issues/3397864#comment-15438674 📌 Add optional DDEV dev environment Needs review but could not find an open issue where the tests are still being worked on.

    Ignoring the pre failing tests - the 2 new test methods I've added are very rough, they pass locally but I'm not sure what the state of CI will be. They need work, so leaving as needs work.

  • Pipeline finished with Failed
    11 months ago
    Total: 190s
    #95432
  • Pipeline finished with Failed
    11 months ago
    Total: 157s
    #96090
  • Pipeline finished with Failed
    11 months ago
    Total: 250s
    #96097
  • Pipeline finished with Failed
    11 months ago
    Total: 189s
    #96112
  • Status changed to Needs review 11 months ago
  • 🇳🇿New Zealand ericgsmith

    Test failures for both MemcacheBackendTest and TransactionAwareMemcacheBackendTest both reduced by changing the tolerance to 0 for the timestamp invalidator.

    Setting back to needs review as I'm drifting into changing existing test failures here. The additional tests cover the get set and delete during transaction and with a rollback. This isn't full test coverage but I think with the existing failures its hard to press through without full confidence in them. I think getting the existing tests all passing in CI in a different issue will make this easier to review.

    I'm still very interested in if the fix in #45 will resolve the errors reported in #36, #38, #41 and #43.

  • 🇳🇿New Zealand ericgsmith

    Adding 📌 Test coverage Active as a related issue to track existing issues with test failures.

  • 🇺🇸United States btully Dutchess County, NY

    I can confirm that patch #45 appears to have solved the issue of stale content for a site using considerable nested entities (paragraphs), using cache.backend.memcache as their default backend and where a node edit form might take 10 seconds or more to submit while the node being edited is receiving traffic.

    To reproduce the issue I developed a simple JMeter test to request the test page as an anonymous user with a cachebusing query string. The test runs with 10 concurrent users.

    While the load test is running, I manually edit the test page, which contains paragraphs. I edit one of the paragraph components and update the title and kicker fields. I submit the node edit form.

    Prior to the patch, the node edit form would submit, but the embedded paragraph form fields that were updated do not appear on page load AND/OR when reloading the node edit page. It seems as though the race condition of a simultaneous anonymouse user request for the page has memcache get the old/stale values before they can be deleted and populated with the new values.

    After applying the patch and changing the default cache backend from cache.backend.memcache to cache.backend.memcache_transaction_aware, and clearing Drupal and Varnish cache, I repeated my reproducible steps.

    With the patch applied, I immediately saw the new values for both fields as both an authenticated and anonymous user. Reloading the node edit form also showed the updated values.

    I went through the reproducible steps 3 times, and all 3 times the new values were displaying, which leads me to believe this patch (#45) is working as expected!

  • 🇺🇸United States btully Dutchess County, NY

    I take my last comment back. While the patch appeared to be working for about a week or so, I am suddenly seeing the following error even though the service and FactoryInterface are present in the codebase:

    You have requested a non-existent service "cache.backend.memcache_transaction_aware".

    Is there a specific place the `cache.backend.memcache_transaction_aware` needs to be defined? I have the following in my in docroot/sites/settings/global.settings.php:

    $settings['cache']['default'] = 'cache.backend.memcache_transaction_aware';

    Like I said, it was working for about a week, but with a recent deployment that changed the code to a branch without the patch, now when we switch the code back to the branch with the patch (and settings) we are getting the above error.

    Is there someplace else the new service needs to be explicitly defined/injected?

    Memcache module is enabled and I can see the new cache.backend.memcache_transaction_aware service in memcache.services.yml

    But for some reason, even after a cache rebuild, Drupal is saying the service isn't defined.

    Could opcache be to blame?

  • 🇮🇳India suresh7787

    Hi All,

    I have a quick question regarding this. I applied the patch to my site, which is highly transactional, and it successfully resolved the stale content issue. However, I occasionally encounter cache corruption (cause unknown), and I suspect it might be due to a race condition or something similar.

    When I reviewed the cache bins and their associated classes, I noticed that critical bins like `bootstrap`, `discovery`, and `config` are still using the default Memcache backend instead of the transaction-aware backend.

    The cache corruption predominantly affects `config` and `discovery`, and clearing the cache resolves the issue temporarily. However, the problem reoccurs over time.

    Could you please advise if moving these critical bins to the transaction-aware backend would help prevent this cache corruption?

    Thanks!

  • Do We need to make any change in bootstrap container?

    $memcache_exists = class_exists('Memcache', FALSE);
    $memcached_exists = class_exists('Memcached', FALSE);
    if ($memcache_exists || $memcached_exists) {
    $class_loader->addPsr4('Drupal\\memcache\\', $app_root . '/modules/composer/memcache/src');

    // Define custom bootstrap container definition to use Memcache for cache.container.
    $settings['bootstrap_container_definition'] = [
    'parameters' => [],
    'services' => [
    'database' => [
    'class' => 'Drupal\Core\Database\Connection',
    'factory' => 'Drupal\Core\Database\Database::getConnection',
    'arguments' => ['default'],
    ],
    'settings' => [
    'class' => 'Drupal\Core\Site\Settings',
    'factory' => 'Drupal\Core\Site\Settings::getInstance',
    ],
    'memcache.settings' => [
    'class' => 'Drupal\memcache\MemcacheSettings',
    'arguments' => ['@settings'],
    ],
    'memcache.factory' => [
    'class' => 'Drupal\memcache\Driver\MemcacheDriverFactory',
    'arguments' => ['@memcache.settings'],
    ],
    'memcache.timestamp.invalidator.bin' => [
    'class' => 'Drupal\memcache\Invalidator\MemcacheTimestampInvalidator',
    # Adjust tolerance factor as appropriate when not running memcache on localhost.
    'arguments' => ['@memcache.factory', 'memcache_bin_timestamps', 0.001],
    ],
    'memcache.backend.cache.container' => [
    'class' => 'Drupal\memcache\DrupalMemcacheInterface',
    'factory' => ['@memcache.factory', 'get'],
    'arguments' => ['container'],
    ],
    'cache_tags_provider.container' => [
    'class' => 'Drupal\Core\Cache\DatabaseCacheTagsChecksum',
    'arguments' => ['@database'],
    ],
    'cache.container' => [
    'class' => 'Drupal\memcache\MemcacheBackend',
    'arguments' => ['container', '@memcache.backend.cache.container', '@cache_tags_provider.container', '@memcache.timestamp.invalidator.bin'],
    ],
    ],
    ];
    }

Production build 0.71.5 2024