Add a grace period to ChainedFastBackend when last_write_timestamp is ahead

Created on 22 May 2025, 9 days ago

Problem/Motivation

After 📌 Optimize last_write_timestamp writes in ChainedFastBackend Active, we write the last_write_timestamp in ChainedFastBackend 50ms ahead whenever the slower cache backend is written to. This allows us to avoid writing the timestamp on every single cache write (which could happen thousands of times per second on cold starts - think every field storage + field config entity, etc.).

However, we still fetch items from the fast backend even if there's zero chance they're going to be accepted, and we also write items back to the fast backend even if their created time is going to be older than the $last_write_timestamp.

Since this case will by definition happen when there's a cache stampede, it could be one of the causes of APCu locks as described in 📌 Enhancing APCu Lock Efficiency (for ChainedFast for High-Concurrency Cache Invalidation) Needs work .

Steps to reproduce

Proposed resolution

In ChainedFastBackend::getMultiple(), don't even bother looking up items from the fast backend, or writing back to it, if $last_write_timestamp is >= $now.

We could also consider adding an additional grace period, e.g. making the comparison $last_write_timestamp + 1 (second) >= $now, so that the fast backend is only consulted and written to if the slow backend hasn't had a write in the last second.
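As a rough sketch of the shape of the check (hypothetical code, not the MR; the constant and function names here are invented for illustration):

```php
<?php

// Hypothetical sketch of the proposed short-circuit; names are invented, not
// the actual ChainedFastBackend code. A grace period of 0 gives the basic
// "skip if the slow backend was just written to" behaviour; 1 adds the
// optional one-second grace period discussed above.
const GRACE_PERIOD = 1;

function shouldConsultFastBackend(float $last_write_timestamp, float $now): bool {
  // The slow backend stamps last_write_timestamp ~50ms ahead on write, so a
  // timestamp within the grace period means a write happened too recently for
  // fast-backend items to be trusted (or worth writing back).
  return $last_write_timestamp + GRACE_PERIOD < $now;
}
```

With $now = 100.0, a $last_write_timestamp of 99.5 would skip the fast backend entirely, while 98.5 would consult it as usual.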

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

📌 Task
Status

Active

Version

11.0 🔥

Component

cache system

Created by

🇬🇧United Kingdom catch


Merge Requests

Comments & Activities

  • Issue created by @catch
  • 🇬🇧United Kingdom catch

    Pushed what I think we need.

    Didn't implement the 'additional grace period' because I'm not sure that's what we want. If there are lots and lots of cache sets/invalidations happening, then the $last_write_timestamp will keep being set ahead, and as soon as it's not, that's when we want to write to the fast backend, so what's here might be enough.

  • 🇬🇧United Kingdom catch

    Moving to needs review.

Adding needs profiling because I think that running xhprof on a page request immediately after a cache clear should show a clear before/after here - e.g. there should be dozens or hundreds fewer APCu reads and writes but almost no other changes. And then after refreshing that page a couple of times, everything should be the same with and without the MR applied.

  • Pipeline finished with Failed
    9 days ago
    Total: 715s
    #504000
  • 🇬🇧United Kingdom catch
  • Pipeline finished with Failed
    9 days ago
    Total: 183s
    #504018
  • 🇩🇪Germany Fabianx

    Looks great to me!

  • 🇬🇧United Kingdom catch

Removing the needs profiling tag because by happy accident this made existing unit tests fail, which gave me an idea of how we should unit test this - a unit test proves that this skips the fast backend when the timestamp is in the future.
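The test could look roughly like this (a hypothetical PHPUnit sketch, not the actual test in the MR; the constructor arguments and the way the timestamp is mocked are assumptions):

```php
<?php

// Hypothetical sketch: with last_write_timestamp in the future, the fast
// backend must never be read from or written back to.
use Drupal\Core\Cache\CacheBackendInterface;
use Drupal\Core\Cache\ChainedFastBackend;

public function testFastBackendSkippedDuringGracePeriod(): void {
  $fast = $this->createMock(CacheBackendInterface::class);
  $fast->expects($this->never())->method('getMultiple');
  $fast->expects($this->never())->method('setMultiple');

  $consistent = $this->createMock(CacheBackendInterface::class);
  // Simulate a last_write_timestamp stamped ~50ms ahead of now.
  $consistent->method('get')
    ->willReturn((object) ['data' => microtime(TRUE) + 0.05]);
  $consistent->method('getMultiple')->willReturn([]);

  $backend = new ChainedFastBackend($consistent, $fast, 'testbin');
  $cids = ['some_cid'];
  $backend->getMultiple($cids);
}
```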

    However it would probably still be nice to profile it to actually show the real difference in apcu reads/writes.

  • Pipeline finished with Success
    9 days ago
    Total: 440s
    #504039
  • 🇬🇧United Kingdom catch

    There's another potential benefit here although it will depend on what actually happens in real-life situations.

    Field config entities (for example) aren't usually requested at all when caches are warm, they're used to build things like the views data cache, which then gets cached separately. See 🐛 ViewsData should not cache by language Needs work and 📌 Allow fields to be opted out of views data Active for some background.

    If those higher level caches are built and written during the 'grace period' added here, we won't write field config entities to the fast backend during that time, and then by the time we're out of the grace period, the higher level caches might be all built, which means we don't need to get those lower level cache items anyway, so we don't write them back to the fast backend even after the grace period.

So instead of writing the same amount of data to APCu, just 50ms later and fewer times, it may result in writing less data to APCu overall.

  • 🇬🇧United Kingdom catch

    I ran the following test:

    Applied this patch to core:

    diff --git a/core/lib/Drupal/Core/Cache/ApcuBackend.php b/core/lib/Drupal/Core/Cache/ApcuBackend.php
    index bf5e4d98ca9..39b154255c5 100644
    --- a/core/lib/Drupal/Core/Cache/ApcuBackend.php
    +++ b/core/lib/Drupal/Core/Cache/ApcuBackend.php
    @@ -183,6 +183,7 @@ protected function prepareItem($cache, $allow_invalid) {
        * {@inheritdoc}
        */
       public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = []) {
    +    file_put_contents('/tmp/set.txt', $cid . "\n", FILE_APPEND);
         assert(Inspector::assertAllStrings($tags), 'Cache tags must be strings.');
         $tags = array_unique($tags);
         $cache = new \stdClass();
    
    drush cr
    rm /tmp/set.txt
    

    This clears caches, but then clears out any apcu sets from the command line (e.g. via router rebuilds), so we're only seeing what happens with the first full page load.

    Then visited the front page.

    Repeated this with and without the MR.

    cat set-before.txt | wc -l
    1236
    cat set-after.txt | wc -l
    889
    

So that's APCu sets reduced by around a third on the first page load after a drush cr.

    This makes me think we should try a longer grace period, so will have a look at that next.

  • 🇬🇧United Kingdom catch

    Tried a longer grace period and found more things.

First of all, I realised that in set()/setMultiple() we always unconditionally write through to the fast backend. The idea here was to avoid a consistent cache lookup on the next request (especially in single server setups), but this is the cause of hundreds of writes to the fast backend which may never be used on the next request anyway - e.g. any low-level cache items underneath a layered cache (views data discovery, some plugin derivers) might not be requested again. So I removed those two write-throughs.

This took 889 APCu writes down to just over 120.

I then added the 1 second grace period, and this reduced those ~120 APCu writes down to 2 or 3.

I then did a further test where I reloaded the Umami front page again after the first couple of steps, and then also clicked over to the 'recipes' page, to see what would happen with a couple more requests, and I got a total of 199 writes. I'm uploading that file - it's not an apples to apples comparison with the previous tests, but it shows that we get fewer writes even after more activity.

There is probably some discussion to be had about what an ideal grace period should be, but tbh 1 second seems about right: it's usually enough time for discovery caches to be rebuilt, and it's discovery caches (views data, block plugins) that can result in lots and lots of small config entity and similar cache misses. It's also short enough that the fast backend will still start to take load off the consistent backend pretty quickly - if the fast backend keeps getting invalidated, writing to it more often doesn't reduce load anyway.

  • Pipeline finished with Failed
    8 days ago
    Total: 254s
    #504794
  • Pipeline finished with Failed
    8 days ago
    Total: 214s
    #505303
  • Pipeline finished with Failed
    8 days ago
    Total: 215s
    #505312
  • Pipeline finished with Success
    8 days ago
    Total: 337s
    #505322
  • Pipeline finished with Failed
    8 days ago
    Total: 576s
    #505382
  • 🇬🇧United Kingdom catch

    Added us

  • Pipeline finished with Failed
    8 days ago
    Total: 497s
    #505387
  • 🇬🇧United Kingdom catch
  • Pipeline finished with Failed
    8 days ago
    Total: 512s
    #505395
  • Pipeline finished with Failed
    8 days ago
    Total: 545s
    #505409
  • 🇬🇧United Kingdom catch
  • Pipeline finished with Success
    7 days ago
    Total: 548s
    #505428
  • Pipeline finished with Success
    7 days ago
    Total: 580s
    #505470
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Gave this a review, generally like the idea and it seems promising in reducing a cache stampede's impact. However, I have some questions about whether we are profiling all the pieces involved in the process.

    If I understand this correctly, we are completely eliminating fast backend lookups and writes for a whole second whenever anything is written to the persistent backend. Won't that simply move the stampede from the fast backend to the persistent backend?

We know that ChainedBackend is used for cache items that are expensive and/or requested frequently during a request. So turning off, for a whole second, the very thing people use ChainedBackend for seems a bit... odd?

    If we are talking about higher and lower level caches being warmed, wouldn't it make more sense for the lower level caches to not use ChainedBackend if we're going to be storing calculations derived from the lower level data in a higher level cache? Talking about your example from #8.

    Either way, not sure about this.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    It seems to me that we are limited by the $lastWriteTimestamp being shared regardless of CID. Meaning if we write 5 different items with a small delay between them, entries one through four will be considered invalid because of the fifth item being written. Why?

    From the docs:

     * Because this backend will mark all the cache entries in a bin as out-dated
     * for each write to a bin, it is best suited to bins with fewer changes.
    

Isn't that bit worth reconsidering instead? If I have an expensive item I know will be requested multiple times, why would I accept it not being found in the cache because another item in the same bin happened to get written just a few ms before my retrieval attempt? Seems like keeping a last write per CID could also go a long way here.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Also if we are to go ahead with the current approach, the class docs need updating. This bit won't hold true anymore:

     * (chained) cache backend. Those cache entries that were created before the
     * last write are discarded, but we use their cache IDs to then read them from
     * the consistent (slower) cache backend instead; at the same time we update
     * the fast cache backend so that the next read will hit the faster backend
     * again. Hence we can guarantee that the cache entries we return are all
     * up-to-date, and maximally exploit the faster cache backend.
    
  • 🇬🇧United Kingdom catch

    I don't think we have chained fast for items that are requested multiple times per page, for that we have memory/static caching.

    The reason is that we have a couple of dozen cache items that are requested on almost any request, especially state, block config for the current theme etc. and we can offload all of those from the database (or redis) for the majority of requests.

    Last write per cid would mean as many persistent cache gets as there are fast backend gets and undermine the database offload entirely.

One thing I wondered about was whether we could have the fast backend store a random key to identify the local cache, add that key to items in the persistent backend, and then only use last_write_timestamp when it doesn't match. This would help a lot on single web head setups, and maybe a bit with more web heads too. It would mean stuffing it into the persistent item transparently (like nesting the actual data in an array with the hash).
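Sketched out, that idea might look something like this (purely hypothetical; the key names and APCu storage details are invented for illustration):

```php
<?php

// Hypothetical sketch of the "identify the local cache" idea. Each web head's
// fast backend holds a random key; persistent items written via this head are
// wrapped with it. On read, a matching key shows the write came through this
// head, so its fast backend already has the item and the last_write_timestamp
// check can be skipped.
$local_key = apcu_fetch('chained_fast_identity');
if ($local_key === FALSE) {
  $local_key = bin2hex(random_bytes(8));
  apcu_store('chained_fast_identity', $local_key);
}

// On write, wrap the data transparently:
$actual_data = 'the cache data';
$persistent_item = ['identity' => $local_key, 'data' => $actual_data];

// On read, unwrap; only fall back to the last_write_timestamp comparison when
// the key doesn't match (i.e. the write came from another web head):
$needs_timestamp_check = ($persistent_item['identity'] !== $local_key);
$actual_data = $persistent_item['data'];
```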

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Yeah for a second there I confused ChainedFastBackend with BackendChain where we combine a memory and database cache. That is often what we use for items that are requested multiple times per page. Okay let me revisit what I wrote and see which bits still make sense.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Okay, after resolving the mix-up in my brain between ChainedFastBackend and BackendChain I actually really like this MR now. I still have some notes about spelling and docs in general, but the code looks good to me.

On a cold start we don't want several processes trying to write the same thing to APCu (or other software) thousands of times, so the 50ms (now 1s) locking mechanism prevents that. We are still able to retrieve the calculated thing from the DB cache, so it's not that bad. Over time, we will populate the fast backend properly and then we get the full benefit of it. But at least we won't be choking it when caches are cold.

    The only thing I could still see is that both the 50ms and 1s window seem arbitrary. But something is definitely better than nothing here, so I suppose that's fine.

    Other than the comments on the docs, I don't have any objections here. Seems like a great addition.

One thing I wondered about was whether we could have the fast backend store a random key to identify the local cache, add that key to items in the persistent backend, and then only use last_write_timestamp when it doesn't match. This would help a lot on single web head setups, and maybe a bit with more web heads too. It would mean stuffing it into the persistent item transparently (like nesting the actual data in an array with the hash).

    So each item in the persistent cache would be flagged with a key like so: ['some_key' => 'abcdef...', 'data' => $the_actual_cache_data] and upon retrieval we convert that back to $the_actual_cache_data?

    But to what purpose? I'm not sure I can follow what you mean with "to identify the local cache" here.

  • Pipeline finished with Canceled
    5 days ago
    Total: 338s
    #506946
  • Pipeline finished with Canceled
    5 days ago
    Total: 82s
    #506949
  • Pipeline finished with Success
    5 days ago
    Total: 564s
    #506950