Add a grace period to ChainedFastBackend when last_write_timestamp is ahead

Created on 22 May 2025, 9 days ago

Problem/Motivation

After 📌 Optimize last_write_timestamp writes in ChainedFastBackend Active, we write the last_write_timestamp in ChainedFastBackend 50ms ahead whenever the slower cache backend is written to. This allows us to avoid writing the timestamp on every single cache write (which could happen thousands of times per second on cold starts - think every field storage + field config entity, etc.).

However, we still fetch items from the fast backend even if there's zero chance they're going to be accepted, and we also write items back to the fast backend even if their created time is going to be older than the $last_write_timestamp.

Since this case will by definition happen when there's a cache stampede, it could be one of the causes of APCu locks as described in 📌 Enhancing APCu Lock Efficiency (for ChainedFast for High-Concurrency Cache Invalidation) Needs work .

Steps to reproduce

Proposed resolution

In ChainedFastBackend::getMultiple(), don't even bother looking up items from the fast backend, or writing back to it, if $last_write_timestamp is >= $now.

We could also consider adding an additional grace period, e.g. making the comparison $last_write_timestamp + 1 (second) >= $now, so that the fast backend is only consulted and written to if the slow backend hasn't had a write in the last second.
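As a rough sketch of the shape of the check (hypothetical code, not the MR; the constant and function names here are invented for illustration):

```php
<?php

// Hypothetical sketch of the proposed short-circuit; names are invented, not
// the actual ChainedFastBackend code. A grace period of 0 gives the basic
// "skip if the slow backend was just written to" behaviour; 1 adds the
// optional one-second grace period discussed above.
const GRACE_PERIOD = 1;

function shouldConsultFastBackend(float $last_write_timestamp, float $now): bool {
  // The slow backend stamps last_write_timestamp ~50ms ahead on write, so a
  // timestamp within the grace period means a write happened too recently for
  // fast-backend items to be trusted (or worth writing back).
  return $last_write_timestamp + GRACE_PERIOD < $now;
}
```

With $now = 100.0, a $last_write_timestamp of 99.5 would skip the fast backend entirely, while 98.5 would consult it as usual.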

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

📌 Task
Status

Active

Version

11.0 🔥

Component

cache system

Created by

🇬🇧United Kingdom catch


Merge Requests

Comments & Activities

  • Issue created by @catch
  • 🇬🇧United Kingdom catch

    Pushed what I think we need.

    Didn't implement the 'additional grace period' because I'm not sure that's what we want. If there are lots and lots of cache sets/invalidations happening, then the $last_write_timestamp will keep being set ahead, and as soon as it's not, that's when we want to write to the fast backend, so what's here might be enough.

  • 🇬🇧United Kingdom catch

    Moving to needs review.

Adding needs profiling because I think that running xhprof on a page request immediately after a cache clear should show a clear before/after here - e.g. there should be dozens or hundreds fewer APCu reads and writes but almost no other changes. And then after refreshing that page a couple of times, everything should be the same with and without the MR applied.

  • Pipeline finished with Failed
    9 days ago
    Total: 715s
    #504000
  • 🇬🇧United Kingdom catch
  • Pipeline finished with Failed
    9 days ago
    Total: 183s
    #504018
  • 🇩🇪Germany Fabianx

    Looks great to me!

  • 🇬🇧United Kingdom catch

Removing the needs profiling tag because by happy accident this made existing unit tests fail, which gave me an idea of how we should unit test this - a unit test proves that this skips the fast backend when the timestamp is in the future.
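The test could look roughly like this (a hypothetical PHPUnit sketch, not the actual test in the MR; the constructor arguments and the way the timestamp is mocked are assumptions):

```php
<?php

// Hypothetical sketch: with last_write_timestamp in the future, the fast
// backend must never be read from or written back to.
use Drupal\Core\Cache\CacheBackendInterface;
use Drupal\Core\Cache\ChainedFastBackend;

public function testFastBackendSkippedDuringGracePeriod(): void {
  $fast = $this->createMock(CacheBackendInterface::class);
  $fast->expects($this->never())->method('getMultiple');
  $fast->expects($this->never())->method('setMultiple');

  $consistent = $this->createMock(CacheBackendInterface::class);
  // Simulate a last_write_timestamp stamped ~50ms ahead of now.
  $consistent->method('get')
    ->willReturn((object) ['data' => microtime(TRUE) + 0.05]);
  $consistent->method('getMultiple')->willReturn([]);

  $backend = new ChainedFastBackend($consistent, $fast, 'testbin');
  $cids = ['some_cid'];
  $backend->getMultiple($cids);
}
```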

    However it would probably still be nice to profile it to actually show the real difference in apcu reads/writes.

  • Pipeline finished with Success
    9 days ago
    Total: 440s
    #504039
  • 🇬🇧United Kingdom catch

    There's another potential benefit here although it will depend on what actually happens in real-life situations.

    Field config entities (for example) aren't usually requested at all when caches are warm, they're used to build things like the views data cache, which then gets cached separately. See 🐛 ViewsData should not cache by language Needs work and 📌 Allow fields to be opted out of views data Active for some background.

    If those higher level caches are built and written during the 'grace period' added here, we won't write field config entities to the fast backend during that time, and then by the time we're out of the grace period, the higher level caches might be all built, which means we don't need to get those lower level cache items anyway, so we don't write them back to the fast backend even after the grace period.

So instead of writing the same amount of data to APCu, just 50ms later and fewer times, it may result in writing less data to APCu overall.

  • 🇬🇧United Kingdom catch

    I ran the following test:

    Applied this patch to core:

    diff --git a/core/lib/Drupal/Core/Cache/ApcuBackend.php b/core/lib/Drupal/Core/Cache/ApcuBackend.php
    index bf5e4d98ca9..39b154255c5 100644
    --- a/core/lib/Drupal/Core/Cache/ApcuBackend.php
    +++ b/core/lib/Drupal/Core/Cache/ApcuBackend.php
    @@ -183,6 +183,7 @@ protected function prepareItem($cache, $allow_invalid) {
        * {@inheritdoc}
        */
       public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = []) {
    +    file_put_contents('/tmp/set.txt', $cid . "\n", FILE_APPEND);
         assert(Inspector::assertAllStrings($tags), 'Cache tags must be strings.');
         $tags = array_unique($tags);
         $cache = new \stdClass();
    
    drush cr
    rm /tmp/set.txt
    

    This clears caches, but then clears out any apcu sets from the command line (e.g. via router rebuilds), so we're only seeing what happens with the first full page load.

    Then visited the front page.

    Repeated this with and without the MR.

    cat set-before.txt | wc -l
    1236
    cat set-after.txt | wc -l
    889
    

So that's APCu sets reduced by around a third on the first page load after a drush cr.

    This makes me think we should try a longer grace period, so will have a look at that next.

  • 🇬🇧United Kingdom catch

    Tried a longer grace period and found more things.

First of all, I realised that in set()/setMultiple() we always unconditionally write through to the fast backend. The idea here was to avoid a consistent cache lookup on the next request (especially in single server setups), but this is the cause of hundreds of writes to the fast backend which may never be used on the next request anyway - e.g. any low-level cache items underneath a layered cache (views data discovery, some plugin derivers) might not be requested again. So I removed those two write-throughs.

This took 889 APCu writes down to just over 120.

I then added the 1 second grace period, and this reduced those ~120 APCu writes down to 2 or 3.

I then did a further test where I reloaded the Umami front page again after the first couple of steps, and then also clicked over to the 'recipes' page, to see what would happen with a couple more requests, and I got a total of 199 writes. I'm uploading that file - it's not an apples to apples comparison with the previous tests, but it shows that we get fewer writes even after more activity.

There is probably some discussion to be had about what an ideal grace period should be, but tbh 1 second seems about right: it's usually enough time for discovery caches to be rebuilt, and it's discovery caches (views data, block plugins) that can result in lots and lots of small config entity and similar cache misses. It's also short enough that the fast backend will still start to take load off the consistent backend pretty quickly - if the fast backend keeps getting invalidated, writing to it more often doesn't reduce load anyway.

  • Pipeline finished with Failed
    8 days ago
    Total: 254s
    #504794
  • Pipeline finished with Failed
    8 days ago
    Total: 214s
    #505303
  • Pipeline finished with Failed
    8 days ago
    Total: 215s
    #505312
  • Pipeline finished with Success
    8 days ago
    Total: 337s
    #505322
  • Pipeline finished with Failed
    8 days ago
    Total: 576s
    #505382
  • 🇬🇧United Kingdom catch

    Added us

  • Pipeline finished with Failed
    8 days ago
    Total: 497s
    #505387
  • 🇬🇧United Kingdom catch
  • Pipeline finished with Failed
    8 days ago
    Total: 512s
    #505395
  • Pipeline finished with Failed
    8 days ago
    Total: 545s
    #505409
  • 🇬🇧United Kingdom catch
  • Pipeline finished with Success
    7 days ago
    Total: 548s
    #505428
  • Pipeline finished with Success
    7 days ago
    Total: 580s
    #505470
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Gave this a review, generally like the idea and it seems promising in reducing a cache stampede's impact. However, I have some questions about whether we are profiling all the pieces involved in the process.

    If I understand this correctly, we are completely eliminating fast backend lookups and writes for a whole second whenever anything is written to the persistent backend. Won't that simply move the stampede from the fast backend to the persistent backend?

We know that ChainedBackend is used for cache items that are expensive and/or requested frequently during a request. So turning off, for a whole second, the very thing people use ChainedBackend for seems a bit... odd?

    If we are talking about higher and lower level caches being warmed, wouldn't it make more sense for the lower level caches to not use ChainedBackend if we're going to be storing calculations derived from the lower level data in a higher level cache? Talking about your example from #8.

    Either way, not sure about this.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    It seems to me that we are limited by the $lastWriteTimestamp being shared regardless of CID. Meaning if we write 5 different items with a small delay between them, entries one through four will be considered invalid because of the fifth item being written. Why?

    From the docs:

     * Because this backend will mark all the cache entries in a bin as out-dated
     * for each write to a bin, it is best suited to bins with fewer changes.
    

Isn't that bit worth reconsidering instead? If I have an expensive item I know will be requested multiple times, why would I accept it not being found in the cache because another item in the same bin happened to get written just a few ms before my retrieval attempt? Seems like keeping a last write per CID could also go a long way here.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Also if we are to go ahead with the current approach, the class docs need updating. This bit won't hold true anymore:

     * (chained) cache backend. Those cache entries that were created before the
     * last write are discarded, but we use their cache IDs to then read them from
     * the consistent (slower) cache backend instead; at the same time we update
     * the fast cache backend so that the next read will hit the faster backend
     * again. Hence we can guarantee that the cache entries we return are all
     * up-to-date, and maximally exploit the faster cache backend.
    
  • 🇬🇧United Kingdom catch

    I don't think we have chained fast for items that are requested multiple times per page, for that we have memory/static caching.

    The reason is that we have a couple of dozen cache items that are requested on almost any request, especially state, block config for the current theme etc. and we can offload all of those from the database (or redis) for the majority of requests.

    Last write per cid would mean as many persistent cache gets as there are fast backend gets and undermine the database offload entirely.

One thing I wondered about was whether we could have the fast backend store a random key to identify the local cache, add that key to items in the persistent backend, and then only use last_write_timestamp when it doesn't match. This would help a lot on single web head setups, and maybe a bit with more web heads too. It would mean stuffing it into the persistent item transparently (like nesting the actual data in an array with the hash).
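Sketched out, that idea might look something like this (purely hypothetical; the key names and APCu storage details are invented for illustration):

```php
<?php

// Hypothetical sketch of the "identify the local cache" idea. Each web head's
// fast backend holds a random key; persistent items written via this head are
// wrapped with it. On read, a matching key shows the write came through this
// head, so its fast backend already has the item and the last_write_timestamp
// check can be skipped.
$local_key = apcu_fetch('chained_fast_identity');
if ($local_key === FALSE) {
  $local_key = bin2hex(random_bytes(8));
  apcu_store('chained_fast_identity', $local_key);
}

// On write, wrap the data transparently:
$actual_data = 'the cache data';
$persistent_item = ['identity' => $local_key, 'data' => $actual_data];

// On read, unwrap; only fall back to the last_write_timestamp comparison when
// the key doesn't match (i.e. the write came from another web head):
$needs_timestamp_check = ($persistent_item['identity'] !== $local_key);
$actual_data = $persistent_item['data'];
```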

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Yeah for a second there I confused ChainedFastBackend with BackendChain where we combine a memory and database cache. That is often what we use for items that are requested multiple times per page. Okay let me revisit what I wrote and see which bits still make sense.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Okay, after resolving the mix-up in my brain between ChainedFastBackend and BackendChain I actually really like this MR now. I still have some notes about spelling and docs in general, but the code looks good to me.

On a cold start we don't want several processes trying to write the same thing to APCu (or other software) thousands of times, so the 50ms (now 1s) locking mechanism prevents that. We are still able to retrieve the calculated thing from the DB cache, so it's not that bad. Over time, we will populate the fast backend properly and then we get the full benefit of it. But at least we won't be choking it when caches are cold.

    The only thing I could still see is that both the 50ms and 1s window seem arbitrary. But something is definitely better than nothing here, so I suppose that's fine.

    Other than the comments on the docs, I don't have any objections here. Seems like a great addition.

One thing I wondered about was whether we could have the fast backend store a random key to identify the local cache, add that key to items in the persistent backend, and then only use last_write_timestamp when it doesn't match. This would help a lot on single web head setups, and maybe a bit with more web heads too. It would mean stuffing it into the persistent item transparently (like nesting the actual data in an array with the hash).

    So each item in the persistent cache would be flagged with a key like so: ['some_key' => 'abcdef...', 'data' => $the_actual_cache_data] and upon retrieval we convert that back to $the_actual_cache_data?

    But to what purpose? I'm not sure I can follow what you mean with "to identify the local cache" here.

  • Pipeline finished with Canceled
    5 days ago
    Total: 338s
    #506946
  • Pipeline finished with Canceled
    5 days ago
    Total: 82s
    #506949
  • Pipeline finished with Success
    5 days ago
    Total: 564s
    #506950