Optimize redirect chain retrieval in VariationCache

Issue created by @kristiaanvandeneynde
Comment 5 months ago →
🇨🇭Switzerland berdir Switzerland
kristiaanvandeneynde → credited berdir → .
Comment 5 months ago →
🇬🇧United Kingdom catch
kristiaanvandeneynde → credited catch → .
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Merge request !11463Update variation cache so chain lookups are stored in memory until the next... → (Closed) created by kristiaanvandeneynde
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Let's see what testbot has to say.
Pipeline finished with Failed
5 months ago
Total: 94s
#447490
Pipeline finished with Failed
5 months ago
Total: 89s
#447582
Pipeline finished with Failed
5 months ago
Total: 351s
#447599
Pipeline finished with Failed
5 months ago
Total: 322s
#447627
Comment 5 months ago →
🇬🇧United Kingdom catch
The results here look great. The only possible change I would make would be instead of validating that cache contexts haven't changed, maybe keying the cache IDs by the actual request object using SplObjectStorage - that ought to allow the logic to still work if there are multiple subrequests then.
Comment 5 months ago →
🇬🇧United Kingdom catch
#7 isn't enough because we can't guarantee that cache context changing things will correctly create/pop requests on/off the request stack, so let's just do the validation for now, and it'll be on the very long list of things to solve to make async rendering in frankenphp work when we get there - but a possible risk of cache misses, not incorrect rendering.
Comment 5 months ago →
🇨🇭Switzerland berdir Switzerland
I pushed a commit to try and support getMultiple() as well. It's a bit tricky due to the loops and behavior of getMultiple() I thought I gotten it working, but not yet seeing a change in StandardPerformanceTest.
Pipeline finished with Failed
5 months ago
Total: 120s
#447846
Comment 5 months ago →
🇨🇭Switzerland berdir Switzerland
Note: By not affecting tests, I meant when combined with the other two issues. On it's own it probably won't do much yet, as we're not yet pushing enough things into CachedStrategy.
Pipeline finished with Failed
5 months ago
Total: 343s
#447852
Comment 5 months ago →
🇨🇭Switzerland berdir Switzerland
This might be doing too much. It's now a full static cache of all variation cache lookups. That will keep those things in memory and it's also causing _tons_ of test fails, for example because it's not checking cache tags.

My idea was to restrict this to lookups that end in a cache miss, either immediately or after redirects, Then there are fewer problems with invalidations.
Pipeline finished with Failed
5 months ago
Total: 225s
#447930
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
LOL, @berdir beat me to it. I lost sleep last night thinking we're caching too much on cache HITS because we know redirects are permanent but the end of the chain on a hit is not.

So either we switch to a MemoryCache for the redirect chain, which would turn this whole thing into a no-op and a glorified chained backend or we just make sure we pop a cache HIT off the chain and store only the parts leading up to it.
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Refresh status change...
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
I think we can do one better in getRedirectChain(). Rather than not setting a cache on a cache HIT, we can still store the path leading up to the hit and reuse that next time we construct a redirect chain. At least that way we can skip requesting all the redirects again.
Pipeline finished with Failed
5 months ago
Total: 651s
#448198
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Updated the code, we still get good results on performance tests. Just 3 cache gets more than the initial improvement.

I also temporarily reverted the getMultiple() changes so we can first see which tests fail on the single changes. Easier to debug that way.
Pipeline finished with Failed
5 months ago
Total: 510s
#448219
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
That's looking a lot better already. I suggest we fix these test fails and then introduce the same logic in getMultiple().
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
I think NodeTest might be failing for the same reasons we need refreshVariablesTrait. One request has a MISS, the other makes it so something is cached, but the cache in the test's thread still thinks there's a MISS and returns as such.
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Yeah I got a fix, but got something urgent to attend to.
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Pushed some fixes already, you can see the pattern is exactly like how we use refreshVariables()
Pipeline finished with Failed
5 months ago
Total: 153s
#448413
Pipeline finished with Failed
5 months ago
Total: 629s
#448449
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Fixed final test fail locally and the other went green locally. So if all goes green we can focus on reintroducing the static cache in getMultiple().
Pipeline finished with Success
5 months ago
Total: 901s
#448492
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Okay, all green :) Now for the final hurdle...
Pipeline finished with Success
5 months ago
Total: 829s
#449522
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
I think I've got a clean solution for ::getMultiple() but it'll have to wait a few more days. Sickness got the best of me, tried to commit today but my head is not functioning and I can't guarantee that whatever I commit today is airtight.
Pipeline finished with Success
5 months ago
Total: 5483s
#451037
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Felt useless while sick on the couch, added back support for ::getMultiple(). Back to rest mode now.
Pipeline finished with Failed
5 months ago
Total: 127s
#452262
Pipeline finished with Failed
5 months ago
Total: 148s
#452269
Pipeline finished with Failed
5 months ago
Total: 162s
#452276
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Not sure why it's failing on QuickStartTest and RecipeQuickStartTest
Pipeline finished with Failed
5 months ago
Total: 475s
#452294
Comment 5 months ago →
🇨🇭Switzerland berdir Switzerland
I expect the gains from getMultiple() will only become visible once ✨ Optimize placeholder retrieval from cache Active is done, which itself is soft-blocked on 📌 Try to preload cache tags in cache getMultiple Active to avoid ending up with more cache tag lookups in some test scenarios. Lets focus on that order?
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Currently it shows 2 fewer cache gets in 'render' after the getMultiple support.

Re #25, I'm fine either way we commit this. I consider this issue "done" now and the results are amazing. Also they're actual DB gets, not memory gets, so the impact is even bigger.

I'm also considering creating an issue where can differentiate between DB and memory gets.
Pipeline finished with Failed
5 months ago
Total: 419s
#452324
Comment 5 months ago →
🇨🇭Switzerland berdir Switzerland
agreed about this being done, but tests will either way heavily conflict with ✨ Optimize placeholder retrieval from cache Active , no matter which gets in first. could in theory also be this one I suppose.

If you look at that, it also results in major reduction of cache get lookups. Right now, getMultiple() support is by far not as effective as it could be without that change because most single placeholders are still first requested with a single cache lookup and fill up the static cache one-by-one. By combining the two issues, all those placeholders will only be a single getMultiple(), resulting in a larger improvement than either issue can achieve on its own.

to me it makes sense to do the other issue first, it seems more logical to optimize that first, but in practice the result is the same, after one of those two issues in, the other one will need a major rebase and rerun those tests to get the new combined numbers.
Comment 5 months ago →
needs-review-queue-bot
The Needs Review Queue Bot → tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide → to find step-by-step guides for working with issues.
Comment 5 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Setting back to NR and opting out of bot to reduce noise.

@berdir, I don't really mind resolving merge conflicts like these, PhpStorm has a nice tool for it. So whichever issue goes in first, I'm not too fussed about having to resolve a merge conflict on the others. Can ping me to do it if you want.
Pipeline finished with Failed
5 months ago
Total: 640s
#454007
Comment 5 months ago →
🇨🇭Switzerland berdir Switzerland
It's not the process of rebasing, that's fine, although I'm confused by the phpstorm UI and prefer to just edit manually.

In this case, both issues shave off 10+ cache lookups and when combined, it will improve further, but not just the sum of both, so we'll need to rerun the performance tests to get the numbers. But that's necessary in either direction of them getting in.

It's really just more logical for me to do the other issue first, it's a smaller, less complex change, it happens earlier in the page flow and getMultiple() will be much more involved then (as in, more lookups at once), making us more confident that the change is correct.

But if this happens to get RTBC and committed first then perfectly fine too.

(mostly just adding the parent issue)
Comment 4 months ago →
🇬🇧United Kingdom catch
Needs a rebase.
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Done
Pipeline finished with Failed
4 months ago
Total: 1666s
#461759
Comment 4 months ago →
🇬🇧United Kingdom catch
The new JSON:API performance test is failing but in a good way.
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Just noticed, fixed it.
Pipeline finished with Success
4 months ago
Total: 1769s
#461776

🇨🇭Switzerland berdir Switzerland

I had a closer look at what exactly those 24 cache gets for the NodePage cache are. The first 10 are rendered nodes and medias, that makes sense.

But then it gets interesting, I get 3x the umami_search block:

  [10]=>
  string(162) "entity_view:block:umami_search:[languages:language_interface]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [11]=>
  string(190) "entity_view:block:umami_search:[languages:language_interface]=en:[languages:language_url]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [12]=>
  string(162) "entity_view:block:umami_search:[languages:language_interface]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [13]=>
  string(190) "entity_view:block:umami_search:[languages:language_interface]=en:[languages:language_url]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [14]=>
  string(162) "entity_view:block:umami_search:[languages:language_interface]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [15]=>
  string(190) "entity_view:block:umami_search:[languages:language_interface]=en:[languages:language_url]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"

And then 2x umami_branding:

  [16]=>
  string(164) "entity_view:block:umami_branding:[languages:language_interface]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [17]=>
  string(192) "entity_view:block:umami_branding:[languages:language_interface]=en:[languages:language_url]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [18]=>
  string(164) "entity_view:block:umami_branding:[languages:language_interface]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"
  [19]=>
  string(192) "entity_view:block:umami_branding:[languages:language_interface]=en:[languages:language_url]=en:[theme]=umami:[user.permissions]=77981d129fc903b842d6e469f96b1692c3ac47e0369f7b7876856518ae7131b3"

The reason these multiple calls happen is because of core/profiles/demo_umami/themes/umami/templates/layout/page.html.twig, specifically this block:

  {% if page.pre_header|render|trim is not empty or
     page.header|render|trim is not empty %}
    <header class="layout-header" role="banner">
      <div class="container">
        {% if page.pre_header|render|trim is not empty %}
          {{ page.pre_header }}
        {% endif %}
        {% if page.header|render|trim is not empty %}
          {{ page.header }}
        {% endif %}
      </div>
    </header>
  {% endif %}

So pre_header is being rendered a total of 3 times. Not sure why the template bothers at all with that, it's a demo, it's not really meant to be customized. But it could store those things in variables. But that's a separate issue that I'll create.

However, I'm also not seeing the static cache kick in for those redirects, that would save another 2 lookups for umami_search at least for now. The reason for this is that redirectChainIsValid checks that the resulting cache id for the redirect chain is in $chain. But because we don't persist the cache hit in the $chain, this will never be TRUE for any lookups that were a HIT, it only works for misses.

I think that's in scope to be fixed here. Not exactly sure what's the best approach. Maybe we only need to validate entries that have a FALSE at the end of the chain? or specifically skip the last item if it's a a redirect?

Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
But because we don't persist the cache hit in the $chain, this will never be TRUE for any lookups that were a HIT, it only works for misses.

That was by design, though, and discussed previously. We cannot persist the cache HIT as it may get invalidated and our redirect chain cache would have no clue that it did. Tests were even failing because of that because at first I did store cache hits in the cached redirect chain.
Comment 4 months ago →
🇨🇭Switzerland berdir Switzerland
I know. I'm not saying that's the problem. The problem is that redirectChainIsValid() incorrectly still assumes that we do store a full chain including hits in $chain and it's validating it under that assumption. As a result, it's silently failing those validations and _not_ using them in case of cache hits.

So either we make it less strict for those cases and not validate the last item if that's a redirect, or we stop pretending that we support that and _only_ store chains that end in a miss. This would have the same result with the current performance tests right now. Usually you'd expect that with the optimizations that we now have in place, once we have a cache hit, we no longer need the chain as we won't need to look it up again. Except the umami tests show that with the current twig approach (which is quite common), we do in fact render the same cacheaable element multiple times on a single page, so being able to at lest keep the redirects is useful in that scenario.
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Oh, right. I will amend that, good catch!
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Pushing something wild.

Rather than a simple "is the entire chain still valid", I'm swapping it out with "give me the part of the chain that is still valid". This should fix the false invalid result on chains that led to a cache HIT before, but also immediately optimize scenario's where an entire chain was deemed invalid if cache contexts changed value mid-request, even though we may have been able to reuse the start of the chain.
Pipeline finished with Failed
4 months ago
Total: 147s
#462330
Pipeline finished with Failed
4 months ago
#462348
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
That reduced the cache GETs by 3, probably fixing what you found in #35
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Just wanted to add for posterity that this is all a slightly convoluted alternative to making the render cache a chained backend with a memory cache in front. However, that would have several drawbacks:

We have to use MemoryBackend rather than MemoryCache or otherwise the render cache would no longer be a true cache. MemoryBackend is slow due to all the (un)serializing

We might not see any benefit in our performance tests, or even worse performance, because we don't differentiate between DB gets and memory gets. Maybe we should?

It would needlessly check all cache redirects for invalidation, even though they don't need to be invalidated. For now... some might if we land PseudoCacheContext from 🐛 Cache redirect error displayed when using 'view own unpublished content' permission alongside node grants Active

So all in all, given how crucial the render VariationCache is to a page's performance, I think it's a good thing we came up with a custom solution here.
Pipeline finished with Success
4 months ago
Total: 1906s
#462360
Comment 4 months ago →
🇬🇧United Kingdom catch
, because we don't differentiate between DB gets and memory gets. Maybe we should?

We initially counted all database queries in performance tests, but this caused random test failures with the chained fast backend because it uses timestamps for whether an item is valid, so millisecond rounding issues can result in an item being valid or not. For me personally I think the current split by cache bins is good - we know implicitly which bins use chained fast or not.

The new approach looks really good to me and the comments explain what's going on well.
Comment 4 months ago →
🇨🇭Switzerland berdir Switzerland
Yes, 3 fewer is exactly what I'd expect, 2 redirects for the search block and one for branding that we cache statically. I think trying to statically cache the cache hits as well is not worth the complexity. It's an exception that you really render the same cacheable render array multiple times on the page, like when doing that kind of twig customization as umami does.

I opened 🐛 Umami page template renders header regions multiple times Active for that, I also have some vague ideas on how to improve that in general, but that could be very challenging with BC (the problem is that render arrays aren't passed around by reference to twig commands, so we don't benefit from the fact that the renderer stores the built thing in #markup/#children).

The wording of some comments seems a bit unusual ("You can read up..", usually we just use a neutral "see xy" but there are a handful of existing "You can ..." in docs already) and slightly repetitive (the array keys are kind of explained twice for $redirectChainCache). But I'm not a native speaker and this is complex stuff, it doesn't hurt to be a bit verbose I think.

Implementation looks good now to me.
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
We initially counted all database queries in performance tests, but this caused random test failures with the chained fast backend because it uses timestamps for whether an item is valid, so millisecond rounding issues can result in an item being valid or not. For me personally I think the current split by cache bins is good - we know implicitly which bins use chained fast or not.

Right, almost forgot about how annoying we were to everyone else with our performance tests 😅 Agreed that we can leave this as is.

I'll see if I can comb over the MR one more time to find places to improve the comments. VariationCache has become a lot more complex with the addition of getMultiple() and the internal cache now, so maybe having this many comments isn't a bad thing?
Comment 4 months ago →
🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
I just had another look and while I agree that part of ::getValidatedCachedRedirectChain() rehashes what we already commented elsewhere, I don't think it hurts here to be so verbose. VariationCache is becoming quite complex and the comments may help understand it.

As for the "you can" vs "See xy", I'll leave that up to committers to split hairs over :) I'm also not a native speaker but I try to do my best to write proper English in my comments, yet might fail sometimes.

if it's all the same to you, I'll leave the MR as-is now.

Comment 4 months ago →

System Message

catch → committed a0a7640c on 11.x

Issue #3512766 by kristiaanvandeneynde, berdir, catch: Optimize redirect...

Comment 4 months ago →
🇬🇧United Kingdom catch
I made the 'you can' more passive on commit, but kept the 'we can' which for me is different (e.g. it's about what the code and the people working on the code can do, not the person reading the comment and a common formulation, 'we can see' etc.). Agreed generally that verbosity and potential duplication is a virtue for this one.

Committed/pushed to 11.x, thanks!

Tagged for 11.2.0 release highlights, not because we should try to explain exactly what we did here, but it would be good to include in the general 'render caching improvements' section the kinds of things we optimized.
Comment 4 months ago →
System Message
catch → closed merge request !11463
Pipeline finished with Success
4 months ago
Total: 490s
#462467
Comment 4 months ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.

Optimize redirect chain retrieval in VariationCache

Problem/Motivation

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

Merge Requests

!11463Optimize redirect chain retrieval in VariationCache
Closed

Comments & Activities

Optimize redirect chain retrieval in VariationCache

Problem/Motivation

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

Merge Requests

!11463Optimize redirect chain retrieval in VariationCacheClosed

Comments & Activities

!11463Optimize redirect chain retrieval in VariationCache
Closed