- Issue created by @mglaman
- Merge request !12883Statically cache metadata per user for PermissionsHashGenerator β (Open) created by mglaman
- πΊπΈUnited States smustgrave
Possible expand PermissionsHashGeneratorTest to include coverage for this change?
- π¨πSwitzerland berdir Switzerland
\Drupal\Core\Session\AccessPolicyProcessor::processAccessPolicies() uses a static variation cache and core by default no longer uses a persistent cache on that. This shouldn't hit redis at all in that scenario. But yes, doing a quick test myself and I can definitely reproduce these numbers with blackfire. Frontpage for me with disabled render cache is 130 calls and that makes up 7% of the total time. But also, this is only 50% of the calls to processAccessPolicies(), the other ~50 are from PermissionChecker::hasPermission(). so processAccessPolicies in total is \Drupal\Core\Session\AccessPolicyProcessor::processAccessPolicies().
This involves many many function calls and I assume that blackfire has a huge overhead here, but still, I've definitely noticed \Drupal\Core\Session\AccessPolicyProcessor::processAccessPolicies() myself before. As I've mentioned in the redirect issue, the flexibility we gained (which core by default doesn't use) comes at a high price. VariationCache is very slow and it's useful when you actually need to worry about variations, but 90%+ of sites do not. I wonder if we can add one more check like \Drupal\Core\Session\AccessPolicyProcessor::needsPersistentCache() and somehow avoid the variationCache here when we don't expect anything to actually *vary*.
All that said, this is a fairly obvious win and agreed that it's inconsistent that one method does it and the other doesn't. Would it maybe make more sense to cache the processAccessPolicies() call if we decide it's worth it? createFromObject() should be fast and then we actually reduce 2 calls down further to just one?
- π§πͺBelgium kristiaanvandeneynde Antwerp, Belgium
As to the IS and the proposed solution: I agree that if we use a static cache for the hash, we could just as well use one for the cacheable metadata as both rely on the same source. So if we were to go with that approach, consider me on board.
What @berdir wrote in #5 is also true.
A few things I slightly disagree with, though:
As I've mentioned in the redirect issue, the flexibility we gained (which core by default doesn't use) comes at a high price.
There were several things in core that were truly broken with the old redirect system because it combined all possible variations on the second level. This meant you could not use highly variable cache contexts on common page elements such as local tasks because it would make every page uncacheable. With VariationCache, this is no longer an issue.
VariationCache is very slow
Performance tests at the time showed that VariationCache wasn't necessarily slower than the old system either. Depending on the scenario it was even faster.
I wonder if we can add one more check like \Drupal\Core\Session\AccessPolicyProcessor::needsPersistentCache() and somehow avoid the variationCache here when we don't expect anything to actually *vary*.
Worth exploring, but it speaks for itself that we need to tread very carefully there. If we make a mistake in VC while trying to save resources, we might introduce all sorts of security issues.
Would it maybe make more sense to cache the processAccessPolicies() call if we decide it's worth it? createFromObject() should be fast and then we actually reduce 2 calls down further to just one?
The access policy processor has to check at least two static caches to return your access policies: A regular one for cache contexts and a variation cache for the actual result. So if we can turn that into one that would indeed be a win. The question becomes how fast the hashing is vs the static variation cache lookup fewer.
But at that point we'd be putting a regular static cache in front of the caches in AccessPolicyProcessor just to bypass the static variation cache lookup in APP. If we're going to go ahead and do that, we may as well do the same in the PermissionChecker and further reduce the amount of calls to APP. As Berdir's findings shows that PermissionChecker is the other major driver of traffic towards APP.
This does mean that we'd be introducing an extra regular static cache lookup we know will miss for every first permission check on a request.
- π¨πSwitzerland berdir Switzerland
The "redirect issue" is about β¨ Check if site is in maintenance mode before checking if user has access to maintenance mode Active , a redirect.module issue that switched a hasPermission() check to account for the slower logic behind that, it is not about variation cache redirects. That was only clear to me and mglaman who created that issue, should have mentioned that.
Performance tests at the time showed that VariationCache wasn't necessarily slower than the old system either. Depending on the scenario it was even faster.
Similarly, this isn't about old and new VariationCache implementation. It's about now actually using it for the static cache where previously we had nothing like that. VariationCache performance was never great, but it's less of an issue when it's used for a persistent cache that's much slower and less frequently used. 150 calls and 15% of total execution time is massive and imho a major performance issue (with the caveat that it's maybe only 10% or even 5% without blackfire, very hard to tell, but it's definitely something)
(VariationCache now may very well be faster now than it was before, especially around multiple lookups that we heavily optimized in 11.2, but that's not the scope here)
We discussed this quite a bit in https://drupal.slack.com/archives/C1BMUQ9U6/p1756840947127769, and the conclusion is essentially that using a simple static cache here is a possible source for access issues if the policies can vary for the same using during the same request in a way that is not covered by cache tags (e.g. user or role save). Given that this is a possibility, we shouldn't have non-variation static caching on top of a variation static cache. Of course just removing that would make this issue even worse (and add 50% more calls to processAccessPolicies(), so we can realistically only do that if we also come up with a massive performance improvement for that.
I wasn't proposing to just remove the static variation cache. I was thinking around one of two options:
1. Improve the performance of VariationCache in general, specifically the cache id calculation. Not sure how.
2. Figure out a rule on when we can skip the variation cache in favor of a simpler per-user static cache. Essentially something like \Drupal\Core\Session\AccessPolicyProcessor::needsPersistentCache() but for variation caching. We'd ask access policies whether they are free from outside/dynamic influences. Given the same user, it will return the same user. SuperUserPolicy is very obvious, UserRolesAccessPolicy as long as the user or roles don't change, which is mostly covered by cache tags.Stepping through the implementation a bit, I have a new idea as well. Here's the thing: access policy caching was optimized for persistent caching (The method is called ::get*Persistent*CacheContexts()) and the goal is to solely rely on cache contexts to and make no assumptions, with the idea that we can by default for example group access cache for all users with the same group. . But by default (with just core), we don't even do persistent caching anymore. For the static cache, I think that's the wrong goal. Multiple users in the same request is an edge case, we shouldn't optimize for the edge case.
So if we add the uid to the cache keys, at least for static cache, then we can drop all user.xy cache contexts. Only problem I can think of right now is that you could alter roles on a user entity without saving that user (saving we could handle with user:ID cache tag), but I don't think that's something we need to support?
That still leaves us with cache redirects. Because the other reason that this is so slow is having to do all of this multiple times to resolve redirects. On multilingual sites, you always end up with languages:language_interface. which is irrelevant for permissions/access, but the user label has possibly been translated. That's the risk of addCacheableDependency(), you often get stuff that you don't need and it's tricky to know what you really need. we should either switch to ->addCacheTags($user_role->getCacheTags()) in UserRolesAccessPolicy or explicitly filter that out. That might be safer, in theory you could have something like per-domain user permission overrides, although thinking about that hurts my brain.
I'll try to do some tests around that and see what impact that has.
- π§πͺBelgium kristiaanvandeneynde Antwerp, Belgium
Need more time to respond properly, but just to clarify ::get*Persistent*CacheContexts() has nothing to do with the type of cache. It means that these cache contexts are always there for that access policy, with more cache contexts optionally being added during its calculatePermissions() or alterPermissions() methods.
It's also often referred to as initial cache contexts, but that implies you need to add more, so I/we chose getPersistentCacheContexts as a name.
- π¨πSwitzerland berdir Switzerland
re persistent cache tags: fair enough, doesn't really change what I wrote, it's still optimized to avoid per-user caches, which is not something we care about with the static cache.
Experimenting with my ideas. Filtering out the language results in this:
Code is ugly, but that's already a big improvement.
- π¨πSwitzerland berdir Switzerland
Combined with the user context/key stuff, it results in a 75% improvement.
And a good chunk of that is actually assert() code, because I've been running this with assert on.
It's fairly ugly. We could simplify it by doing the same to the persistent variation cache, but it could possibly result in a lot of variations (users * groups or something like that)
Checking PermissionsHashGenerator and removing the static cache there adds a bunch of calls, but it's still way faster overall than HEAD. Didn't add that yet.
- π¨πSwitzerland berdir Switzerland
Bunch of kernel test fails, only checked one of them, \Drupal\KernelTests\Core\Entity\ValidReferenceConstraintValidatorTest::testPreExistingItemsValidation for example fails because it creates two users with different roles but doesn't save them. I think we can assume that accounts actually have an ID?
Will wait with further work for some feedback.