Review cache bin and cache tags of access policy caching

Created on 18 January 2025, 2 months ago

Problem/Motivation

This is about two caching things in AccessPolicyProcessor.

See #3436146-9: Introduce a list of "common cache tags" to reduce lookup query amount β†’ . The access policy includes dynamic cache tags for user roles very early in bootstrap.

Those can't be easily preloaded and preloading them all requires more complex preloading logic.

The question is whether it's worth to add cache tags for specific roles there. Roles typically don't change often, it might be worth to just use the list cache tag. This is relatively minor and might actually mostly go away with the ideas below for just core at least.

The second part is that it uses a separate, new cache bin just for itself. For the database backend, this doesn't make a big difference.

But for redis/memcache, each unique bin is a performance cost, because redis/memcache have no means to delete/invalidate across a whole bin. Redis maintains both a separate last-delete-all timestamp per bin and a cache tag for invalidateAll (See πŸ“Œ Deprecate CacheBackendInterface::invalidateAll() Active and πŸ› Allow to remove support for invalidateAll() and treat it as deleteAll() Active .

Right now, the current access policy cache implementation therefore results in 5 operations based on the monitor output from πŸ“Œ Optimize bin cache tags and last write timetamp Active :

"HGETALL" "core:access_policy:access_policies:drupal:[user.is_super_user]=1:[user.roles]=authenticated,administrator"
"GET" "core:cachetags:x-redis-bin:access_policy"
"GET" "core:access_policy:_redis_last_delete_all"
"HGETALL" "core:access_policy:access_policies:drupal:[languages:language_interface]=en:[user.is_super_user]=1:[user.roles]=authenticated,administrator"
"MGET" "core:cachetags:config:user.role.authenticated" "core:cachetags:config:user.role.administrator" "core:cachetags:access_policies"

Explanation:
1. first cache get, the redirect
2. this needs to check if there was an invalidateAll() on the bin, so it checks the cache tag for it. This could be skipped with the referenced invalidateAll() issues.
3. then it needs to check if there was a deleteAll() on the bin
4. then the real cache get with the redirected cache key
5. then it checks the 3 specific cache tags in my case for admin.

All of this is purely done to support more complex implementations that the majority of sites are actually not using. For the implementation in core, the only thing this is caching is the roles, which are in the ChainedFast config bin.

This is a lot of lookups for what it's doing, so setting to major.

Steps to reproduce

Proposed resolution

I'm not sure. For core, it could use discovery bin, but I understand that sites using group module for example might have a lot of different cache variations. But we IMHO need a way to avoid all this overhead for the vast majority of sites that don't have expensive access policies.

What if we add some kind of interface or method to indicate whether access policies are actually worth to cache? Group can do that, Core won't, and then we just use the static cache if none of the access policy services we have require persistent cache?

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.1 πŸ”₯

Component

user system

Created by

πŸ‡¨πŸ‡­Switzerland berdir Switzerland

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @berdir
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland
  • πŸ‡¬πŸ‡§United Kingdom catch

    The interface seems worth a try. In that case would we keep the separate bin for when it's used?

  • Pipeline finished with Failed
    2 months ago
    Total: 550s
    #399997
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Created a merge request, works as expected, combined with the improvements in core (such as the theme extension list cache) and redis (the already committed last write timestamp and the invalidateAll() optimization), this is what's left now up until dynamic page cache

    "HGETALL" "core:container:service_container:prod:11.2-dev::Linux:a:4:{i:0;s:4
    :42:\"modules/contrib/redis/example.services.yml\";i:3;s:40:\"modules/contrib
    "GET" "core:container:_redis_last_delete_all"
    "GET" "core:last_write_timestamp_cache_config"
    "GET" "core:last_write_timestamp_cache_discovery"
    "GET" "core:cachetags:entity_types"
    "HGETALL" "core:data:route:[language]=en:[query_parameters]=:/"
    "GET" "core:cachetags:route_match"
    "GET" "core:data:_redis_last_delete_all"
    "GET" "core:last_write_timestamp_cache_bootstrap"
    "HGETALL" "core:dynamic_page_cache:response:[request_format]=html:[route]=vie
    
    "GET" "core:dynamic_page_cache:_redis_last_delete_all"
    "HGETALL" "core:dynamic_page_cache:response:[cookies:big_pipe_nojs]=:[languag
    

    (Thee's quite a bit more after, such as placeholdered blocks, library and assets, routes being loaded).

    This is getting pretty close to what's feasible. The only additional thing I can think of right now is the cache tag preloader, I'm seeing the following standalone and stable cache tags:

    "GET" "core:cachetags:entity_types"
    "GET" "core:cachetags:route_match"
    "GET" "core:cachetags:library_info"
     "GET" "core:cachetags:routes"
    "MGET" "core:cachetags:config:shortcut.set.default" "core:cachetags:config:user.role.authenticated" "core:cachetags:config:user.role.administrator" "core:cachetags:access_policies"
    

    4 easy ones, to reduce 4 get requests to a single mget.

    The last is shortcut and interesting, that one still contains the access policy cache tags. they are still collected and I guess shortcut adds that as a dependency to it. but there's also the shortcut set itself. for access policies, I still feel like role save should just directly invalidate access_policies, and shortcut could maybe use a single list cache tag. hard to predict if there is a scenario where shortcuts frequently change? Then we could consider to preload that too, but that's about saving a single extra query and likely only in some cases.

    There is of course also the huge list from dynamic page cache itself, but as argued there, it's impossible to preload that whole, so not much point in looking at that in this scope.

  • Pipeline finished with Success
    2 months ago
    Total: 476s
    #400010
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Performance tests are very unreliable for me locally. script size is like 4kb off and everything goes completely haywire if I don't add the explicit cron run in setUp(). Is my system too fast? too slow? is it WSL? different chrome version? Honestly no idea.

    I was also confused why don't see a change on cache get requests with this on any performance tests except one single cache tag is valid count being reduced in a single test.

    The thing is that this trades regular cache requests with fast chained requests, but for Performance tests, those are the same thing. I don't think they should be. Obviously a separate issue, but what about grouping the cache operations per bin? We can keep the current methods and then aggregate, but we can also add methods to assert counts on specific bins and add those to select tests?

  • πŸ‡¬πŸ‡§United Kingdom catch

    Assertions bin sounds good and better than fast vs non fast - could you open an issue?

    Shortcut currently isn't being used in Drupal CMS (since you can do most of what it's used for with menu UI + navigation anyway) so could be a candidate to remove from core.

    Just directly invalidating access policies on role save seems reasonable but I think we need Kristiaan's input on that.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Performance test/cron issue is https://www.drupal.org/project/drupal/issues/3500489 πŸ› Performance tests need to run cron Active and needs review.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Created πŸ“Œ Aggregate cache operations per bin to allow for more specific asserts Active for the cache operations by bin stuff.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    > In that case would we keep the separate bin for when it's used?

    Good question, I'm not sure. Really depends on the implementations of the used access policies. I imagine that with group or so, there could be a fairly large amount of variations, but they should be quite small?

    Sites that want to optimize this can configure the access_policy bin use ChainedFast, it's still a separate bin though. Updating it to use a different/existing bin is a little bit weird, since the service name and properties are all directly tied to that name.

  • The Needs Review Queue Bot β†’ tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • First commit to issue fork.
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Couple of thoughts:

    The reason the access policy API uses a separate bin is because it can indeed grow quite exponentially in size when using Group or other complicated custom access policies. It just seemed like a good fit to have a dedicated bin for and has already proven its worth in security issues where we could simply ask for one's cache_access_policy table to see what was going on.

    You're the second person I've seen where languages:language_interface is added to the access policy cache and I have no idea where this is coming from. Neither of the two stock access policies add this cache context. Any information on this?

    I am not at all opposed to the idea you propose here where each access policy can indicate whether it's worth caching or not. Given how access policies are deemed internal and final, I'm also a fan of using an interface for this. Using marker interfaces doesn't break decorators either, should anyone ever want to go down that route.

    So +1 on the MR, -1 (for now) on sharing a bin with other bits of core. And I'd really like to figure out where that language cache context is coming from. It has no place in access policies.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    > And I'd really like to figure out where that language cache context is coming from. It has no place in access policies.

    Config translation overrides. If the user role label is translated, then it gets the current language cache context. you then call addCacheableDependency().

    That would go away with my suggestion to not do that and instead do this in Role:

    public function getCacheTagsToInvalidate() {
        $cache_tags = parent::getCacheTagsToInvalidate();
        $cache_tags[] = 'access_policies';
        return $cache_tags;
      }
    

    This ensures that on every save and delete of a user role, access_policies cache tag gets invalidated. Then we don't need variable cache tags when just using core and this becomes very easy to preload.

    Of course, if someone were to use config overrides to actually change permissions of a role (domain config?) then the cache would be wrong, but I really hope people don't do that and they could still provide their own access policy plugin.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Oh damn, when adding the roles as dependencies I never thought it would be adding cache contexts. That's not problematic, per se, but totally unnecessary and introduces a cache redirect for no reason.

    Adding the access_policies cache tag in Role is not ideal either, though. It's perfectly valid to turn off the UserRolesAccessPolicy and then you'd be flushing the access policies whenever a role changes even though it doesn't affect the permissions.

    The only reason we'd not want to filter out the cache context is in case some role has certain permissions in one language and not another. You'd be right to call that madness, but with the access policy API, we promised almost anything is possible so removing some of the options would seem to break that promise.

    Then again, nothing stops us from removing said context (and the needless redirect) in core and then people can still write their own access policy in lieu of the one in core if they do want to have different permissions per active language. We might want to tackle this in a follow-up.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland
    Adding the access_policies cache tag in Role is not ideal either, though. It's perfectly valid to turn off the UserRolesAccessPolicy and then you'd be flushing the access policies whenever a role changes even though it doesn't affect the permissions.
    

    It is possible, yes, but that seems quite theoretical. How many sites really are going to do that and completely ignore roles for access? You can''t _only_ have groups? I struggle to think of use cases (a simple single user site could just do away with anything except the super user policy, but why bother?) And if they do, it's likely that there also won't be any/many role changes (because for what?), so if roles don't get saved, it won't have to do invalidate anything.

    IMHO, this is worth doing and will improve and simplify cache tag preloading. But I'm perfectly happy to discuss that in a separate issue. I think this is a very valuable improvement that doesn't need to be hold up.

    IMHO the only thing to discuss for this is https://git.drupalcode.org/project/drupal/-/merge_requests/10957#note_44.... Any objections to pushing that interface directly into the Cache namespace so we can use it for other use cases like blocks?

  • πŸ‡¬πŸ‡§United Kingdom catch

    Any objections to pushing that interface directly into the Cache namespace so we can use it for other use cases like blocks?

    Sounds good to add the single interface now since we know we have at least two use cases for it.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Moved the interface. Also had some cruft in here from another issue and made a mess with the rebase, so I squashed everything together in a single commit.

    Documenting the more generic interface is a bit challenging, tried something, open for feedback on that part.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    New title and slightly updated issue summary to reflect what this does.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    I don't see the changes, did you forget to push?

    I was wondering if we could opt for an attribute rather than a marker interface because then we would be able to flag individual methods as not desirable to cache if we ever wanted to do that. The downside is that attributes are slower than interfaces and, in this case, we may end up calling the processor a lot of times during a request. Fine either way, just thinking about potential future uses and how attributes might be nicer then.

    Either way, +1 on moving it into the cache namespace.

    Re #16: You're right, if the user roles access policy is turned off it would make little sense to keep editing roles. The example I had in mind was not Group-related by the way, but rather a project where we got your roles from an external system and wrote an access policy for those.

    The only concern I still have is that we're tightly coupling the role storage to the access policy cache implementation. While it might make sense to do so, it is setting a bad example. Ideally, access policies are self-contained so that when they get turned off, nothing is left lingering around.

    I'm also fine to discuss this further in a follow-up with you.

  • Pipeline finished with Failed
    2 months ago
    Total: 611s
    #409107
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    I pushed to it to some other fork, not sure what was going on there. Changes should be here now.

    I agree with you about separation on roles and access policies in theory but IMHO it's a conflict between proper/clean architecture vs practical performance improvements for the 99% of use cases. And I think there's still more problematic coupling between roles and access than this (for example using user.roles:authenticated/anonymous as cache context for isAuthenticated() checks. Things would fail really badly I expect if your users wouldn't even have the anonymous/authenticated roles). A similar example for this are date format config entities, they don't even have their own cache tags, saving them just invalidates the rendered cache tag. They change very rarely and then the date format API's don't need to consider cacheability for rendered output.

    But again, I have no intentions of adding that this this merge request, I will create a separate issue and we can discuss there. With caching disabled for most sites, it's much less of an issue as those cache tags no longer show up so early.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Ah, very good, additional tests _are_ now failing, now that πŸ› Functional Javascript test false postive and missing browser output Active . But still only on the cache tags.

    I'd love to get some reviews on πŸ“Œ Aggregate cache operations per bin to allow for more specific asserts Active (it also contains also the assert part of ✨ Introduce a list of "common cache tags" to reduce lookup query amount Active ), get that done first, and then we can show much better that the cache get operations might have the same total count but move from the regular access policy bin to ChainedFast config/bootstrap bins.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Code looks good. I agree we should see if we can commit πŸ“Œ Aggregate cache operations per bin to allow for more specific asserts Active first. I'll check that one for review next.

    Any thoughts on marker interface vs attribute? Again, fine to leave as is but would like if other people involved gave their opinion.

  • πŸ‡¬πŸ‡§United Kingdom catch

    I personally still like interfaces for this where we're definitely instantiating the class anyway, because it's just a quick instanceof check instead of dealing with reflection.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Given how we'd probably always run code from a collector, processor, etc. it's fair to assume the subjects will always be instantiated classes. So I suppose interface is the way to go here. Thanks for the feedback.

  • Pipeline finished with Success
    2 months ago
    Total: 4460s
    #410643
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Rebased and updated the test assertions, really like how this works out now.

    Clearly visible that cache get moves from access policy bin to config, directly resulting in fewer queries. often also fewer cache tag lookups. Sometimes the same or some cache tags just move down later on the same page and we eventually still load them, but it's never more.

    Left one comment, but that's not really related to this, just seems strange about the variable number of cache tag lookups.

    Also, fun side note, we have a bootstrap bin, but it's sometimes only the 4th or so bin that is actually being used and data is often used earlier for route match lookups. I do wonder if there's something we could improve about that, but that's obviously not related to this issue at all.

  • Pipeline finished with Success
    2 months ago
    Total: 1870s
    #410689
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    One thing it does show is that the anonymous and/or authenticated role's cache tag is still being checked. So the potential optimization we talked about earlier when it comes to adding roles as dependencies might not be that big.

    Also, all of the tests no longer check for the access_policies tag, yet StandardPerformanceTest looks for it in:

    -        ['config:block.block.stark_primary_local_tasks', 'local_task'],
    +        ['access_policies', 'config:block.block.stark_primary_local_tasks', 'config:user.role.authenticated', 'local_task'],
    

    What's going on there?

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    I guess that adds the cacheable dependencies of the current user or something to that element? Not in scope of this issue IMHO. Cache tags is no longer the scope of this issue, only the actual persistent cache lookups, removing/moving the corresponding cache tags is just a side effect.

    Yes, some roles are still being added, both on their own and through something that has the access policy cacheability (there removing them would actually help IMHO). We can investigate that further in a separate issue.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    I suppose you're right that it's out of scope for this issue. In that regard you could say the goal of this issue has been achieved here.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Yes, IMHO that's the case. I created πŸ“Œ Optimize cache tag usage in access policies and roles Needs review , so we can discuss cache tags further over there.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Thanks, will follow that one.

    • catch β†’ committed 380195b0 on 11.x
      Issue #3500683 by berdir, nod_, kristiaanvandeneynde, catch: Allow...
  • πŸ‡¬πŸ‡§United Kingdom catch

    I think we should backport at least the new interface but probably not the implementation to 10.5.x, so moving to 'to be ported' for there.

  • Merge request !11674backport interface β†’ (Open) created by berdir
  • Status changed to Needs review 4 days ago
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Created a MR for the interface backport.

  • Pipeline finished with Failed
    4 days ago
    Total: 679s
    #460243
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Seems like a good backport.

  • The Needs Review Queue Bot β†’ tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Bot seems to be confused, maybe because this targets 10.5.

  • The Needs Review Queue Bot β†’ tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • I think the no-needs-review-bot tag deactivates the bot.

  • The Needs Review Queue Bot β†’ tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • πŸ‡«πŸ‡·France nod_ Lille

    umm, sorry about that not sure what's wrong with the bot

  • The Needs Review Queue Bot β†’ tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • πŸ‡«πŸ‡·France nod_ Lille

    some problems in the D10 git checkout locally, not sure why the tag is not fetched by the bot, maybe an unfortunate timing + caching on the d.o endpoint

  • πŸ‡«πŸ‡·France nod_ Lille

    hiding patch files

Production build 0.71.5 2024