- Issue created by @kristiaanvandeneynde
- Merge request !7190Draft: Start counting and checking for cache tag lookup queries. → (Open) created by kristiaanvandeneynde
- Status changed to Needs work
11 months ago 12:27pm 26 March 2024 - 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
This already proves that our isValid count doesn't translate 1:1 to database queries. Perhaps if this goes green it should already be committed so that the work being done here can more easily prove the count difference.
Even if we don't commit this in its current form, we can still compare results later on to the following.
- testFrontPageAuthenticatedWarmCache() count 6
- testFrontPageHotCache() count 1
- testAnonymous() counts 33, 26 and 24
- testLogin() count 14
- testLoginBlock() count 29
Marking as NW because it doesn't contain the actual fix yet.
- 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Here are the common stand-alone cache tags I found in Grafana:
- CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD:form
- config:core.extension
- config:search.settings
- config:system.site
- config:user.role.anonymous
- config:user.role.authenticated
- entity_bundles
- entity_field_info
- entity_types
- http_response
- library_info
- local_task
- rendered
- route_match
- routes
- views_data
Then there were some entity type related tags we should put in a separate subscriber:
- block_content_view
- block_view
- node_list
- node_values
- node_view
- taxonomy_term_list
- user_values
- user_view
Then we had some tags related to the following that could be added dynamically in follow-up issues for new event subscribers:
- Image styles
- Placed blocks
- Menus
- config:system.menu.account
- config:system.menu.main
- Text formats
- config:filter.format.basic_html
- config:filter.format.full_html
IMO we should start with the common ones and the entity type related ones as those do not need to be cleared at all unless a cache clearing event takes place (new module, new entity type, ...) The event subscribers based on config or entities (menus, blocks, formats, image styles, ...) need to somehow clear the "common cache tag" list whenever the list changes.
The cached list of common cache tags should be manually cleared by these event subscribers when necessary rather than use cache tags itself because that will get really nasty really quickly :D
- 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Latest commit does not introduce the event stuff yet, but shows a proof of concept where we hard-code the common list from above. The results are amazing:
- testFrontPageAuthenticatedWarmCache() count 6 -> down to 2
- testFrontPageHotCache() count 1
- testAnonymous() counts 33, 26 and 24 -> down to 20, 17 and 16
- testLogin() count 14 -> down to 3
- testLoginBlock() count 29 -> down to 18
That's -67%, -39%, -35%, -33%, -79% and -38% amount of queries.
- 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Switched to an event subscriber. left a bunch of @todo comments.
- We might want to cache the outcome of the event.
- The entity type one is still hard-coded but should be dynamic
- This might need tests
Reason I'm using:
calls: - [setEventDispatcher, ['@event_dispatcher']]
Is because if I tried to add it as a proper argument Drupal wouldn't even boot because DrupalKernel::$defaultBootstrapContainerDefinition contains cache_tags_provider.container and at that point we do not have a container yet, so event_dispatcher doesn't resolve. Trying to add 'event_dispatcher' to that array just caused more errors where the autowired 'session' closure wasn't being passed in. Using a setter made all of these issues go away.
- 🇬🇧United Kingdom catch
I opened 📌 Add JSON:API performance tests Active based on the discussion in the MR.
- 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Okay so a local attempt at getting this to work with entity types dynamically was running into eternal loops, but I just had a chat with @catch and we think we may have a solution for that:
We fire an event early on that asks for this list of cache tags, when/if we find this list in the bootstrap cache we set it on the container. In the CacheTagsChecksumTrait, we check for said parameter and don't do anything special if it's not there. This will prevent eternal loops.
Then, we can have the entity type manager load all the info it needs and spit out a list of common cache tags. In the end, we cache said list without cache tags (because here be monsters), but with a max-age in the bootstrap cache. Because it's just an optimization, we don't need to worry that much if this cache becomes stale for a short while; next cache clear or after the list expired it will be recalculated anew.
- 🇨🇭Switzerland berdir Switzerland
Working on 📌 Add caching to ConfigurableLanguageManager::getLanguages() Active reminded me about this, as that issue adds a very early cache tag. The first in fact if page cache isn't a hit.
I still think this is interesting, but I think it's not quite so straight-forward, especially the timing of when we do this load.
Tests are all against umami /en/recipes/super-easy-vegetarian-pasta-bake
The best case scenario is that we have an internal page cache hit. That means exactly one cache tag lookup.
["config:block_list","config:block.block.umami_search","config:block.block.umami_languageswitcher","config:configurable_language_list","config:block.block.umami_account_menu","config:block.block.umami_page_title","config:block.block.umami_messages","config:block.block.umami_help","config:block.block.umami_main_menu","config:block.block.umami_branding","config:block.block.umami_local_tasks","config:block.block.umami_footer","config:block.block.umami_footer_promo","block_content:2","config:block.block.umami_content","config:block.block.umami_breadcrumbs","config:block.block.umami_views_block__articles_aside_block_1","node:3","config:block.block.umami_views_block__promoted_items_block_1","config:block.block.umami_banner_home","config:block.block.umami_banner_recipes","config:block.block.umami_views_block__recipe_collections_block","config:block.block.umami_disclaimer","block_content:1","block_view","config:filter.format.basic_html","config:system.menu.footer","block_content_view","config:core.entity_view_display.block_content.footer_promo_block.default","media_view","media:19","config:image.style.medium_8_7","file:37","config:core.entity_view_display.media.image.medium_8_7","config:views.view.recipe_collections","taxonomy_term_list","taxonomy_term:16","taxonomy_term:15","taxonomy_term:14","taxonomy_term:13","taxonomy_term:12","taxonomy_term:11","taxonomy_term:10","taxonomy_term:9","taxonomy_term:8","taxonomy_term:7","taxonomy_term:6","taxonomy_term:5","taxonomy_term:4","taxonomy_term:3","taxonomy_term:2","taxonomy_term:1","node_view","config:core.entity_view_display.node.recipe.full","config:views.view.related_recipes","node_list","node:1","node:6","node:7","node:10","media:1","config:responsive_image.styles.3_2_image","config:image.style.large_3_2_768x512","config:image.style.large_3_2_2x","config:image.style.medium_3_2_2x","config:image.style.medium_3_2_600x400","config:core.entity_view_display.media.image.responsive_3x2","config:core.entity_view_display.node.recipe.card","user:6","media:6","media:7","media:21","media:3","taxonomy_term:21","taxonomy_term:28","taxonomy_term:31","local_task","config:workflows.workflow.editorial","config:system.menu.main","node:20","config:system.site","config:system.menu.account","CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD:form","rendered","http_response","config:user.role.anonymous"]
One single lookup with all the tags. We neither can nor want to preload them all, because they will be different on every page. In other words, trying to preload common ones before this will actually expand it from one to two queries.
Second best is a dynamic page cache hit:
/en/recipes/super-easy-vegetarian-pasta-bake ["config:user.role.authenticated","config:user.role.administrator","access_policies"] ["route_match"] ["entity_types"] ["node_values","entity_field_info"] ["config:block_list","config:block.block.umami_page_title","config:block.block.umami_messages","config:block.block.umami_help","config:block.block.umami_content","config:block.block.umami_views_block__articles_aside_block_1","node:3","config:block.block.umami_views_block__promoted_items_block_1","config:block.block.umami_banner_home","config:block.block.umami_banner_recipes","config:block.block.umami_search","config:block.block.umami_languageswitcher","config:configurable_language_list","config:block.block.umami_account_menu","config:block.block.umami_main_menu","config:block.block.umami_branding","config:block.block.umami_local_tasks","config:block.block.umami_footer","config:block.block.umami_footer_promo","block_content:2","config:block.block.umami_breadcrumbs","config:block.block.umami_views_block__recipe_collections_block","config:block.block.umami_disclaimer","block_content:1","block_view","config:filter.format.basic_html","config:system.menu.footer","block_content_view","config:core.entity_view_display.block_content.footer_promo_block.default","media_view","media:19","config:image.style.medium_8_7","file:37","config:core.entity_view_display.media.image.medium_8_7","config:views.view.recipe_collections","taxonomy_term_list","taxonomy_term:16","taxonomy_term:15","taxonomy_term:14","taxonomy_term:13","taxonomy_term:12","taxonomy_term:11","taxonomy_term:10","taxonomy_term:9","taxonomy_term:8","taxonomy_term:7","taxonomy_term:6","taxonomy_term:5","taxonomy_term:4","taxonomy_term:3","taxonomy_term:2","taxonomy_term:1","node_view","config:core.entity_view_display.node.recipe.full","config:views.view.related_recipes","node_list","node:1","node:6","node:7","node:10","media:1","config:responsive_image.styles.3_2_image","config:image.style.large_3_2_768x512","config:image.style.large_3_2_2x","config:image.style.medium_3_2_2x","config:image.style.medium_3_2_600x400","config:core.entity_view_display.media.image.responsive_3x2","config:core.entity_view_display.node.recipe.card","user:6","media:6","media:7","media:21","media:3","taxonomy_term:21","taxonomy_term:28","taxonomy_term:31","local_task","config:workflows.workflow.editorial","config:system.menu.main","config:system.site","config:system.menu.account","CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD:form","config:system.theme","config:system.menu.admin","rendered","http_response"] ["library_info"] ["routes"] ["config:shortcut.set.default"]
Multiple queries that are very stable and easy to optimize, but the dynamic page cache one itself is not, same as above. And the access policies might *look* easy, but it contains the tags of all roles of the current user. See below point 3.
Umami has history, so also triggers a second request:
["config:user.role.authenticated","config:user.role.administrator","access_policies"] ["route_match"] ["entity_types"] ["node_values","entity_field_info"] ["routes"]
Same page, but disabled dynamic page cache, with render cache hits.
["config:user.role.authenticated","config:user.role.administrator","access_policies"] ["route_match"] ["entity_types"] ["node_values","entity_field_info"] ["routes"] ["node_view","node:3","config:core.entity_view_display.node.recipe.full","config:views.view.related_recipes","node_list","node:1","node:6","node:7","node:10","media_view","media:1","config:responsive_image.styles.3_2_image","config:image.style.large_3_2_768x512","config:image.style.large_3_2_2x","config:image.style.medium_3_2_2x","config:image.style.medium_3_2_600x400","config:core.entity_view_display.media.image.responsive_3x2","config:core.entity_view_display.node.recipe.card","user:6","media:6","media:7","media:21","config:filter.format.basic_html","media:3","taxonomy_term:2","taxonomy_term:21","taxonomy_term:28","taxonomy_term:31","rendered"] ["config:views.view.recipe_collections"] ["block_content_values"] ["config:system.theme","config:system.menu.admin"] ["block_view","config:block.block.umami_search","CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD:form"] ["config:system.menu.account","config:system.menu.main","config:system.menu.footer"] {"1":"config:block.block.umami_account_menu"} {"1":"config:block.block.umami_branding","2":"config:system.site"} {"1":"config:block.block.umami_main_menu"} {"1":"config:block.block.umami_messages"} {"1":"config:block.block.umami_help"} {"1":"config:block.block.umami_local_tasks","2":"local_task","5":"config:workflows.workflow.editorial"} {"1":"config:block.block.umami_breadcrumbs"} {"1":"config:block.block.umami_views_block__recipe_collections_block","3":"taxonomy_term_list","4":"taxonomy_term:16","5":"taxonomy_term:15","6":"taxonomy_term:14","7":"taxonomy_term:13","8":"taxonomy_term:12","9":"taxonomy_term:11","10":"taxonomy_term:10","11":"taxonomy_term:9","12":"taxonomy_term:8","13":"taxonomy_term:7","14":"taxonomy_term:6","15":"taxonomy_term:5","16":"taxonomy_term:4","17":"taxonomy_term:3","19":"taxonomy_term:1"} {"1":"config:block.block.umami_footer_promo","2":"block_content_view","3":"block_content:2","4":"config:core.entity_view_display.block_content.footer_promo_block.default","6":"media:19","7":"config:image.style.medium_8_7","8":"file:37","9":"config:core.entity_view_display.media.image.medium_8_7"} {"1":"config:block.block.umami_footer"} {"1":"config:block.block.umami_disclaimer"} ["library_info"] ["config:shortcut.set.default"]
However, a scenario with d(ynamic) page cache miss due to an invalidated cache tag is I guess also very common, where that will basically act as a bulk load for many of the following cache tags that will be checked again.
To me, that means two things:
1. I think it should not be an internal thing in the checksum trait but an external request subscriber that runs very early but after page cache. It doesn't matter too much if it's on a page cache miss because then most will already be filtered out from the static cache.
2. I think we should avoid making a first iteration too complicated. We can introduce the concept, with a few examples that remove a handful of fixed cache tags that the vast majority of sites use. A contrib module could always experiment with more, I also still think supporting extra cache tags through Settings::get() would be an easy way to optimize specific sites.
3. I still see a lot of potential to optimize away overlapping and unnecessary dynamic cache tags. Why do we have routes and route_match, do we really need both? I have an issue open about the block ones, that turned out to be not so trivial but it has a lot of potential. For access policies, what if we just invalidate the access_policies cache tag if any role is saved, instead of adding a dynamic amount of cache tags? It's always a tradeoff between performance and simplicity, and invalidations vs. lookups.This also reminded me about the current redis issue with the last write, which is quite inefficient, created 📌 Optimize bin cache tags and last write timetamp Active . Which in turn made me wonder if fast chained backends should use cache tags instead of a regular cache get on the fast backend? They now require cache tag support anyway and it would mostly avoid the redis problem and could be combined with the preloading done here.
Performance stuff is always _such_ a rabbit hole ;)
- 🇨🇭Switzerland berdir Switzerland
Suggestions, how about I pull the performance test parts to assert the behavior into 📌 Aggregate cache operations per bin to allow for more specific asserts Active , then we can document the current baseline and it makes it easier to compare improvements?
I did a merge of 11.x (since previous updates also did a merge), didn't review yet if the tests still pass.
I did have a quick look at the current event based idea, but it currently breaks when using redis. The problem is that redis uses cache tags to handle invalidateAll(), which means that even the initial container cache load does a cache tag check. This is being optimized away in both core (by deprecating invalidateAll()) and optionally just in redis for now, but it's there and the current default behavior.
Ignoring that, the next thing is then that this would trigger an event subscriber on internal page cache. Even with caching and what not, that's a ton of extra complexity at that point that we don't want or need as mentioned above.
So I really think this should be externalized into a separate and very early request subscriber.
- 🇬🇧United Kingdom catch
Suggestions, how about I pull the performance test parts to assert the behavior into #3500739: Aggregate cache operations per bin to allow for more specific asserts, then we can document the current baseline and it makes it easier to compare improvements?
Yes that sounds good - easier to show the diff in this issue that way.
- 🇨🇭Switzerland berdir Switzerland
Running into some awkward edge cases with this on non-installed sites, installer tests are failing that this results in creating the cachetags table. Might need to rethink this a bit. Possibly a new method to register preload cache tags on the checksum service but it will only run when doing an actual lookup query. not sure.
- 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
One thing I do in Group when no services are available yet but I need to polyfill some services based on plugins is using the discovery directly:
$discovery = new AttributeDiscoveryWithAnnotations( 'Plugin/Group/Relation', $container->get('container.namespaces'), GroupRelationType::class, 'Drupal\group\Annotation\GroupRelationType', [] ); // ... foreach ($discovery->getDefinitions() as $group_relation_type_id => $group_relation_type) { assert($group_relation_type instanceof GroupRelationTypeInterface); // ... }
Maybe we could do the same to find entity type definitions and get their cache tags that way? Could do the same for certain config entity types perhaps. Downside is we need to do this on every request. So we'd have to cache the outcome with a max age but no cache tags, for the reasons discussed above.
This does lead to potential cache duplication, but because the information stored in the "common cache tags" cache is non-critical I don't see how it could hurt if said cache were to contain too much or too little information for a small amount of time.
- 🇬🇧United Kingdom catch
I ran into the same installer tests issue with 📌 Add a core FileCache implementation that is database-backed Active , but nothing useful applicable from there because I think it needs a different approach anyway.
Possibly a new method to register preload cache tags on the checksum service but it will only run when doing an actual lookup query
This seems worth a try and would have the advantage that with a theoretical request that has no cache tags to look up, nothing would run.
- 🇨🇭Switzerland berdir Switzerland
Added that now, means alternative implementations will need to implement the new interface (I went with that because alternative implementations might not use the trait and there's also decorators.
The checksum count is reset to previous values, but that's kind of moot anyway as it doesn't really have a performance impact, as discussed, we might want to remove/deprecate those.
One aspect of this is that there can easily be multiple subscribes that all add their own tags. So we don't need to trigger its own event or something to collect the tags.
- 🇨🇭Switzerland berdir Switzerland
Somehow only one test caught a really fun bug with the preloaded cache tags, calculateChecksum() then returned the checksum for all tags including the preloaded ones, unsure why that only showed up in one case.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Found this via @berdir's blog post at https://www.md-systems.ch/en/blog/2025-01-26/redis-startup-performance-i... — fascinating stuff! 😄
I remember that back in 2013–2015 when I helped stabilize Drupal 8's caching layers with @catch, @berdir and others, that we discussed (basically dreamed) of one day having enough information/insight to do things like y'all are doing here.
I specifically remember the trade-offs of Drupal shipping with default assumptions that would not be true once contrib/custom modules are added. Those tensions are very apparent here. It'd be very cool to have Drupal self-monitor and self-optimize, but the overhead/cost of that is likely prohibitive.
~2 years ago I was thinking about this too in the context of Acquia's hosting platform. I remember having some ideas of how to solve this generically, but then never found the time to follow through. Unfortunately, those ideas evaporated since. But IIRC they centered on the use of a bloom filter. I forget how exactly. But reading this, I think I was thinking of using it to automatically determine the most commonly used cache tags.</braindump>
- 🇨🇭Switzerland berdir Switzerland
Rebased, performance tests had quite a bit of a back and forth, so I squashed the commits together.
I did a braindump of my own recently in 🌱 [meta] Improve reports and cache performance investigation features Active of ideas on how to expose more issues and optimization possibilities. Also not sure if https://www.drupal.org/project/webprofiler → already supports displaying loaded cache tags for a page, if not then it definitely should.
As you said, it's a tradeoff. A presumably tiny cost of loading a few extra cache tags that the current page doesn't need but that cost increases if we introduce dynamic logic to guess those tags. See 🐛 RoutePreloader loads a lot of routes Active for an extreme case of this, there we preload way too much which uses a good chunk of extra memory and I'm trying to reduce that now.
The approach that we now have allows for multiple implementations to add tags without resulting in multiple queries, so a contrib module could try to do something fancier, but I think in practice, most sites have a pretty stable baseline of cache tags that are really frequently used. My blog post has the full list of cache tags I added in our project.
Some of the more common things that will show on specific projects are *_values tags from entity storage cache (after some deliberation, I added user_values now as it's a core module and shows up in most lists but not node_values). But plenty of content entity types are also not regularly used when rendering pages, so I'm not sure that we need to dynamically collect and add them (and then cache that information again).
And I'd prefer to limit this issue to just introduce this capability and a set of baseline tags that most sites will use. Decoupled sites likely have quite a different situation for example, but many tags of this list should still be relevant there. One reason to have a JSON:API performance test, so we can better see the impact of issues like this on that. It won't need libraries and local tasks for example, but it still needs routes, access, entity types and fields.
Another thing worth mentioning here is that there is only really a benefit if we can eliminate a lookup query entirely.
Take this section as an example
['rendered', 'user:0', 'user_view'],
['config:filter.format.restricted_html', 'node:1', 'node_view'],
['block_view', 'config:block.block.stark_site_branding', 'config:system.site'],
There's little benefit to add rendered or user view since it's almost always combined with a specific user id, same for formats and node_view, as long as the node:1 lookup is still needed we gain nothing. And again, same for block_view and config:system:site, since the block id is configuration. That said, I did open 📌 Use a single per-theme cache tag for block config entities Needs review quite a while ago to eliminate all those block cache tags, will try to get back to that. that will change things around and we could consider to preload certain cache tags.
Also, page cache and dynamic page cache misses essentially also act as preloaders for a given page. If a page cache can be loaded but is no longer valid because of a cache tag invalidation such as node_list, we've already preloaded 90%+ of the cache tags for all the entities, blocks and configs that we'll see on that page.
- 🇬🇧United Kingdom catch
If a page cache can be loaded but is no longer valid because of a cache tag invalidation such as node_list, we've already preloaded 90%+ of the cache tags for all the entities, blocks and configs that we'll see on that page.
Head exploding emoji.
- 🇨🇭Switzerland berdir Switzerland
Yeah, that is pretty interesting, and I think many sites will more often than not have a page/dynamic page cache miss due to cache tags instead of it not being in the cache at all.
I added a test to demonstrate that situation, it's interesting, but I think it's not in scope to be committed here and in its current form is IMHO too fragile. But as always with these performance issues, various interesting bits in there: The level of insight into performance metrics that we have now is really powerful and allows us to explicitly and fairly easily assert the exact behavior around caching.
* There is a bug in argument replacement, if there are more than 10 numbered placeholders, then it will replace "foo_10,foo_11,..." with "foo_1". reversed the args array to fix this, should probably be its own issue.
* I was really surprised to see those route queries, I currently don't understand why creating a new node invalidates route load caches (route match maybe, but specific raw route data? why would that change? somehow there's also a query before the current preload happens. The whole query is I think also too fragile and would be a pain to maintain, as every time we add or remove a module from standard or add/remove any of those routes we'd have to manually update that very, very long list. Good thing is that many of these problems will go away if we move forward with 🐛 RoutePreloader loads a lot of routes Active , except maybe the part why they are invalidated in the first place.
* the 7 caches being written are: data (2 route load caches, 1 views data cache), render (views render cache, including a redirect due to url:site cache context being added by something), dynamic page cache and page are obvious. I tried adding an explicit assert on those, but they contain absolute URL's and hashes, we would need to normalize those similar to queries.
* cache tag preloading is working well, as expected. the only thing not covered yet with the preload list is views_data and the core extension config (we still don't delete all caches on a module install, only uninstall) and node_values. - 🇬🇧United Kingdom catch
I was really surprised to see those route queries, I currently don't understand why creating a new node invalidates route load caches (route match maybe, but specific raw route data? why would that change?
I just manually tested and the 'routes' cache tag wasn't invalidated, and I can't think how else that could get invalidated. It might be worth an extra page request prior to creating the node just to make sure that cache is definitely warmed. Worth splitting that test to its own issue then we can open spin-offs from there.
including a redirect due to url:site cache context being added by something
Might be worth an issue to discuss adding that to required cache contexts?
views_data
Left a couple of related comments in the URL for user_values and node_values - I think we could implement event subscribers in the respective modules for these.