[meta] Profile/rationalise cache tags

Created on 15 April 2014, about 10 years ago
Updated 8 May 2023, about 1 year ago

Problem/Motivation

In a cold cache scenario, more cache tags make a page load require more database queries (a probabilistic fraction of one query per tag). Let's see how we can minimize this, what price this has, and if it is wise to pay it.

There are 2 approaches to reduce the number of cache tags in the DB:
(1) Replace cache tags in different places of Drupal with "manual" invalidation (some of the related issues)
(2) Implement a service that aggregates cache tags - 📌 Add a CacheTagsAggregator Needs work

Downsides of (1) removing cache tags

a) Removing any cache tag breaks existing code that relies on the cache tag, thus is a BC break
b) Even if the cache tag removal is properly deprecated, like for plugins, there is still the need for an invalidation API. Contrib modules often have or want to add cached data that depends on plugin definitions. If they need to roll their own, nothing is won, and it means more work and complexity for all. (See some of the related issues, like 📌 Allow plugin derivers to specify cacheability (tags, contexts, max-age) for their definitions Postponed: needs info and 🌱 Finalize cacheability for plugins Closed: duplicate ).
c) Having cache data without cache tags torpedoes the approach in 📌 Replace CacheItem::Delete with mandatory cache tags, to remove the need to coordinate caches between webheads Active

Proposed resolution

Discuss the new evidence in 📌 Add a CacheTagsAggregator Needs work and 📌 Replace CacheItem::Delete with mandatory cache tags, to remove the need to coordinate caches between webheads Active , and make a proper assessment of wins and costs, and who bears those costs, before removing cache tags.

Remaining tasks

- Discuss, then do

User interface changes

None

API changes

TBD

Original report by catch

Since the basic implementation of cache tags has a runtime overhead based on the number of unique cache tags requested, in some cases it might be better to skip adding those cache tags all together, and go back to a full cache clear for them. Some cache tags which will be requested on (nearly) every page, might hardly ever be cleared except by a full cache clear. For example block configuration, image styles etc. There are also explicit clears of cache bins that should also be reviewed.

Original report by catch

Once #2124957: Replace 'content' cache tag with 'rendered' and use it sparingly is done we'll be able to remove the content cache tag, so that caches are cleared only for items that need to be.

We should also check for explicit clears of cache bins like cache_render since there's probably still some in core.

However that leaves us with some cache tags which will be requested on (nearly) every page, but might hardly ever be cleared. For example block configuration, image styles etc.

Since the basic implementation of cache tags has a runtime overhead based on the number of unique cache tags requested, in some cases it might be better to skip adding those cache tags at all, and go back to a full cache clear. Opening this issue to review this once things are up and running correctly.

📌 Task
Status

Needs work

Version

10.1

Component
Cache 

Last updated about 12 hours ago

Created by

🇬🇧United Kingdom catch

  • Triaged core major

    There is consensus among core maintainers that this is a major issue. Only core committers should add this tag.

  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • Needs issue summary update

    Issue summaries save everyone time if they are kept up-to-date. See Update issue summary task instructions.


Comments & Activities


  • 🇩🇪Germany geek-merlin Freiburg, Germany

    From the IS:
    > Since the basic implementation of cache tags has a runtime overhead based on the number of unique cache tags requested, in some cases it might be better to skip adding those cache tags all together, and go back to a full cache clear for them.

    This issue originated before ChecksumInvalidator, and i don't see, and doubt, that in the current situation this is still the case.

    > Profile/rationalise cache tags

    Also i could not find profiling in this issue.

    Bringing this up with the intent that when we add more complexity and maintenance cost, we've done due diligence on the win we get.

  • 🇨🇭Switzerland Berdir Switzerland

    Having this issue might not be necessary, but that quote is absolutely still relevant and correct.

    Cache tags have a cost and it always makes sense to think about whether or not they should be used.

    Some very recent issues on this topic are 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed about removing some plugin discovery cache tags and 🐛 ChainedFastBackend invalidates all items when cache tags are invalidated Fixed on completely changing how the ChainedFastBackend deals with cache tags. Both could be considered child issues of this. Or we could close it and just deal with these issues as they appear.

    One example that I think would make sense to evaluate are the ENTITY_TYPE_view cache tags, there's an argument to be made that they change so infrequently that we could just as well invalidate the rendered cache tag instead.

  • 🇩🇪Germany geek-merlin Freiburg, Germany

    Thanks for elaborating.

    > Cache tags have a cost

    This is the very claim that i challenge. What are they? What if the cost of avoiding cache tags is far beyond the cost of that cache tags? We can't balance pros and cons thoroughly without that answer.
    We've had gazillions of requests to avoid this or that abstraction, because it has some cost, and the good decisions were made with solid profiling results.

  • Status changed to Needs work over 1 year ago
  • 🇬🇧United Kingdom catch

    @geek-merlin this isn't about getting rid of cache tags altogether, it's about trying to optimise how they're used.

    There are two things that ideally we'd continue to improve:

    If you have one cache item with a unique cache tag, like some of the examples in 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed , then there's an extra database query each time that's requested. In some cases like this, we can manage without the cache tag for those items and just clear them directly. This is what's discussed in the issue summary, and as Berdir points out, it's still correct.

    The opposite example is the entity list cache tags #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing / 🐛 Use new cache tag ENTITY_TYPE_list:BUNDLE in Views to improve cache hit rate Needs work where a cache tag is both used on a lot of cache items, and also invalidated a lot, and there's an opportunity to make that more granular.

    I also don't think we're getting a lot out of this issue, but the trade-offs are still there, and there are still issues trying to improve it.
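The first case above (clearing known cache items directly instead of tagging them) can be illustrated with a rough sketch. This is a toy model in Python, not Drupal code: the class `FiniteVariationCache`, the `hypothetical_plugins` prefix and the language-code variations are all assumptions made up for illustration. The point is only that an owner who knows every key it can write may delete those keys itself, so reads never need a tag lookup.

```python
class FiniteVariationCache:
    """Toy cache owner that knows every key it can ever write (hypothetical)."""

    def __init__(self, backend, prefix, variations):
        self.backend = backend        # plain dict standing in for a cache bin
        self.prefix = prefix
        self.variations = variations  # the finite, known set of variations

    def key(self, variation):
        return f"{self.prefix}:{variation}"

    def set(self, variation, value):
        self.backend[self.key(variation)] = value

    def get(self, variation):
        # No tag checksum to validate, so no extra tag lookup on read.
        return self.backend.get(self.key(variation))

    def clear(self):
        # Delete each known key directly instead of invalidating a tag.
        for variation in self.variations:
            self.backend.pop(self.key(variation), None)


cache_bin = {"unrelated": 1}
cache = FiniteVariationCache(cache_bin, "hypothetical_plugins", ["en", "de"])
cache.set("en", ["definition_a"])
cache.set("de", ["definition_a"])
cache.clear()
print(sorted(cache_bin))  # ['unrelated']
```

The trade-off is that any other code which cached data derived from these items can no longer rely on a shared tag and has to be cleared some other way.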

  • 🇨🇭Switzerland Berdir Switzerland

    > > Cache tags have a cost
    > This is the very claim that i challenge.

    As @catch said each unique cache tag results in an extra query (if multiple per cache item, then grouped) every single time a cache item is fetched. It's not a claim, it's a fact.

    And as @catch also said, nobody wants to remove cache tags entirely, it's about specific use cases and tags.

    > What are they? What if the cost of avoiding cache tags is far beyond the cost of that cache tags? We can't balance pros and cons thoroughly without that answer.

    That's exactly what this issue is about and the issues that I and @catch linked are specific examples for cases we are looking into, but you cannot answer that generally.

  • 🇩🇪Germany geek-merlin Freiburg, Germany

    Thanks for elaborating. We're getting closer, so please understand and/or forgive my insistence...
    > each unique cache tag results in an extra query (if multiple per cache item, then grouped) every single time a cache item is fetched.

    This is the one that is still missing documented evidence.

    If you are referring to \Drupal\Core\Cache\DatabaseCacheTagsChecksum::getTagInvalidationCounts, there we have one query for all tags, which was the big win of the ChecksumInvalidator.

    Or are you referring to a different query? Please educate me.
    (I'll soon be pushing some resources into this topic too, and what i'll do and maybe join this quest depends on these insights.)

    PS:
    > nobody wants to remove cache tags entirely
    Who opened that bottle? Not me.

  • 🇨🇭Switzerland Berdir Switzerland

    > there we have one query for all tags

    For all tags *per cache get* call. We call that method in \Drupal\Core\Cache\DatabaseBackend::prepareItem() for example. It's not even optimized if you do a getMultiple() but that doesn't really happen for most cases where cache tags are used anyway. We don't know ahead of time which caches with which cache tags we're going to load on a certain request.

    Quoting myself from 🐛 ChainedFastBackend invalidates all items when cache tags are invalidated Fixed :

    > Testing the patch on default umami installation frontend as admin with disabled render caching, I'm seeing 6 extra queries to the cachetags table (142 vs 148 in total).

    So there's your answer for that scenario (umami frontpage, admin, no render cache), the cost of cache tags is 142 extra database queries in HEAD for a single request. With enabled render caching, it's likely fewer, but depends on what kind of cache hits and misses you have.

  • 🇨🇭Switzerland Berdir Switzerland

    FWIW, the number might be off because disabled caches cause a lot of cache writes that invalidate the static cache in there, but still, it can be a lot.

  • 🇬🇧United Kingdom catch

    > For all tags *per cache get call*. We call that method in \Drupal\Core\Cache\DatabaseBackend::prepareItem() for example.

    Yes this is the crux.

    So if I'm getting the cache item for a render array, and it has six cache tags, then that's one database query - not a big deal and render caching is the primary use-case for cache tags.

    But if I'm getting six different cache items for six different plugin managers, that all have their own cache tag that's not used anywhere else, that's six extra database queries.
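The two cases above can be counted with a small simulation. This is a simplified Python model under stated assumptions, not Drupal's implementation: `TagChecksumModel` and `queries_for` are hypothetical names, each cache get issues at most one grouped lookup for the tags it has not yet seen in the current request, and tags already seen are served from a per-request static cache.

```python
class TagChecksumModel:
    """Toy model of a tag checksum lookup with a per-request static cache."""

    def __init__(self):
        self.static_cache = {}  # tag -> invalidation count, kept for the request
        self.query_count = 0    # simulated queries against the cache tags table

    def checksum(self, tags):
        # Only tags not already seen during this request need a lookup.
        missing = [tag for tag in tags if tag not in self.static_cache]
        if missing:
            # One simulated query fetches all missing tags of this cache get.
            self.query_count += 1
            for tag in missing:
                self.static_cache[tag] = 0
        return sum(self.static_cache[tag] for tag in tags)


def queries_for(cache_gets):
    """Count simulated tag queries for a sequence of cache gets in one request."""
    model = TagChecksumModel()
    for tags in cache_gets:
        model.checksum(tags)
    return model.query_count


# One render cache item carrying six tags: a single grouped query.
one_item_six_tags = [["node:1", "node:2", "user:1", "config:a", "config:b", "config:c"]]
# Six plugin manager cache items, each with a tag used nowhere else.
six_items_unique_tags = [[f"plugin_manager_{i}"] for i in range(6)]
# Six cache items sharing one tag: only the first get needs a query.
six_items_shared_tag = [["shared_tag"] for _ in range(6)]

print(queries_for(one_item_six_tags))      # 1
print(queries_for(six_items_unique_tags))  # 6
print(queries_for(six_items_shared_tag))   # 1
```

In this model the cost depends on how many cache gets bring a tag that has not been loaded yet, not on the raw number of tags, which is why a tag unique to one cache item is the expensive case.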

  • 🇩🇪Germany geek-merlin Freiburg, Germany

    Thanks a lot @catch and @Berdir for elaborating this. So even if we have no hardcore profiling, and it is not one DB query per tag, it's "every additional tag has some likelihood to trigger an additional DB query" or in other words the factor from cache tags to DB queries may be small but is greater than zero.

    Thinking about ideas like preloading, subqueries, mapper services, but will first think and elaborate.

  • 🇬🇧United Kingdom catch
  • 🇨🇭Switzerland Berdir Switzerland

    Working on 📌 Explicitly support Relay (drop-in replacement for PhpRedis) Fixed , I noted in the example once again how many block config entity cache tags we have on a default installation. Similar to how we group shortcuts into a set and only have cache tags for the set, I think it might be worth looking into having a single block:$theme cache tag.

    Block configs usually don't change that much, and if they do it's often several at the same time (drag and drop reorder). Also, for a use case with multiple themes, the list cache tag will invalidate all other themes (page and dynamic page cache) as well, not just the one affected by the block that you just saved.
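To make the block:$theme idea concrete, here is a rough Python sketch comparing the number of distinct tags a page would carry under each scheme. Everything in it is an assumption for illustration: the block IDs and theme names are made up, the per-block tag is written in the `config:block.block.<id>` form, and `block:<theme>` stands for the proposed tag, which does not exist in core as far as this thread shows.

```python
def per_block_tags(blocks, theme):
    """One tag per placed block config entity of the given theme."""
    return {f"config:block.block.{block_id}" for block_id, t in blocks if t == theme}


def per_theme_tags(blocks, theme):
    """A single block:<theme> tag covering all blocks of the given theme."""
    return {f"block:{t}" for _block_id, t in blocks if t == theme}


# Hypothetical placed blocks as (block ID, theme) pairs.
blocks = [
    ("frontend_branding", "frontend"),
    ("frontend_main_menu", "frontend"),
    ("frontend_search", "frontend"),
    ("frontend_footer", "frontend"),
    ("admin_local_tasks", "admin"),
    ("admin_page_title", "admin"),
]

print(len(per_block_tags(blocks, "frontend")))  # 4
print(per_theme_tags(blocks, "frontend"))       # {'block:frontend'}
```

With the per-theme tag, saving a block in one theme would only need to invalidate that theme's tag, so pages rendered with the other theme keep their cache entries. The cost is coarser invalidation within a theme: every page of that theme is invalidated when any of its blocks changes.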

  • 🇬🇧United Kingdom catch

    A single cache tag for block config seems like a good idea, should we open a specific issue for that one?

  • 🇩🇪Germany geek-merlin Freiburg, Germany

    As promised in #120, i have thought and worked on this.
    I now see the reasons why imho removing cache tags from the API (rather than from the DB) is a breaking change and wrong, and we should rather do the latter.
    Elaborated the reasoning in the IS.

  • 🇩🇪Germany geek-merlin Freiburg, Germany
  • 🇨🇭Switzerland Berdir Switzerland

    I agree with a), removing cache tags is indeed a problem and 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed has one example where a clear call needs to change. It will require a release manager decision whether that change is OK or if we need to find some sort of BC.

    On b) plugin discovery/managers have and always had an invalidation API that should be used instead of the underlying cache tags. I commented on 📌 Allow plugin derivers to specify cacheability (tags, contexts, max-age) for their definitions Postponed: needs info that I disagree with that approach.

    I also disagree on c) and the issue you created, that is not the direction we want to go, we want fewer cache tags, not more.

  • 🇫🇷France andypost

    Another edge case is help topics (each topic is a plugin): their cache needs precise clearing on code deploy so that the search index is updated for new topics.
