[meta] Profile/rationalise cache tags

Comment over 2 years ago →
🇩🇪Germany geek-merlin Freiburg, Germany
From the IS:
> Since the basic implementation of cache tags has a runtime overhead based on the number of unique cache tags requested, in some cases it might be better to skip adding those cache tags all together, and go back to a full cache clear for them.

This issue originated before the ChecksumInvalidator. I don't see evidence that this is still the case in the current situation, and I doubt that it is.

> Profile/rationalise cache tags

Also, I could not find any profiling in this issue.

I'm bringing this up so that, when we add more complexity and maintenance cost, we have done due diligence on the win we get.
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
Having this issue might not be necessary, but that quote is absolutely still relevant and correct.

Cache tags have a cost, and it always makes sense to think about whether or not they should be used.

Some very recent issues on this topic are 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed, about removing some plugin discovery cache tags, and 🐛 ChainedFastBackend invalidates all items when cache tags are invalidated Fixed, about completely changing how the ChainedFastBackend deals with cache tags. Both could be considered child issues of this one. Or we could close this issue and just deal with these issues as they appear.

One example that I think would make sense to evaluate is the ENTITY_TYPE_view cache tags. There's an argument to be made that they change so infrequently that we could just as well invalidate the rendered cache tag instead.
Comment over 2 years ago →
🇩🇪Germany geek-merlin Freiburg, Germany
Thanks for elaborating.

> Cache tags have a cost

This is the very claim that I challenge. What is that cost? What if the cost of avoiding cache tags far exceeds the cost of the cache tags themselves? We can't weigh the pros and cons thoroughly without that answer.
We've had gazillions of requests to avoid this or that abstraction because it has some cost, and the good decisions were backed by solid profiling results.
Status changed to Needs work over 2 years ago, 9:40pm 26 January 2023
Comment over 2 years ago →
🇬🇧United Kingdom catch
@geek-merlin this isn't about getting rid of cache tags altogether, it's about trying to optimise how they're used.

There are two things that ideally we'd continue to improve:

If you have one cache item with a unique cache tag, like some of the examples in 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed, then there's an extra database query each time that item is requested. In some cases like this, we can manage without the cache tag for those items and just clear them directly. This is what's discussed in the issue summary, and as Berdir points out, it's still correct.
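A minimal Python sketch of the "clear them directly" alternative, assuming a hypothetical plugin manager whose cached definitions vary only by a small, known set of language codes. `ToyCache`, `ToyPluginManager`, `LANGCODES` and the `example_plugins` cache key are illustrative names, loosely modelled on a plugin manager's `clearCachedDefinitions()` but not Drupal code. Because the items carry no tags, reading them needs no tag lookup; the price is that the clear method must enumerate every cache ID.

```python
# Hypothetical: the finite, known set of variations for this plugin cache.
LANGCODES = ["en", "de", "fr"]


class ToyCache:
    """In-memory stand-in for a cache bin, with no cache tags at all."""

    def __init__(self):
        self.items = {}

    def set(self, cid, data):
        self.items[cid] = data  # no tags attached, so no tag lookup on get

    def get(self, cid):
        return self.items.get(cid)

    def delete_multiple(self, cids):
        for cid in cids:
            self.items.pop(cid, None)


class ToyPluginManager:
    """Hypothetical plugin manager whose cached definitions vary only by language."""

    def __init__(self, cache, cache_key):
        self.cache = cache
        self.cache_key = cache_key

    def cid(self, langcode):
        return f"{self.cache_key}:{langcode}"

    def get_definitions(self, langcode):
        cached = self.cache.get(self.cid(langcode))
        if cached is None:
            cached = {"discovered_for": langcode}  # stands in for expensive discovery
            self.cache.set(self.cid(langcode), cached)
        return cached

    def clear_cached_definitions(self):
        # Instead of invalidating a unique cache tag, delete every known variation.
        self.cache.delete_multiple(self.cid(lc) for lc in LANGCODES)


cache = ToyCache()
manager = ToyPluginManager(cache, "example_plugins")
for lc in LANGCODES:
    manager.get_definitions(lc)
print(len(cache.items))  # prints 3
manager.clear_cached_definitions()
print(len(cache.items))  # prints 0
```

This only works when the variations are finite and known up front, which is why the linked issue is scoped to plugin managers with finite variations.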

The opposite example is the entity list cache tags #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing → / 🐛 Use new cache tag ENTITY_TYPE_list:BUNDLE in Views to improve cache hit rate Needs work where a cache tag is both used on a lot of cache items, and also invalidated a lot, and there's an opportunity to make that more granular.
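A toy Python model of why a more granular list tag helps, assuming that saving an entity invalidates both `node_list` and `node_list:BUNDLE`, and that single-bundle listings carry only the bundle-specific tag. The listing names are hypothetical, and this sketches the idea rather than how the linked issues implement it.

```python
def invalidated_by_save(bundle):
    # Assumption: saving an entity of `bundle` invalidates both the generic
    # list tag and the bundle-specific one.
    return {"node_list", f"node_list:{bundle}"}


# Coarse scheme: every listing depends on the generic list tag.
coarse_listings = {
    "recent_articles": {"node_list"},
    "recent_pages": {"node_list"},
    "all_content": {"node_list"},
}

# Granular scheme: single-bundle listings only carry their bundle's tag.
granular_listings = {
    "recent_articles": {"node_list:article"},
    "recent_pages": {"node_list:page"},
    "all_content": {"node_list"},  # multi-bundle listing keeps the generic tag
}


def stale_after_save(listings, bundle):
    """Return the listings whose cache entries a save of `bundle` invalidates."""
    invalidated = invalidated_by_save(bundle)
    return sorted(name for name, tags in listings.items() if tags & invalidated)


print(stale_after_save(coarse_listings, "article"))
# ['all_content', 'recent_articles', 'recent_pages']
print(stale_after_save(granular_listings, "article"))
# ['all_content', 'recent_articles']
```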

I also don't think we're getting a lot out of this issue, but the trade-offs are still there, and there are still issues trying to improve it.
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
> > Cache tags have a cost
> This is the very claim that i challenge.

As @catch said, each unique cache tag results in an extra query (if there are multiple per cache item, they are grouped) every single time a cache item is fetched. It's not a claim, it's a fact.

And as @catch also said, nobody wants to remove cache tags entirely, it's about specific use cases and tags.

> What are they? What if the cost of avoiding cache tags is far beyond the cost of that cache tags? We can't balance pros and cons thoroughly without that answer.

That's exactly what this issue is about, and the issues that @catch and I linked are specific examples of cases we are looking into, but you cannot answer that generally.
Comment over 2 years ago →
🇩🇪Germany geek-merlin Freiburg, Germany
Thanks for elaborating. We're getting closer, so please understand and/or forgive my insistence...
> each unique cache tag results in an extra query (if multiple per cache item, then grouped) every single time a cache item is fetched.

This is the claim that still lacks documented evidence.

If you are referring to \Drupal\Core\Cache\DatabaseCacheTagsChecksum::getTagInvalidationCounts, there we have one query for all tags, which was the big win of the ChecksumInvalidator.
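A simplified Python model of the checksum approach, assuming that an item stores the sum of its tags' invalidation counts when it is written and is considered valid on read if that sum, recomputed from the current counts, still matches. `ChecksumModel` and `fetch_counts` are hypothetical; `fetch_counts` stands in for the single grouped query over all of an item's tags.

```python
class ChecksumModel:
    """Toy checksum validation: one grouped lookup per call for all of an item's tags."""

    def __init__(self):
        self.invalidations = {}  # tag -> invalidation count (stands in for the cachetags table)
        self.queries = 0

    def fetch_counts(self, tags):
        # Stands in for one grouped query, e.g. a single SELECT ... WHERE tag IN (...).
        self.queries += 1
        return {tag: self.invalidations.get(tag, 0) for tag in tags}

    def checksum(self, tags):
        # Items without tags need no lookup at all.
        return sum(self.fetch_counts(tags).values()) if tags else 0

    def is_valid(self, stored_checksum, tags):
        return stored_checksum == self.checksum(tags)

    def invalidate(self, tags):
        for tag in tags:
            self.invalidations[tag] = self.invalidations.get(tag, 0) + 1


model = ChecksumModel()
tags = ["node:1", "node:2", "user:3", "config:system.site", "rendered", "node_list"]
stored = model.checksum(tags)        # computed when the item is written
print(model.is_valid(stored, tags))  # True: one query covers all six tags
model.invalidate(["node:2"])
print(model.is_valid(stored, tags))  # False: the sum has moved
print(model.queries)                 # 3
```

Summing is sufficient in this model because invalidation counts only ever increase, so invalidating any of the item's tags changes the total. Note that the model still performs one lookup per `is_valid()` call, not one per request.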

Or are you referring to a different query? Please educate me.
(I'll soon be putting some resources into this topic too, and what I do, and whether I join this effort, depends on these insights.)

PS:
> nobody wants to remove cache tags entirely
Who opened that bottle? Not me.
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
> there we have one query for all tags

For all tags *per cache get* call. We call that method in \Drupal\Core\Cache\DatabaseBackend::prepareItem(), for example. It's not even optimized if you do a getMultiple(), but that doesn't really happen in most cases where cache tags are used anyway. We don't know ahead of time which caches, with which cache tags, we're going to load on a given request.

Quoting myself from 🐛 ChainedFastBackend invalidates all items when cache tags are invalidated Fixed :

> Testing the patch on default umami installation frontend as admin with disabled render caching, I'm seeing 6 extra queries to the cachetags table (142 vs 148 in total).

So there's your answer for that scenario (Umami front page, admin, no render cache): the cost of cache tags is 142 extra database queries in HEAD for a single request. With render caching enabled, it's likely fewer, but that depends on which cache hits and misses you have.
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
FWIW, the number might be off because disabled caches cause a lot of cache writes that invalidate the static cache in there, but still, it can be a lot.
Comment over 2 years ago →
🇬🇧United Kingdom catch
> For all tags *per cache get call*. We call that method in \Drupal\Core\Cache\DatabaseBackend::prepareItem() for example.

Yes this is the crux.

So if I'm getting the cache item for a render array, and it has six cache tags, then that's one database query, which is not a big deal, and render caching is the primary use case for cache tags.

But if I'm getting six different cache items for six different plugin managers, that all have their own cache tag that's not used anywhere else, that's six extra database queries.
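A toy Python model of the two cases above, assuming that tags already looked up earlier in the request are answered from an in-memory static cache, and that any other tags cost one grouped query per cache get. `PerGetTagLookups` and the cache IDs and tag names are hypothetical.

```python
class PerGetTagLookups:
    """Toy model: tag lookups are grouped per cache get, not per request."""

    def __init__(self, items):
        self.items = items   # cid -> list of tags
        self.static = set()  # tags already looked up during this request
        self.queries = 0

    def get(self, cid):
        unknown = [t for t in self.items[cid] if t not in self.static]
        if unknown:
            self.queries += 1  # one query for all unknown tags of *this* item
            self.static.update(unknown)


# One render cache item with six tags: one query.
render = PerGetTagLookups({"render:page": [f"tag{i}" for i in range(6)]})
render.get("render:page")
print(render.queries)   # 1

# Six plugin manager caches, each with its own unique tag: six queries.
plugins = PerGetTagLookups({f"plugins:{i}": [f"plugin_tag{i}"] for i in range(6)})
for i in range(6):
    plugins.get(f"plugins:{i}")
print(plugins.queries)  # 6
```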
Comment over 2 years ago →
🇩🇪Germany geek-merlin Freiburg, Germany
Thanks a lot @catch and @Berdir for elaborating on this. So even though we have no hardcore profiling, and it is not one DB query per tag, it's "every additional tag has some likelihood of triggering an additional DB query", or in other words, the factor from tags to DB queries may be small but is greater than zero.

I'm considering ideas like preloading, subqueries and mapper services, but will think them through and elaborate first.
Comment over 2 years ago →
🇬🇧United Kingdom catch
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
Working on 📌 Explicitly support Relay (drop-in replacement for PhpRedis) Fixed, I noticed in the example once again how many block config entity cache tags we have on a default installation. Similar to how we group shortcuts into their set and only have cache tags for the set, I think it might be worth looking into having a single block:$theme cache tag.

Block configs usually don't change much, and when they do, it's often several at once (drag-and-drop reordering). At the same time, on a site with multiple themes, the list cache tag invalidates the page and dynamic page caches for all themes, not just the one affected by the block you just saved.
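A rough Python comparison of per-block tags with the proposed single `block:$theme` tag. The tag names are modelled on the `config:block.block.ID` and `config:block_list` pattern, the block IDs are hypothetical, and the sketch assumes each page carries the tags of the blocks placed in its theme plus, in the per-block scheme, the block list tag.

```python
# Hypothetical block placements per theme.
BLOCKS = {
    "olivero": ["olivero_branding", "olivero_main_menu", "olivero_content", "olivero_footer"],
    "claro": ["claro_content", "claro_page_title", "claro_messages"],
}


def page_tags(theme, scheme):
    """Tags a cached page in `theme` carries under each scheme (assumed)."""
    if scheme == "per_block":
        return {f"config:block.block.{b}" for b in BLOCKS[theme]} | {"config:block_list"}
    return {f"block:{theme}"}


def tags_on_save(theme, block_id, scheme):
    """Tags invalidated when one block in `theme` is saved (assumed)."""
    if scheme == "per_block":
        return {f"config:block.block.{block_id}", "config:block_list"}
    return {f"block:{theme}"}


def stale_themes(saved_theme, block_id, scheme):
    invalidated = tags_on_save(saved_theme, block_id, scheme)
    return sorted(t for t in BLOCKS if page_tags(t, scheme) & invalidated)


for scheme in ("per_block", "per_theme"):
    print(scheme, len(page_tags("olivero", scheme)), stale_themes("olivero", "olivero_footer", scheme))
# per_block 5 ['claro', 'olivero']
# per_theme 1 ['olivero']
```

In this model the per-theme tag cuts the tags per page from five to one and confines a block save to its own theme, at the cost of invalidating every block-dependent item in that theme even when only one block changed.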
Comment over 2 years ago →
🇬🇧United Kingdom catch
A single cache tag for block config seems like a good idea, should we open a specific issue for that one?
Comment over 2 years ago →
🇩🇪Germany geek-merlin Freiburg, Germany
As promised in #120, I have thought about and worked on this.
I now see why, IMHO, removing cache tags from the API (rather than from the DB) is a breaking change and the wrong approach, and we should do the latter instead.
I've elaborated the reasoning in the IS.
Comment over 2 years ago →
🇩🇪Germany geek-merlin Freiburg, Germany
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
I agree with a): removing cache tags is indeed a problem, and 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed has one example where a clear call needs to change. It will require a release manager decision on whether that change is OK or whether we need to find some sort of BC.

On b), plugin discovery/managers have always had an invalidation API that should be used instead of the underlying cache tags. I commented on 📌 Allow plugin derivers to specify cache tags for their definitions Postponed: needs info that I disagree with that approach.

I also disagree with c) and the issue you created; that is not the direction we want to go. We want fewer cache tags, not more.
Comment over 2 years ago →
🇫🇷France andypost
Another edge case is help topics (each topic is a plugin): their cache needs a precise clear on code deploy so that the search index is updated with new topics.


Problem/Motivation

Downsides of (1) removing cache tags

Proposed resolution

Remaining tasks

User interface changes

API changes

Original report by catch

