Allow plugin derivers to specify cache tags for their definitions

Created on 20 September 2018, over 6 years ago
Updated 15 June 2024, 7 months ago

Problem/Motivation

Plugin derivers are essentially a foreach loop for providing plugin definitions.
Typically they iterate over the result of some other service, creating a definition for each item.
If that list of items changes, the plugin definition cache must be cleared so that the deriver can accurately reflect the up-to-date list.

In practice, this has meant that another piece of code, like a hook implementation, will manually clear the plugin definition cache.

For example, \Drupal\system\Plugin\Derivative\SystemMenuBlock loops over all menu entities to provide menu blocks.
\Drupal\system\Entity\Menu::save() and \Drupal\system\Entity\Menu::delete() are then responsible for clearing the block plugin definition cache.

Proposed resolution

Plugin managers already support the use of cache tags for their definitions cache.
Allow the derivers themselves to affect the list of cache tags used but NOT cache contexts or max-age, see @Berdir's comment at #74 📌 Allow plugin derivers to specify cache tags for their definitions Postponed: needs info .
This will remove the need for external code (like a hook) to clear the cache.

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Postponed: needs info

Version

11.0 🔥

Component
Plugin 

Last updated about 15 hours ago

Created by

🇺🇸United States tim.plunkett Philadelphia

Live updates comments and jobs are added and updated live.
  • Blocks-Layouts

    Blocks and Layouts Initiative. See the #2811175 Add layouts to Drupal issue.

  • Needs reroll

    The patch will have to be re-rolled with new suggestions/changes described in the comments in the issue.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • 🇫🇷France nicodh

    Re-rolled for core 9.5.4.
    I also add implementation of CachedDiscoveryInterface and RefinableCacheableDependencyInterface for MenuLinkManager, because I've experienced trouble with menu derivatives.
    I think it might be MenuTreeStorage responsibility to propagate plugins' cache tags instead of this solution, but I've no really advised opinion on this and no time to explore this for now...

  • 🇨🇭Switzerland berdir Switzerland

    For the record, we're moving in the opposite direction, to have as few cache tags in plugin definitions as possible. See 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed and https://www.drupal.org/node/3344524 for the reason. Cache tags have a cost, plugin definitions usually only have few or just a single cache key and while it requires a bit of extra code, it's far more efficient to invalidate them in the rare cases when they do change as opposed to checking if the plugin definitions are still valid on every single cache get.

    -1 from me on supporting this.

  • Status changed to Postponed: needs info over 1 year ago
  • 🇨🇭Switzerland berdir Switzerland

    The issue mentioned above has been committed now, as has the changes to the chained fast backend that will from now on cause it to validate cache tags in the discovery bin on every request.

    Several of the affected cases here are in consideration to be refactored to not use derivates anymore. Personally, I've always considered those to be an anti-pattern, while there are use case that are valid as there are sometimes really different block definitions (views for example, with different argument contexts), things like block_content, layout builder, menu shouldn't be separate block plugin derivates, but a single one with configuration to select the block or menu that should be displayed. See 🐛 block_content block derivatives do not scale to thousands of block_content entities Needs work for scalability issues that the current architecture causes. Also Add the notion of a 'configured layout builder block' to solve a number of content-editor and performance pain points Active for layout builder.

    And with those changes, those would no longer be plugins and the need for invalidation of the block plugin cache would go away, instead there will either be a runtime check in the block plugin or we'll query block config instances and delete those.

    The workarounds that are needed if this is not present are a cache clear call through an existing API, which is a single line of code, that's typically needed in 2-3 places at most.

    I discussed a bit more with @catch in slack, he also added that this kind of encourages these "bad" patterns.

    Setting to postponed (needs more info) for now.

  • 🇫🇷France andypost

    Mentioned the same in 📌 Add a @CacheableMetadata annotation Active

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Re #70:

    Several of the affected cases here are in consideration to be refactored to not use derivates anymore. Personally, I've always considered those to be an anti-pattern, while there are use case that are valid

    Another example is Group, plugins like group_node:CONTENT_TYPE need to use derivatives because the plugin ID is saved to the DB so that query access can be applied per content type by virtue of some complicated joins that also check for the plugin ID. Using a config form entry to select what content type to serve would make this impossible. There's more reasons why Group needs derivatives, but I won't bore you with the details.

    Adding this to make it clear we shouldn't be moving away from derivatives up to a point where they are no longer an option as that would put Group (and other modules perhaps) in a tough spot.

    Having said that, I never used cache tags to clear these derivatives in the first place. I resorted (and can keep resorting) to:

    /**
     * Implements hook_ENTITY_TYPE_insert().
     */
    function gnode_node_type_insert(NodeTypeInterface $node_type) {
      \Drupal::service('group_relation_type.manager')->clearCachedDefinitions();
    }
    

    Right?

  • 🇬🇧United Kingdom catch

    @kristiaanvandeneynde yes that will continue to work fine. We definitely can't deprecate using derivatives, we just need to stop using them in some specific cases. I would hope there are a lot less group types than fields or custom blocks.

  • 🇨🇭Switzerland berdir Switzerland

    Exactly, the sole purpose of this issue is to replace clearCachedDefinitions() calls with cache tags. when you can invalidate cache tags you can always also call that API method. For an IMHO marginally better DX. It's a task, not a bugfix.

    That said, the issue title also mentions contexts and age, which this issue does not yet implement and I also struggle to see use cases for that. Plugins that exist or don't exist or are different in some cases seem like a very bad idea.

  • 🇦🇺Australia nigelcunningham Geelong

    The patch also needs more work. It removes BlockContent::invalidateBlockPluginCache without modifying the invocations of that method.

  • 🇦🇺Australia nigelcunningham Geelong

    Here's an updated version of #67 that addresses the BlockContent issue I mentioned in the previous comment.

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    @Berdir in #74: good point. Re-scoped. (The scope was expanded in #56.)

    But reading your #68, @kristiaanvandeneynde's #72 and @catch's #73, I'm wondering if we should instead close this issue? 🤔 🐛 Reduce the number of field blocks created for entities (possibly to zero) Fixed already landed, and 🐛 block_content block derivatives do not scale to thousands of block_content entities Needs work and Add the notion of a 'configured layout builder block' to solve a number of content-editor and performance pain points Active would further reduce the need for this, for the one use case (Layout Builder) where this is a significant performance issue?

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    I wouldn't necessarily read my comment as "nope, we don't need this".

    I'm trying to get rid of caches we need to manually clear in my modules in favor of those that support cacheable metadata (mainly cache tags). For the Group example I made, if I were to tag my group_node deriver with node_type_list, I wouldn't have to manually clear my definitions anymore. So on that front it is useful.

    FWIW I'd actually like to see most of these manual cache clears go away in core too as it's asking for trouble when some outside code inevitably changes the data without knowing that they should manually clear a cache somewhere. There will probably be a few cases where we can't use tags (such as the internal property of MemoryCache), but any "cache property" on a class that can be replaced by a MemoryCache, should be IMO. Or at least where it makes sense.

  • 🇨🇭Switzerland berdir Switzerland

    > But reading your #68, @kristiaanvandeneynde's #72 and @catch's #73, I'm wondering if we should instead close this issue?

    Yes, that is my suggestion. There is an API to clear caches, if you can invalidate a tag you can also invalidate the plugin cache. Per #68, we specifically worked on removing as many cache tags as possible in plugin definitions because every plugin definition with cache tags results in runtime costs on every single request (if the cache tag isn't used anywhere else as well).

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    There is an API to clear caches, if you can invalidate a tag you can also invalidate the plugin cache

    This is implying we always manually clear cache tags, though. When saving an entity this is far from the case as they have an individual cache tag, the default list cache tags and a potentially high amount of custom (list) cache tags.

    To repeat the Group example, I'd need to implement group_node_type_(insert/update/delete) to clear a cache where, if we were able to add the node_type_list cache tag on the discovered collection, I would have to do nothing at all. But in that case, I wonder if we'd want to add said list cache tag in the deriver or the plugin definition or base plugin class instead.

    Regardless of whether we close this issue, a big downside is that if I were to have, say, group_node, group_menu, group_whatever, I'd have to constantly clear the plugin definition cache whenever one of these groupable entities is saved. Which is insane, yet I see no way around this as we don't have "subcollections", only one big pile of all definitions.

    Per #68, we specifically worked on removing as many cache tags as possible in plugin definitions

    Do you have an issue link on this so I can read up? Might change my mind when I see the discussion surrounding that.

    TL;DR: Derivers can lead to poor cacheability either way (see Group example), so I'm on the fence.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Come to think of it, if we were to split up the cache for a plugin manager to the following:

    • One entry for the 'regular" definitions, tagged with probably nothing
    • One entry for each deriver's definitions, tagged with the list tag of what it depended on
    • One aggregate entry containing all definitions, tagged with all the things

    Then it would work a little more like the render cache.

    Say you have a deriver that depends on node types, one that depends on menus and one that depends on vocabularies. Then, with the above, if a new vocabulary was cleared, we'd still get cache hits for the common plugins, the node type ones and the menu ones. So the aggregate would return a cache miss, we'd find all but the taxonomy base definitions as cache hits and be able to quickly store a new aggregate.

    With this system, it would not be so bad to add cache tags to the derivers as they wouldn't flush the whole list of plugin definitions, but rather a subset.

  • 🇬🇧United Kingdom joachim

    > There is an API to clear caches, if you can invalidate a tag you can also invalidate the plugin cache.

    Yes, but isn't the point of this issue that module B can define some derivative plugins that depend on module A's things, and module A doesn't know about it.

    So when you change module A's data, module A doesn't know that the plugins need to be updated. What it does do is invalidate its cache tags.

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    #82++

    👀

  • 🇨🇭Switzerland berdir Switzerland

    > So when you change module A's data, module A doesn't know that the plugins need to be updated. What it does do is invalidate its cache tags.

    Do you have a specific example for this?

    If module B defines something based on data from module A, then it can listen to those changing and invalidate the plugin definitions. almost any derivatives that I've seen that can change are entities, and those have standardized hooks.

  • 🇺🇸United States tim.plunkett Philadelphia

    I don't remember which hooks we had or if they were "standardized" yet, but 6 years ago when I opened this, I said:

    Allow the derivers themselves to affect the list of cache tags used.
    This will remove the need for external code (like a hook) to clear the cache.

    So I don't know that this has any practical application anymore if we're okay with needing hooks forever

  • 🇨🇭Switzerland berdir Switzerland

    It's not my decision to make, but my *opinion* is that a slightly improved DX (see changes in block_content as a good example) is not worth the performance cost that this would definitely cause.

    The block_content changes also shows a considerable performance regression because I suspect this patch predates non-reusable blocks and with cache tags, we currently don't have a way to only invalidate on reusable blocks (maybe non-reusable blocks shouldn't invalidate the list cache tag? but that's a separate issue)

  • 🇬🇧United Kingdom joachim

    > If module B defines something based on data from module A, then it can listen to those changing and invalidate the plugin definitions. almost any derivatives that I've seen that can change are entities, and those have standardized hooks.

    We can do that.

    But I thought the benefit of cache tags was that instead of listening to changes and then clearing caches, we more simply declare what things our cache depends on, and then invalidation happens automatically.

  • 🇳🇿New Zealand john pitcairn

    ^^ This. Being able to simply declare what entity list tags you depend on is a lot more transparent than having to figure out which ugly procedural hooks you need to implement (update and insert, what?), and what you have to do in those to then invalidate the entire plugin cache.

    The DX improvement is more than minor IMO.

  • 🇨🇭Switzerland berdir Switzerland

    > and what you have to do in those to then invalidate the entire plugin cache.

    You always invalidate the entire plugin cache. #81 is not what is implemented here and I have no idea how that would work. Plugins in many cases require us to know all the plugins of a given type, as you often pick them from a list of all the plugins. Yes, sometimes you just load a specific plugin but the discovery API would require major IMHO BC breaking changes to support anything like that.

    Cache tags are designed to be fast to invalidate an unlimited amount of cache entries, with the downside of having a runtime cost to check if cache tags are still valid on every single cache get. Plugin caches are in almost all cases a single or a finite amount of cache entries (e.g. one per language). That is not the use case that cache tags were designed to solve.

    This didn't actually do test runs in a long time, a MR would show those problems in the new performance tests.

    If you're concerned about the amount of derivative plugin definitions and frequent invalidations of them then you very likely should not be using derivatives but block configuration or so. See #70.

    And as mentioned in #86, the the patch currently would invalidate block plugin definitions every single time that non-reusable blocks in e.g. layout builder would be edited, another performance regression.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Cache tags are designed to be fast to invalidate an unlimited amount of cache entries, with the downside of having a runtime cost to check if cache tags are still valid on every single cache get. Plugin caches are in almost all cases a single or a finite amount of cache entries (e.g. one per language). That is not the use case that cache tags were designed to solve.

    This part I understand and can appreciate. Looking at the plugin managers in core, we usually set only one cache or one cache per language, so at best you have as many cache entries as there are managers and at worst you multiply that by the available languages.

    However, we have almost 50 plugin managers in core (see screenshot for part of the list). Out of those, 3 (entity_types, local_task, breakpoint) already set cache tags on the cache entry, 4 vary by language, and then there's Migrate and Views also multiplying their cache entries by plugin type. The ones that already set cache tags

    At one point you've got to question the DX behind having to manually clear one out of somewhere between 70 and 100 cache entries, as opposed to not having to do anything because cache tags take care of things for you.

    Furthermore, the cache tags we'd probably be setting are (bundle) list cache tags. While not guaranteed, it's not unthinkable that many pages will contain at least one element that is also tagged with said cache tags and, as both you and the docs mention, that does not incur an extra performance hit.

    See this part of the default plugin manager:

     * @param array $cache_tags
       *   (optional) When providing a list of cache tags, the cached plugin
       *   definitions are tagged with the provided cache tags. These cache tags can
       *   then be used to clear the corresponding cached plugin definitions. Note
       *   that this should be used with care! For clearing all cached plugin
       *   definitions of a plugin manager, call that plugin manager's
       *   clearCachedDefinitions() method. Only use cache tags when cached plugin
       *   definitions should be cleared along with other, related cache entries.
       */
      public function setCacheBackend(CacheBackendInterface $cache_backend, $cache_key, array $cache_tags = []) {
    

    So I'd say we don't close this just yet but rather wait for someone to pour the patch into an MR and see what performance tests have to say?

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Re your mention of #70 I'm still not sure where to go if derivers are considered a "bad" pattern. In Group, you have plugins for every node type, menu, taxonomy, etc. How would we achieve that if not for derivers?

  • 🇬🇧United Kingdom joachim

    If the problem with cache tags is performance and definitions, what about changing how a particular type's plugins are cached?

    We currently cache plugin definitions as a single cache entry in the 'discovery' cache bin. That works fine for a relatively small set of plugins. But for a larger set, we could define a cache bin specifically for that plugin type, and store one cache entry per plugin definition and then each one gets its own invalidation.

  • 🇨🇭Switzerland berdir Switzerland

    > Furthermore, the cache tags we'd probably be setting are (bundle) list cache tags.

    You might, but the primary use case that's implement in this issue is for block_content content entities and their list tag.

    > While not guaranteed, it's not unthinkable that many pages will contain at least one element that is also tagged with said cache tags and, as both you and the docs mention, that does not incur an extra performance hit.

    I assume you mean the list cache tag of the bundle entity type (e.g. node_type_list) in this case and not the per-bundle list cache tags of the main entity type (e.g. node_list:article). I would argue that those are very, very rarely used on pages, because page content usually doesn't need to react on having new/removed bundles? rendered entities rely on field and formatter configuration, but those are different cache tags. And even if a cache tag is used on some/many pages, it won't be used on many other things like API, Drush, .. where your plugins will be loaded and used as well.

    > At one point you've got to question the DX behind having to manually clear one out of somewhere between 70 and 100 cache entries, as opposed to not having to do anything because cache tags take care of things for you.

    I don't get the argument here. You need to clear the specific plugin type(s) that you provide derivates for. It is not relevant if there are 10, 70 or 500 others? I would assume that cases where you expose something to more than 1/2/3 different plugin types are very rare?

    > Out of those, 3 (entity_types, local_task, breakpoint) already set cache tags on the cache entry

    Yes, that's what's left after 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed , it was quite a few more before that. The remaining ones are special, local task for example caches things based on the current page, there are many cache ids and we need a cache tag, the other have similar reasons, as does the library cache for example.

    > Re your mention of #70 I'm still not sure where to go if derivers are considered a "bad" pattern. In Group, you have plugins for every node type, menu, taxonomy, etc. How would we achieve that if not for derivers?

    I wouldn't. I think your use case is valid and and most sites have a manage amount of these things that rarely change (menu might be a bit tricky, we do have a site with 250 menus, already starts to get tricky there, but there are far worse performance issues there about so many menus and menu links).

    I'm talking about cases like the two issues mentioned in #77. There are sites with hundreds and thousands of reusable block content entities, that causes havoc for the block plugin definitions. (And tens of thousands of non-reusable ones).

    I'm not suggesting to remove derivers completely, just that you should consider if derivers are the right tool for your use case, because they do not scale and it's often not necessary. per-menu block derivates in core for example IMHO could easily be replaced with a configuration element where you select the menu, I don't think UX would be worse than now.

    > We currently cache plugin definitions as a single cache entry in the 'discovery' cache bin. That works fine for a relatively small set of plugins. But for a larger set, we could define a cache bin specifically for that plugin type, and store one cache entry per plugin definition and then each one gets its own invalidation.

    We can't discover single items, so invalidating single items also can't work. And it would result in even more cache tags, because then we'd definitely need cache tags for these entries. As mentioned, an existing pattern for this would be views data, where we have a single global cache and per-table caches because it's a pretty safe assumption that most pages will only need one/a few tables of views data. But invalidation still invalidates everything.

    The only feasible solution for large sets with the current plugin architecture is to avoid having large set in the first place.

    But: this issue is not about large sets, and it's not about removing the ability to have derivatives. Those issues are just related/being mentioned because this issue changes a few specific implementations that shouldn't use derivates in the first place. But that's not the reason why I'm against doing this. The reason for that is 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed and the related changes around fast chained cache backends.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Re #93: Thanks for clarifying. For a second I was under the impression that all derivers were considered "bad". I fully agree on blocks being too giddy on the things they create derivatives for (such as your menu block example).

    The only feasible solution for large sets with the current plugin architecture is to avoid having large set in the first place.

    Yeah, hence my comment in #81. A bit besides the question here, but something that could solve the "all plugins are thrown into one big pile" issue.

    I've read up on 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed and I can see why we did that. We were adding arbitrary cache tags that nothing else uses and probably never would use. So I'm 100% on board with these cache tags being removed there. What I'm looking for here is for a deriver to say: "I'm gonna spit out a bunch of derivatives based on node types, so the collection should be cached with the node type list tag".

    We don't need to use that for blocks, or any other place where using derivatives is already questionable. But for those places where derivers do make sense, such as in Group, it would be a nice QOL improvement to be able to specify (list) cache tags. Even though that may end up flushing the full list of plugin definitions whenever a new node type, menu, etc. is introduced. But that we can tackle in a separate issue which focuses more on what I said in #81.

    Would the above paragraph be considered on topic for this issue? Because if it is, then maybe we can narrow down the MR to facilitating just that.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Re:

    I don't get the argument here. You need to clear the specific plugin type(s) that you provide derivates for. It is not relevant if there are 10, 70 or 500 others

    Yeah, but I'd rather just add node_type_list as a cache tag than have to manually call a cache clear in node_type_insert, node_type_update, node_type_delete and repeat that process for any other deriver that relies on menus, taxonomies, etc. I'd end up with a dozen hook implementations where a few simple cache tags could have achieved the same result.

  • 🇬🇧United Kingdom catch

    If/when we do 🌱 [META] Hooks via attributes on service methods (hux style) Active then node_type_insert, node_type_update, node_type_delete could all be implemented in a single method with three attributes (or one attribute with three hook names) - assuming they do the same thing.

    If bundles is the main use case here, I do wonder whether a single 'bundle_list' cache tag that covers every bundle on every entity type is worth looking at, then we wouldn't get cache tag checksum overload if it's re-used. Would be a bit less granular but bundles being acted on is fairly unusual.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    If bundles is the main use case here, I do wonder whether a single 'bundle_list' cache tag that covers every bundle on every entity type is worth looking at, then we wouldn't get cache tag checksum overload if it's re-used. Would be a bit less granular but bundles being acted on is fairly unusual.

    That would seem reasonable to me. Can then add said cache tag in my plugin manager and document it as such:
    "Many plugins that allow you to put a specific entity type into a group will often have a derivative for each bundle of said entity type. As such, we invalidate the plugin definitions cache whenever a bundle is updated."

    The only major downside being that if you're just using GroupNode, you'd also flush the cache whenever you edit a taxonomy, group type, etc. But you're right: How often does that happen?

    Somehow this feels like asking "what's the worst that could happen?", to which the answer is often not what you expected :P

  • 🇨🇭Switzerland berdir Switzerland

    We already have a entity_bundles cache tag, invalidated in \Drupal\Core\Entity\EntityTypeBundleInfo::clearCachedBundles(). So if you want to use that in your plugin manager, you could. But not everything you list are bundles (menus are not) and things are bundles that you might be aware of (webforms for example, which is I think the major reason why 🐛 Reduce the number of field blocks created for entities (possibly to zero) Fixed caused such issues for some sites, as having dozens/hundreds of webforms isn't unheard of, compared to other bundles types).

    Ther's also hook_entity_bundle_create and delete, but not update. But there is hook_entity_update/insert/delete, where you can check the entity type bundle_of information, and you could then only invalidate if that's actually an integrated entity type, if you have that information.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Okay, so maybe a bundle cache tag isn't great. Which leads us back to #94: Could we allow a deriver to set a cache tag, but not necessarily implement that for things like blocks. At least, that way, Group and other modules could use it.

    These cache tags wouldn't be unique, like the ones from 📌 Manually clear cache keys from plugin managers with finite variations instead of using cache tags Fixed were. If you clear node_type_list, the /node/add page among other things also need rebuilding. The only downside is that you'd now check for the node_type_list tag in the database on pages where you previously might not have.

    To me it feels like better DX vs one extra query to the cache tags table. or am I simplifying this too much?

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    I mean:

    • I know my definitions depend on node types.
    • I know that when the list of node types changes, so do my plugin definitions.
    • I know there's a cache tag that I can use for this.

    To me, the DX screams: Use the cache tag!

    Now I understand that not using a cache tag means one query fewer, provided I manually clear the cache on certain events. But isn't this why we have cache tags in the first place? To make it so we don't have to worry about manually clearing caches.

    I also get the nuance that we can't predict all the CIDs like we can here and that that's where cache tags truly shine. For instance: it's impossible to know the CIDs of all the views that might depend on the list of nodes without running some extra, likely expensive, code.

    I don't know, I just feel like this is a step backwards in terms of DX. Hux-type hooks, while awesome, won't help here either because the manager does not know beforehand which entity types' hooks it needs to subscribe to. I can't predict the entity types people might want to write a Group plugin for.

  • 🇨🇭Switzerland berdir Switzerland

    Now I understand that not using a cache tag means one query fewer, provided I manually clear the cache on certain events. But isn't this why we have cache tags in the first place? To make it so we don't have to worry about manually clearing caches.

    Not sure what else I can say that I didn't already say here.

    Yes, cache tags are great, but they come with a cost, and that needs to be considered. Not with every basic cache tag you add to a render array, that usually doesn't result in a measurable difference (also because there often are already unique cache tags on them anyway), but when deciding whether or not we should add such a fundamental API? Definitely. Plugin caches are usually using the fast chained backend, so that means no extra DB queries apart from the per-bin timestamp check, so every plugin that that ends up having at least one cache tag means we suddenly have database queries where before we didn't.

    *If* we have this API in place, then specific contrib/custom deriver implementations are in most cases not going to carefully consider the implications, they'll that this is supported and use it.So it's on us, in this issue, to decide about supporting that.

    That's why we have 📌 [meta] Profile/rationalise cache tags Needs work open for a decade now, to consider questions like this (although with the performance testing that we have in place now, we might actually be able to just close that?)

    If you clear node_type_list, the /node/add page among other things also need rebuilding. The only downside is that you'd now check for the node_type_list tag in the database on pages where you previously might not have.

    Yes, but that's exactly the point? Cache tags on a specific page (and likely many specific pages then) is the only relevant metric. Cache tag retrieval is only a static cache, so how many unique cache tags you have on a specific page is exactly what matters. Loading the checksum for node_type_list on /node/add is perfectly fine, doing so on basically every single page is a different story.

    That's also why the long list of plugin caches you posted in #90 is for me an argument against doing this. That just means more possible cases that could result in additional, unexpected cache tag lookups.

    Hux-type hooks, while awesome, won't help here either because the manager does not know beforehand which entity types' hooks it needs to subscribe to. I can't predict the entity types people might want to write a Group plugin for.

    I think hux/event style things were mentioned because you could have a single method to be notified on insert/update/delete events, likely for multiple specific entity types too, while currently you'd need several functions and either per-entity-type hooks or all with runtime checks.

  • 🇬🇧United Kingdom joachim

    A couple of thoughts:

    1. Part of the problem here is that we treat both plugins and blocks as hammers to use on everything. Plugins are designed to be swappable pieces of functionality, but they end up being used for anything we need a 'thing' there could be more than one of. Similarly blocks - we've extended them with content blocks, and then re-purposed them to work inside layouts, which means that instead of having maybe 20-50 blocks on a big site, we have hundreds because now there are blocks that are only used on one page.

    We've pushed both of these systems beyond what they were designed to do.

    2. Cache invalidation is weird and not well documented. But basically, the cost is that every cache read needs to check the tags to see if anything has been invalidated. I've always wondered why it's this way round rather than what I would naively have done, which is to delete cache entries when the invalidating change is made. In this particular case, would that kind of cache cleaning strategy work bettter? Could it be made swappable, or is tag invalidation baked too deeply into the cache system?

  • 🇬🇧United Kingdom catch

    @joachim

    Cache invalidation is weird and not well documented. But basically, the cost is that every cache read needs to check the tags to see if anything has been invalidated. I've always wondered why it's this way round rather than what I would naively have done, which is to delete cache entries when the invalidating change is made. In this particular case, would that kind of cache cleaning strategy work bettter? Could it be made swappable, or is tag invalidation baked too deeply into the cache system?

    It's entirely swappable, but trying to directly delete cache items is almost never the correct implementation, and it's why so many other cache implementations (including Symfony's and Laravel's) outside Drupal fail to implement cache tags, or when they do, they do so in a way that doesn't scale to the kind of situations Drupal's implementation allows for (like hundreds of thousands of cached rendered entities with their own cache tags).

    1. When caching in a relational database, it's possible to delete cache items on tag invalidation, but we would need to run a DELETE query, and deleting a lot of rows is extremely expensive and slow. Doing this on cache_render is a non-starter.

    2. Core's other main cache backend is APCu, and it's not possible to introspect all APCu entries to check if they have cache tags. In the chained fast case we just blow away the 'fast' cache whenever there's an invalidation/write anyway because it's backed by the persistent backend, that's why there are no cache tags checks on chained fast hits. But e.g. memcache has the same issue, so the checksum approach is necessary for a lot of contrib cache backends too.

    So it would be possible to write a no-checksum cache tag supporting database backend, and then change the discovery cache bin(s) to use that database backend. But as soon as that cache tag is used in a render array or anything that's stored in a cache bin using the checksum backend, we'd still have to query it anyway, and when it gets invalidated, it'd need to be both deleted from one type of cache backend, and incremented in the checksum implementation, so now two things have to happen on invalidation instead of one.

    There's a good high-level write up here https://mglaman.dev/blog/drupal-cache-tags-all-regardless-your-backend

    I think hux/event style things were mentioned because you could have a single method to be notified on insert/update/delete events, likely for multiple specific entity types too, while currently you'd need several functions and either per-entity-type hooks or all with runtime checks.

    Yes exactly.

  • 🇬🇧United Kingdom joachim

    > 1. When caching in a relational database, it's possible to delete cache items on tag invalidation, but we would need to run a DELETE query, and deleting a lot of rows is extremely expensive and slow. Doing this on cache_render is a non-starter.

    In our case, the cache invalidation happens when a user edits a config entity. It's an admin task and it's fine for that to take some time.

    So this ties in to another feature I've been thinking about -- deleting dependent entities or cleaning up dangling references. There are various contrib modules that specifically do this, and they all hit the problem of scaling -- when you save an entity form, you can't go off and delete or update 1000s of entities during the save hook.

    Core needs a way to be able to say 'Hey this create/update/delete API call you're making requires some follow-on tasks, we're taking you to a Batch API page while we run them'. I don't know how we'd do that, as batch API is tied closely to forms, whereas we'd want something at the API level.

  • 🇬🇧United Kingdom catch

    In our case, the cache invalidation happens when a user edits a config entity. It's an admin task and it's fine for that to take some time.

    It's not necessarily an issue that it's slow for the person triggering the cache invalidation, it's that slow queries can lock up the database in general when there's lots of contention.

    Let's say a DELETE query on the cache_render table takes 2s. During those two seconds, requests are coming in, and they're trying to write cache items to it as well - either unrelated ones, or ones that were just invalidated within the two seconds. Then on top of this, someone saves a node which invalidates the node:list cache tag and this triggers another 2s delete query on the cache_render table, but is blocked on the first query due to row locking.

    It's the sort of thing that can cause site outages. Exactly how bad it is depends on what the indexes are like, but given we'd have to store all the tags in one column, it would require a LIKE query so it's not going to be perfectly indexed whatever we do. This is the kind of issue we avoid even having to think about by using the checksums.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Right, I've spent some time reviewing the last few comments and I fully understand where the hesitation is coming from.

    However, it seems like we're shying away from implementing a QOL feature due to a drawback from another feature (cache tag invalidation count lookups). So let me counter this by saying: "What if we made that drawback go away instead?"

    The current implementation is that for every cache item we check its cache tags and, for those that we did not retrieve the invalidation count for yet, we fetch the invalidation counts from the DB. This leads to multiple of the same queries to the DB and this query count would increase if we allow derivers to add cache tags to the discovery.

    Counterpoint: What if we build a simple system that allows us to either specify or identify commonly used cache tags and, at the very first call to ::calculateChecksum(), we load ALL of them with one query. This would both improve the performance of Drupal as it stands and make the current issue less of a hassle because we could (but don't have to) add whatever cache tag our deriver specifies to the "common" list.

    I feel like this wouldn't necessarily have to be a smart system that tries to automagically deduce which tags belong to the list, but could rather be a very rudimentary system where both core and contrib can specify tags to add to the list. We could then even have the entity type manager make sure that all entity type's list tags are added to said list, for instance, but that's up for discussion.

    This would be a simple alternative to 📌 Add a CacheTagsAggregator Needs work that's actually quite easy to implement and test with our new performance tests.

  • 🇬🇧United Kingdom catch

    Counterpoint: What if we build a simple system that allows us to either specify or identify commonly used cache tags and, at the very first call to ::calculateChecksum(), we load ALL of them with one query. This would both improve the performance of Drupal as it stands and make the current issue less of a hassle because we could (but don't have to) add whatever cache tag our deriver specifies to the "common" list.

    I feel like this wouldn't necessarily have to be a smart system that tries to automagically deduce which tags belong to the list, but could rather be a very rudimentary system where both core and contrib can specify tags to add to the list. We could then even have the entity type manager make sure that all entity type's list tags are added to said list, for instance, but that's up for discussion.

    This seems worth a go, it would be a bit like a (simpler) version of the path alias multiple load/cache - generate the list of things that we think might be loaded, and then load the actual things in one query.

  • 🇭🇺Hungary Balu Ertl Budapest 🇪🇺
  • 🇭🇺Hungary Balu Ertl Budapest 🇪🇺
Production build 0.71.5 2024