[policy] Standardize how we implement in-memory caches

Created on 10 April 2019, over 5 years ago
Updated 24 May 2023, over 1 year ago

Problem

In ๐ŸŒฑ [META] Remove all usages of drupal_static() & drupal_static_reset() Active there are lots of different ways of using in-memory caching, mostly via a protected property on the service class.

However, this raises issues such as how to flush the cache, which is mostly used in tests.

A common pattern emerging is to add a flushCaches() method on the interface. However,

  1. This is exposing the underlying implementation detail on the interface.
  2. Storing 'state' on stateless services should be discouraged.

Proposed Solution

One suggestion is to use the memory backend cache, and inject it into these services.

This removes the need to expose a flushCaches() method. We can also flush caches in tests, or inject a null cache backend.

๐ŸŒฑ Plan
Status

Needs work

Version

11.0 ๐Ÿ”ฅ

Component
Baseย  โ†’

Last updated 5 minutes ago

Created by

๐Ÿ‡ฆ๐Ÿ‡บAustralia kim.pepper ๐Ÿ„โ€โ™‚๏ธ๐Ÿ‡ฆ๐Ÿ‡บSydney, Australia

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • ๐Ÿ‡ณ๐Ÿ‡ฟNew Zealand jweowu

    OK, so the general guidelines I'm hearing are:

    1. If there is a large amount of data (or there are memory issues)
    2. If you need a way to reset the cache (outside of tests)...

    ...use a Memory cache backend.

    Else...

    use properties on the service class.

    Regarding (1), I don't think the amount of data should ever make a difference to the accessibility of the data.

    Regarding (2), we cannot predict whether or not contrib or custom code might have a reason to inspect or manipulate or reset any given cache, so we shouldn't try -- it should simply be possible, as standard, and in a standard way.

    I am therefore against the use of cache properties and/or plain static variables. A key benefit of drupal_static() was having a standard API which enabled other code to interact with these caches as necessary. Regressing to an approach which hides that data away is a step backwards, as is the removal of a standard approach to working with such caches generally.

    drupal_static() was simple, functional, and extremely standard across core and contrib code. I think that it should be replaced with something fitting that same description.

  • Status changed to Needs review over 1 year ago
  • last update over 1 year ago
    Custom Commands Failed
  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia Ratan Priya

    Re-rolled patch #27 for 11.x-dev.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    @jweowu yeah I don't think it's about the amount of data, but it should only be used for actual caches that will definitely get refreshed automatically.

  • Status changed to Needs work over 1 year ago
  • The Needs Review Queue Bot โ†’ tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide โ†’ to find step-by-step guides for working with issues.

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    If the main concern is the reset problem, we can keep the object properties.
    All we need is a mechanism for the reset, that does not pollute the public interface.

    The service with the runtime cache property subscribes to reset events, like this:
    https://git.drupalcode.org/issue/drupal-3368812/-/commit/27b046020ac6f91...
    (link might die)

      public function __construct(
        [..]
        private readonly ActiveModuleListInterface $activeModuleList,
        ResetDispatcherInterface $resetDispatcher,
      ) {
        // @todo Avoid double subscribe.
        $activeModuleList->subscribe($this->reset(...));
        $resetDispatcher->subscribe(ModuleHandler::RESET_HOOKS, $this->reset(...));
        $resetDispatcher->subscribe(ModuleHandler::EVENT_WRITE_CACHE, $this->writeCache(...));
        [..]
      }
      public function reset(): void {
        $this->hookInfo = NULL;
        [..]
      }
    

    There are different ways to implement this, which can coexist:

    • A mutable data sources (e.g. a module list with ->addModule()) can have its own ->subscribe() method, and its own dispatch mechanism.
    • A central dispatcher for these kinds of mini runtime events can keep a list of subscriber callbacks by id.

    These cache properties are special, because they only need reset events after the object has been created.
    E.g. symfony event dispatcher needs to "wake up" all the subscribed services. The runtime dispatcher only needs to message those services that have subscribed at runtime. This is why it is much easier to create and work with.

    The reset dispatcher does not need to send any data, it only has to "ping" the subscriber, which already has the data source as a dependency.

    -

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium kristiaanvandeneynde Antwerp, Belgium

    Following this issue, but my 2c regarding the factory:

    • We have one for most other types of caches
    • The documentation states we should use them (see below)
    • MemoryCache is significantly faster than MemoryBackend, yet only the latter has a factory

    Sidenotes:

    • I realize why MemoryBackend (un)serializes data, but if we don't care about that then we will always prefer the faster sibling.
    • If we don't add the cache.bin tag, the cache tags invalidator doesn't apply (unless we then wrap our cache in a chain)

    From the in-code documentation:

    /**
     * ...
     * A module can define a cache bin by defining a service in its
     * modulename.services.yml file as follows (substituting the desired name for
     * "nameofbin"):
     * @code
     * cache.nameofbin:
     *   class: Drupal\Core\Cache\CacheBackendInterface
     *   tags:
     *     - { name: cache.bin }
     *   factory: ['@cache_factory', 'get']
     *   arguments: [nameofbin]
     * @endcode
     * See the @link container Services topic @endlink for more on defining
     * services.
     * ...
     */
    
  • ๐Ÿ‡ฆ๐Ÿ‡บAustralia larowlan ๐Ÿ‡ฆ๐Ÿ‡บ๐Ÿ.au GMT+10

    Looks like there's patches/MRs here so removing the [policy] prefix from the title.

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    Re myself in #40:

    If the main concern is the reset problem, we can keep the object properties.
    All we need is a mechanism for the reset, that does not pollute the public interface.

    We don't need to invent a new event mechanism. We can use the regular Drupal/symfony event dispatcher we use elsewhere, just use ->addListener() instead of the usual subscription mechanism.
    We only need to subscribe a service that is already instantiated.

      public function __construct(
        EventDispatcherInterface $dispatcher,
      ) {
        $dispatcher->addListener(self::EVENT_ID, $this->reset(...));
      }
    
  • ๐Ÿ‡ท๐Ÿ‡บRussia Chi

    All we need is a mechanism for the reset, that does not pollute the public interface.

    If we were using the Symfony core, we would already have such a mechanism.

    Anyway, I suppose it's not a big deal to enable this service pass in Drupal kernel.
    \Symfony\Component\HttpKernel\DependencyInjection\ResettableServicePass

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    If we were using the Symfony core, we would already have such a mechanism.

    We want to avoid that a service gets instantiated only to reset its internal cache property.
    I did a little experiment and it seems this works.

    The ServicesResetter does skip lazy services.
    The ResettableServicePass adds IGNORE_ON_UNINITIALIZED_REFERENCE to the service references which it adds to the IteratorArgument.
    This seems to do the magic.

    What I don't see is how we could use this to reset different services for different events.
    It's all or nothing.

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium kristiaanvandeneynde Antwerp, Belgium

    Small disclaimer that #41 and other comments mentioning MemoryCache are potentially no longer up-to-date as we now have full support for using MemoryCache. See this CR: https://www.drupal.org/node/3409455 โ†’

    With that said, I don't see why we can't use MemoryCache for what used to be cached in properties. The CacheTagsInvalidator service collects all cache bins and services that tag themselves as invalidators. We only need to focus on the first part here.

    Any cache bin collected is instantiated because we use service_collector rather than service_id_collector for the CacheTagsInvalidator. So if we would create a dedicated bin for every class that used to have a "property cache", we would indeed have to instantiate a bunch of services.

    However, I would argue we could have a "default" cache for memory like we do for the DB and encourage everyone to use that cache for their simple property caching. Then there's no overhead whatsoever because there will always be something using said memory cache and therefore the service will already be instantiated.

    Re #45

    We want to avoid that a service gets instantiated only to reset its internal cache property.

    Would that not apply only to static properties being used as an internal cache? Most of these internal caches I've seen use regular properties, meaning the service is instantiated either way.

    Having said all of that, the kernel.reset reset tag does look really cool. Thanks for bringing that to our attention.

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    We want to avoid that a service gets instantiated only to reset its internal cache property.

    Would that not apply only to static properties being used as an internal cache? Most of these internal caches I've seen use regular properties, meaning the service is instantiated either way.

    E.g. in drupal_flush_all_caches() we call `\Drupal::theme()->resetActiveTheme();`, which does get an instance of ThemeManager, only to set `$this->activeTheme = NULL;`.
    But if at the time this is called, ThemeManager had not been instantiated yet, `$this->activeTheme` would already be NULL / uninitialized.

    The same would happen if we use regular event subscription to call a ->reset() method to reset an object property.

    I think most of the solutions proposed here avoid this problem in one or another way.
    Which is good :)

    However, I would argue we could have a "default" cache for memory like we do for the DB and encourage everyone to use that cache for their simple property caching. Then there's no overhead whatsoever because there will always be something using said memory cache and therefore the service will already be instantiated.

    There is overhead in reading and writing these cache items, which is always going to be more php operations than with a property cache.
    All of it is just basic php operations. So it will only become relevant for caches where we read or write many small values.

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium kristiaanvandeneynde Antwerp, Belgium

    E.g. in drupal_flush_all_caches() we call `\Drupal::theme()->resetActiveTheme();`, which does get an instance of ThemeManager, only to set `$this->activeTheme = NULL;`.
    But if at the time this is called, ThemeManager had not been instantiated yet, `$this->activeTheme` would already be NULL / uninitialized.

    Now imagine if ThemeManager had a CacheBackendInterface dependency instead. And let's also assume that this would be a "default" MemoryCache which is used by many services, not unlike the "default" DatabaseBackend.

    The point is that it then no longer matters if ThemeManager is instantiated or not because, in order to flush its caches, we would invalidate a certain cache tag. That would go over all cache bins, which are already instantiated, and then clear the data. At no point would ThemeManager be instantiated by our actions. Nor any other service for that matter.

    There is overhead in reading and writing these cache items, which is always going to be more php operations than with a property cache.

    True, but that's a tradeoff we should be willing to make as the benefits of this approach far outweigh the drawbacks. We'd get the above plus cache tag, context and max-age support. It would also mean that all of the classes using this cache would immediately benefit from any optimizations we can realize in the caching layer itself.

    And as you said:

    All of it is just basic php operations. So it will only become relevant for caches where we read or write many small values.

    Fully agree. We can still opt not to use a MemoryCache on a case by case basis in those edge cases where it really does make a difference.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Now imagine if ThemeManager had a CacheBackendInterface dependency instead. And let's also assume that this would be a "default" MemoryCache which is used by many services, not unlike the "default" DatabaseBackend.

    The point is that it then no longer matters if ThemeManager is instantiated or not because, in order to flush its caches, we would invalidate a certain cache tag. That would go over all cache bins, which are already instantiated, and then clear the data. At no point would ThemeManager be instantiated by our actions. Nor any other service for that matter.

    Yes this makes sense and should work great. We just need a convention/pattern for cache IDs.

    We're talking one or two extra function calls when reading/writing items to the cache, a bit more if/when we have LRU support from #1199866: Add an in-memory LRU cache โ†’ but that would only need to apply to unbounded stuff like entities, so the overhead should be minimal compared to everything else we do.

Production build 0.71.5 2024