Use Fibers to offload discovery builds during stampedes

Created on 10 January 2022, almost 4 years ago
Updated 4 August 2023, about 2 years ago

Problem/Motivation

After a full cache clear, Drupal has to build up various caches before it can serve a request. Router, menu links, library info, various plugin registries, theme registry, twig templates, etc.

This both takes a long time, and massively increases the memory usage of a request.

Steps to reproduce

drush cr, load a heavy-ish page, watch that spinny wheel / check mem_get_usage()

Proposed resolution

PHP 8.1 adds Fibers https://php.watch/versions/8.1/fibers / https://wiki.php.net/rfc/fibers

When we're routing, if we think we're in a rebuild situation, execute routing inside a fiber. Inside routing, when a route rebuild can't acquire a lock, instead of calling $lock->wait() suspend the fiber and do something else.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Active

Version

10.0 ✨

Component
BaseΒ  β†’

Last updated about 2 months ago

Created by

πŸ‡¬πŸ‡§United Kingdom catch

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¬πŸ‡§United Kingdom catch

    After writing up πŸ“Œ Add PHP Fibers support to BigPipe RTBC I have a clearer idea of how this can be useful.

    There are two situations where a full cache clear is annoying:

    1. You're doing development, you change something, you clear cache, the page takes 25 seconds to load, especially if you're experiencing πŸ› Reduce the number of field blocks created for entities (possibly to zero) Fixed .

    2. You do a deployment on a busy site and there is a stampede where multiple requests hit empty caches. No page can be served until all the required caches have been built, if requests hit at the same time, you can end up rebuilding multiple expensive things multiple times each. Worst case your webserver is completely maxed out with requests, since they're all so individually slow none of them are completing before new ones come in.

    For very, very expensive operations we have used the lock system - for example the theme registry and router builder both ensure that only one request can build them at a time, other requests have to wait for the lock to be released then they should be able to get the already-cached thing from cache. There might be other expensive-to-build caches where we could apply the same thing.

    This is where it gets fun :)

    First of all, these registries aren't built up-front, they're generally requested at the last minute - i.e. we don't try to build the theme registry until a theme function is called, which is usually during rendering.

    However something like the router is needed right at the beginning of the request, until we have the route, we can't execute the controller, and it's the controller that will determine what exactly the rest of the request needs.

    But we know that on most sites if the router needs to be rebuilt, then the theme registry, field plugins, block plugins, library info and etc. are also going to need to be rebuilt very soon. Since all of these are both persistently and statically cached, if we just request them, it's either going to trigger a rebuild, or be nearly-free.

    This means we want to execute routing in a Fiber - the obvious place to do this to subclass Symfony's RoutingSubscriber and do it there. (We could also do it in an event subscriber just before RoutingSubscriber, but it would probably mean running some aspects of routing twice if we did that.

    We'd only want to run this logic if the router is likely to need rebuilding, and the way to determine that is to find out if the container has been rebuilt in this request. This means adding a protected property + getter to DrupalKernel so we can check if that's the case.

    Once routing is inside a Fiber, we can then call Fiber::suspend() in RouteBuilder::rebuild():

       if (!$this->lock->acquire('router_rebuild')) {
          if (Fiber::getCurrent()) {
            Fiber::suspend();
          }
          // Wait for another request that is already doing this work.
          // We choose to block here since otherwise the routes might not be
          // available, resulting in a 404.
          $this->lock->wait('router_rebuild');
          return FALSE;
        }
    

    Now, instead of calling sleep(), we can do other things, like build the theme registry, library info, plugin registries etc. we'd need a way to determine which of these to do, but maybe it can be something like a container parameter that's a list of callbacks you can add to. Or an event to subscribe to. If we apply the lock pattern to more of them, then whenever you reach something that another request is handling, you'd skip to the next one, and if it's already been done, then it's return from cache and be finished.

    Before:

    Request 1 starts to rebuild the router.
    Requests 2, 3, 4, and 5 $lock->wait() until the router is rebuilt.

    After:
    Request 1 runs routing inside a fiber, routing acquires a lock.
    Request 2 also runs routing in side a fiber, the router build suspends the fiber when it can't acquire a lock, starts to build the theme registry (would need to be the non-theme specific bit of the theme registry but we already build and cache that separately so could add a dedicated method for it, or build the registry for the default theme.)
    Request 3 theme registry rebuild suspends when it can't acquire lock, starts to build the library info cache.
    Request 4 suspends when it can't acquire a lock for the library info cache, starts to build the field formatter plugin registry.
    etc. etc.

    In between each possible thing to rebuild, we can restart the routing fiber again, and if routing completes, continue with the request - that way we're only running things when the router rebuild is actually running.

    I haven't yet figured out how to use this to speed up the individual development environment case. For example we could run the database queries in MatcherDumper::dump() async and do other things in between, but we don't want to, because we need the entire operation to finish as quickly as possible inside a transaction so we can release the lock and other requests can successfully get routed.

    One place that async might work for single requests though is YAML discovery and parsing - i.e. if we async discover the files, parse them as they come in, then we'd be interleaving file i/o intensive tasks (directory traversal and file_get_contents()) with CPU-intensive stasks (YAML parsing) rather than discovering all the files, then parsing all the files. But... that is probably going to be quite hard to implement.

  • πŸ‡ΊπŸ‡ΈUnited States effulgentsia

    #12 is a great analysis, thank you! But if the use-case it addresses is multiple parallel requests, then what's the value that Fibers bring?

    After:
    Request 1 runs routing inside a fiber, routing acquires a lock.
    Request 2 also runs routing in side a fiber, the router build suspends the fiber when it can't acquire a lock, starts to build the theme registry (would need to be the non-theme specific bit of the theme registry but we already build and cache that separately so could add a dedicated method for it, or build the registry for the default theme.)
    Request 3 theme registry rebuild suspends when it can't acquire lock, starts to build the library info cache.
    Request 4 suspends when it can't acquire a lock for the library info cache, starts to build the field formatter plugin registry.
    etc. etc.

    Do we need fibers for this? Can request 2 realize that it can't acquire a router rebuilding lock, then proceed to building the theme registry, and after building the theme registry check again if the router has been rebuilt (by request 1), etc.? Or is there something special about fibers that either makes this function better or make the code simpler?

  • πŸ‡¬πŸ‡§United Kingdom catch

    Do we need fibers for this?
    Can request 2 realize that it can't acquire a router rebuilding lock, then proceed to building the theme registry, and after building the theme registry check again if the router has been rebuilt (by request 1), etc.?

    There are two reasons, one is immediately useful, one is only potentially/speculatively useful.

    If we zoom out slightly from the previous code example:

      public function rebuild() {
        if ($this->building) {
          throw new \RuntimeException('Recursive router rebuild detected.');
        }
    
        if (!$this->lock->acquire('router_rebuild')) {
          if (Fiber::getCurrent()) {
            Fiber::suspend();
          }
          // Wait for another request that is already doing this work.
          // We choose to block here since otherwise the routes might not be
          // available, resulting in a 404.
          $this->lock->wait('router_rebuild');
          return FALSE;
        }
    
        $this->building = TRUE;
    
        $collection = new RouteCollection();
        foreach ($this->getRouteDefinitions() as $routes) {
    

    That Fiber::suspend() is right in the guts of the routing system, and it would take us back out to RoutingSubscriber, (or an event subscriber that runs just before RoutingSubscriber if we don't mind an extra cache get or adding a static cache or so). Once it's tried to prewarm something else, it would then $fiber->resume() which would take us to the code point immediately after Fiber::suspend() - we'd see if the lock has been released in the meantime and continue as if nothing had happened in between.

    If we didn't use a fiber, we'd have to embed the prewarming logic right inside RouterBuilder::rebuild() itself. This would mean adding a prewarming API to RouterBuilder, vs. adding a prewarming API that incorporates the router builder. So it only helps from a code organisation point of view to keep the prewarming logic a bit more decoupled. This also means we could have done this years ago!

    The potential/speculative reason is that if we find something that can be pre-warmed which can also use an async API, then this trick would work to speed things up within a single request. The problem with that, is that here we care about real performance as much as perceived performance. We want the actual router rebuild to finish as soon as possible so that other requests get a cache hit and not a lock wait, and that's also the case for the theme registry and most/all other things we'd add in here. So we wouldn't want to do something like fire off the query that writes back to the router table async, Fiber::suspend(), then check if it's finished and release the lock, because if what we do in Fiber::suspend() takes longer than the query does to come back, we've extended the overall Router::rebuild() time which could mean more requests hitting the lock.

    This is different from πŸ“Œ Add PHP Fibers support to BigPipe RTBC where it's OK if any individual placeholder takes a fractionally longer wall time to finish, because another one was rendering in the meantime - none of them are going to be blocking other HTTP requests from completing and the important thing for us is that the overall completion time is faster rather than any individual part.

    However, still getting my head around this, so there might be an async use case after all, and soon as we have even one of those, it'll be able to make use of the logic.

    Another example might be if drush implemented an extreme version of this on cli - i.e. a drush cr-prewarm. This would clear the cache, then immediately start to rebuild the various registries, but because it'd be cli, it could use amphp (or drush's own child process logic) to do each one fully async, simulatenously build the router and the theme registry at the same time. But to allow that, it's better that the prewarming logic in core is encapsulated as much as possible, otherwise you wouldn't be able to independently router rebuild and theme registry rebuild without one triggering the other.

  • πŸ‡ΊπŸ‡ΈUnited States neclimdul Houston, TX

    I feel like I need to dig in more and play with fibers to see how this is useful. I was adking the same question of "Why don't we just fire an event and let the inherit parallelization work any pre-warming without the fiber complexity" and I'm not sure I understand how Fibers helps still. Either way I like the idea and look forward to seeing where it goes.

  • πŸ‡¬πŸ‡§United Kingdom catch

    @neclimdul I think the tl;dr is if it's the only thing we do that's different, it's extremely marginal here, but it opens up possibilities later on.

    I have a green patch on πŸ“Œ Add PHP Fibers support to BigPipe RTBC now fwiw (although that is also not taking advantage of async yet).

  • Status changed to Needs review about 2 years ago
  • last update about 2 years ago
    Patch Failed to Apply
  • πŸ‡¬πŸ‡§United Kingdom catch

    Working code on πŸ“Œ Add PHP Fibers support to BigPipe RTBC helped me think through this a bit too. Uploading a proof of concept.

    This doesn't add a prewarm API, it hardcodes the theme registry directly into a decorated router_listener even listener, but it does implement the fibers stuff.

    I was wrong about the theme registry, while it does use a lock, it only uses it to ensure atomic writes to the cache, not to block/wait when building the cache.

    However, the principle is the same and there's another (better) reason to use fibers I hadn't thought of above.

    The advantage is that the theme registry has a two-stage cache build, first it builds a cache item with only the module theme hooks, then it build the theme-specific registry on top of that.

    So if we're inside ::build() and we get a cache hit on the module registry, then either we just haven't built the full registry for this specific theme yet, or another process is about to beat us to it.

    This means we can add the Fiber::suspend() at that completely arbitrary point, and it'll jump back to the RouterListener, which can just go back to worrying about routing. If we need to build the rest of the theme registry alter, that'll happen (inside or outside of a fiber later on).

    If we didn't use fibers and used some kind of prewarmer wrapper, then we could potentially do things like check for locks and cache items, but we'd have to do so outside the class in question with internal knowledge of it. So what we get is the ability to pause execution of the prewarming code paths at any particular point.

    Also if we do have another service that lock waits like the router, then you could also suspend() in those services as soon as you're unable to acquire a lock - since we don't want to wait for those services to build, we're only have to wait for the router itself. This would also let you move on to another service each time one suspends.

  • πŸ‡©πŸ‡ͺGermany donquixote

    If we didn't use a fiber, we'd have to embed the prewarming logic right inside RouterBuilder::rebuild() itself. This would mean adding a prewarming API to RouterBuilder, vs. adding a prewarming API that incorporates the router builder.

    Not necessarily.
    We could instead add this to the locking mechanism itself.

    So instead of $lock->wait(), we call $lock->waitAndBeUseful().
    Or better, keep the $lock->wait(), but do something useful in that time.

    I am not saying that this is better than fibers, perhaps there are still other reasons for using them.

  • last update about 2 years ago
    27,718 pass, 1,016 fail
  • πŸ‡¬πŸ‡§United Kingdom catch

    So instead of $lock->wait(), we call $lock->waitAndBeUseful().
    Or better, keep the $lock->wait(), but do something useful in that time.

    I am not saying that this is better than fibers, perhaps there are still other reasons for using them.

    I like this idea. It seems a lot better to me than having to decorate RouterListener which is a bit of boilerplate, forgot it was final so we can't just subclass it. Feels like a new method to me (or a new parameter on the existing method), since we have some very low level usage of the lock API such as in ✨ Add a way to delay executions in test runner until terminate event completed in the child site Fixed .

    The other advantage is that we could intersperse work with polling - so poll the lock, do something, poll the lock, do something else, the proof of concept I've got above just does one single thing if it fails to aquire a lock, which is still better than nothing but it gives us more options.

    One issue it might introduce is recursive calls - i.e. two systems use ::lockWaitAndBeUseful() and one ends up calling the other, in the second one, we wouldn't want to restart the 'being useful' game but instead back out immediately and try to do something else. But since it's all in the lock backend it could short-circuit that and immediately return from a nested call. So probably easy enough to deal with.

    I'm also trying to think through how we could organise the 'do something'.

    Let's say we have:
    Router rebuild
    Theme registry rebuild
    Library info rebuild

    Ideally we want it to look like:
    Request A: actually rebuilding the router
    Request B: theme registry rebuild while waiting
    Request C: library info rebuild while waiting

    And not:

    Request A: actually rebuilding the router
    Request B: theme registry rebuild while waiting
    Request C: theme registry rebuild while waiting

    This would be achievable by using the lock pattern more, i.e.

    Request C: try to acquire router lock, try to acquire theme registry lock, actually rebuild the library info cache.

    Either way I think we need a list of callbacks from the container, not something like the hook or event API where we expect to run all of them in an explicit order usually.

    A new interface + tagged services might fit it better. The new interface would also allow the service to do something different while it's waiting in a ::preWarm() method - for example the theme registry could explicitly just rebuild the module theme hook cache entry and not any theme-specific bits, and it could wrap just that bit in a lock but not the whole operation.

  • πŸ‡¬πŸ‡§United Kingdom catch

    OK there is a real (not just preference) issue with trying do the prewarming from within the lock system, even if we added a dedicated method:

    This is at the top of RouteBuilder::rebuild():

      /**
       * {@inheritdoc}
       */
      public function rebuild() {
        if ($this->building) {
          throw new \RuntimeException('Recursive router rebuild detected.');
        }
    

    This means that if any module in any prewarming service triggers routing, we'd hit that recursion protection. An example could be someone adding a routed library URL in hook_library_info_alter().

    But this has given me a couple of different ideas, going to try to get a proof of concept patch going along with the explanation though.

  • πŸ‡©πŸ‡ͺGermany donquixote

    It is a strange concept in Drupal, but we can actually catch exceptions :)
    But, if the router rebuild is protected by a lock, we will hit the lock before we hit the recursion exception.

    To make this idea work, we would have to introduce a lock id for every cache warming task.

    There should be a registry of cache warming tasks with dependencies.
    This also allows to register sub-tasks, e.g. to collect routing information of views module and store them in an intermediate cache.
    Tagged services could work for this, yes.

    During the "do something useful", if one cache warming task is blocked by a lock, we should skip it and move on.
    The first idea would be that the lock id is registered with the cache warming task. But this could be weird if we call it from outside the cache warming, and it needs to acquire the same lock.

    What we could do instead is start the cache warming task as a fiber from within lock wait.
    Then within the fiber, when we hit a lock, we stop the fiber.

    So, lock wait would be like this:
    - If we are _not_ within a fiber, we look for cache warming tasks, and start them as fibers.
    - If we are already within a fiber, we stop the fiber.

    Feels like a new method to me (or a new parameter on the existing method), since we have some very low level usage of the lock API such as in #3375959: Add a way to delay executions in test runner until terminate event completed in the child site.

    Why is this a concern?
    For low level usage, I think we optimize for the case when the lock is not blocked.
    Any overhead only occurs if the lock is blocked.
    But maybe I am missing something.
    Perhaps it is just safer to not always run expensive tasks when something is locked.

    Another option would be different lock handlers. One that does cache warming tasks during sleep, another that does not.

  • last update about 2 years ago
    28,518 pass, 1 fail
  • πŸ‡¬πŸ‡§United Kingdom catch

    OK this is basically the same approach as before, but with the code moved to completely different and IMO more appropriate places.

    The Fiber::suspend() logic moves to LockBackendAbstract::wait() - see the code comments in there.

    I've also added a non-locking Fiber:suspend() implementation to the theme registry

    Instead of decorating RouterListener, I stuck the entirety of HttpKernel::handle() inside a Fiber, this means any Fiber::suspend() during handling of a request can potentially make use of the cache prewarming logic (if it bubbles up to this point).

  • last update about 2 years ago
    29,945 pass, 1 fail
  • Status changed to Needs work about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    I've added the lock and theme registry changes over there too πŸ“Œ Add PHP Fibers support to BigPipe RTBC because they're equally applicable, and that issue has a clearer path forwards so far I think. If they land over there, it'll just mean a smaller patch here.

    One issue with testing this is we do our absolute best with the router unlike discovery caches, to rebuild it in the request that forces the rebuild - either in drupal_flush_all_caches() directly or at the end of a request that triggers rebuildNeeded. This means you can't just clear caches then hit a site with ab, because clearing caches will rebuild the router.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Test failure is just because the call stack is longer.

  • πŸ‡¬πŸ‡§United Kingdom catch

    We want to know which services to apply this to - which ones take a long time to build and so should Fiber::suspend() when they get a cache miss, and which ones are good candidates for prewarming because no matter how long they take, they're likely to block the request being served.

    I applied the following patch to the database cache backend - timers.txt, this produces the report cache.txt, which I ran on the standard front page, after a cache clear (and after deleting the file so it doesn't include caches already being built by drush cr), it gives us:

    1. The order in which caches are requested.
    2. When the caches are set, and a time in ms between get and set.

    Truncated version with the ones we should look at:

    routing.non_admin_routes and element_info are very early.

    get: routing.non_admin_routes
    set: routing.non_admin_routes time: 0.93
    get: theme.active_theme.olivero
    set: theme.active_theme.olivero time: 0.64
    get: element_info_build:olivero
    get: element_info
    set: element_info time: 2.64
    set: element_info_build:olivero time: 19.18
    

    views_data:en sets off a whole palindrome of cache gets and sets on an empty cache, the intermediate ones are good candidates for prewarming like entity_field_map and the bundle field definitions.

    get: views:access
    get: views_data:node_field_data:en
    get: views_data:en
    get: entity_type_definitions.installed
    get: block_content.field_storage_definitions.installed
    set: block_content.field_storage_definitions.installed time: 1.51
    get: entity_base_field_definitions:block_content:en
    get: field_types_plugins
    get: typed_data_types_plugins
    set: entity_base_field_definitions:block_content:en time: 4.48
    get: entity_field_storage_definitions:block_content:en
    set: entity_field_storage_definitions:block_content:en time: 4.44
    get: entity_base_field_definitions:user:en
    set: entity_base_field_definitions:user:en time: 0.65
    get: comment.field_storage_definitions.installed
    set: comment.field_storage_definitions.installed time: 1.03
    get: entity_base_field_definitions:comment:en
    set: entity_base_field_definitions:comment:en time: 0.62
    get: entity_field_storage_definitions:comment:en
    set: entity_field_storage_definitions:comment:en time: 2.33
    get: entity_base_field_definitions:node:en
    set: entity_base_field_definitions:node:en time: 0.78
    get: entity_field_map
    get: entity_bundle_info:en
    get: entity_base_field_definitions:contact_message:en
    set: entity_base_field_definitions:contact_message:en time: 0.63
    get: entity_base_field_definitions:file:en
    set: entity_base_field_definitions:file:en time: 0.72
    get: entity_base_field_definitions:menu_link_content:en
    set: entity_base_field_definitions:menu_link_content:en time: 0.89
    get: entity_base_field_definitions:path_alias:en
    get: entity_base_field_definitions:shortcut:en
    set: entity_base_field_definitions:shortcut:en time: 0.76
    get: entity_base_field_definitions:taxonomy_term:en
    set: entity_base_field_definitions:taxonomy_term:en time: 0.81
    set: entity_field_map time: 19.09
    get: file.field_storage_definitions.installed
    set: file.field_storage_definitions.installed time: 1.27
    get: entity_field_storage_definitions:file:en
    set: entity_field_storage_definitions:file:en time: 0.48
    get: node.field_storage_definitions.installed
    set: node.field_storage_definitions.installed time: 1.27
    get: entity_field_storage_definitions:node:en
    set: entity_field_storage_definitions:node:en time: 2.67
    get: search_plugins
    get: taxonomy_term.field_storage_definitions.installed
    set: taxonomy_term.field_storage_definitions.installed time: 1.46
    get: entity_field_storage_definitions:taxonomy_term:en
    set: entity_field_storage_definitions:taxonomy_term:en time: 0.55
    get: user.field_storage_definitions.installed
    set: user.field_storage_definitions.installed time: 1.41
    get: entity_field_storage_definitions:user:en
    set: entity_field_storage_definitions:user:en time: 2.9
    get: entity_bundle_field_definitions:block_content:basic:en
    set: entity_bundle_field_definitions:block_content:basic:en time: 4.09
    get: entity_bundle_field_definitions:comment:comment:en
    set: entity_bundle_field_definitions:comment:comment:en time: 13.31
    get: entity_bundle_field_definitions:node:article:en
    set: entity_bundle_field_definitions:node:article:en time: 45.53
    get: entity_bundle_field_definitions:node:page:en
    set: entity_bundle_field_definitions:node:page:en time: 8.72
    get: entity_bundle_field_definitions:user:user:en
    set: entity_bundle_field_definitions:user:user:en time: 5.87
    set: views_data:en time: 333.14
    
    
    local action plugins and local task plugins, and block plugins:
    <code>
    get: local_action_plugins:en
    set: local_action_plugins:en time: 1.37
    get: local_task_plugins:en:view.frontpage.page_1
    get: local_task_plugins:en
    get: entity_form_mode_info:en
    set: entity_form_mode_info:en time: 3.66
    get: entity_view_mode_info:en
    set: entity_view_mode_info:en time: 4.21
    set: local_task_plugins:en time: 20.59
    set: local_task_plugins:en:view.frontpage.page_1 time: 26.83
    get: shortcut.field_storage_definitions.installed
    set: shortcut.field_storage_definitions.installed time: 0.76
    get: entity_bundle_field_definitions:shortcut:default:en
    set: entity_bundle_field_definitions:shortcut:default:en time: 2.54
    get: library_info:olivero
    set: library_info:olivero time: 1.57
    get: block_plugins
    set: block_plugins time: 111
    
  • Status changed to Needs review about 2 years ago
  • last update about 2 years ago
    Custom Commands Failed
  • πŸ‡¬πŸ‡§United Kingdom catch

    Added an API, and it seems to be working.

    Here's the before from above:

    get: routing.non_admin_routes
    set: routing.non_admin_routes time: 0.93
    route_provider.route_load:0e574a6220b0c807e7d410c61f43c6f02773621676f881cbb160db210b856d32827b71a046ee16fa2dcc0352a44d773c067bdc3531bcb3960a66710d128298a9
    set: route_provider.route_load:0e574a6220b0c807e7d410c61f43c6f02773621676f881cbb160db210b856d32827b71a046ee16fa2dcc0352a44d773c067bdc3531bcb3960a66710d128298a9 time: 1.92
    

    As you can see, it gets the non admin routes and sets that, then it gets route_provider.route_load.hash (which is the cache of the preloaded routes query) and sets it.

    This is the after:

    get: routing.non_admin_routes
    set: routing.non_admin_routes time: 0.43
    get: route_provider.route_load:0e574a6220b0c807e7d410c61f43c6f02773621676f881cbb160db210b856d32827b71a046ee16fa2dcc0352a44d773c067bdc3531bcb3960a66710d128298a9
    

    so far the same, but now it changes:

    get: entity_field_map
    get: entity_type
    get: entity_bundle_info:en
    get: entity_base_field_definitions:block_content:en
    get: field_types_plugins
    get: typed_data_types_plugins
    get: module_implements
    get: hook_info
    set: hook_info time: 0.17
    [snip]
    get: entity_base_field_definitions:user:en
    set: entity_base_field_definitions:user:en time: 0.74
    set: entity_field_map time: 109.13
    get: route_provider.route_load:0e574a6220b0c807e7d410c61f43c6f02773621676f881cbb160db210b856d32827b71a046ee16fa2dcc0352a44d773c067bdc3531bcb3960a66710d128298a9
    set: route_provider.route_load:0e574a6220b0c807e7d410c61f43c6f02773621676f881cbb160db210b856d32827b71a046ee16fa2dcc0352a44d773c067bdc3531bcb3960a66710d128298a9 time: 1.56
    

    So what it's done is try to get the route preload cache, got a miss, built the entity_field_map instead, gone back to get the route preload cache, still got a miss, built it and cached it.

  • last update about 2 years ago
    Custom Commands Failed
  • πŸ‡¬πŸ‡§United Kingdom catch

    Try to make cspell happy.

  • last update about 2 years ago
    Custom Commands Failed
  • πŸ‡«πŸ‡·France andypost

    Fix CS

  • last update about 2 years ago
    29,951 pass, 1 fail
  • πŸ‡«πŸ‡·France andypost

    and one more

  • last update about 2 years ago
    Custom Commands Failed
  • πŸ‡¬πŸ‡§United Kingdom catch

    Prewarmer service now needs an alias for autowiring.

  • last update about 2 years ago
    29,953 pass
  • πŸ‡«πŸ‡·France andypost

    @catch looks you're picked not #30 patch, here's interdiff from #32 added and tests should be green

  • πŸ‡¬πŸ‡§United Kingdom catch

    Thanks @andypost!

    Updating the issue summary.

  • last update about 2 years ago
    29,953 pass
  • πŸ‡¬πŸ‡§United Kingdom catch

    Small improvement to CachePrewarmer - keep track of what we've already prewarmed so we don't run the same one twice. Should be built off #33 this time.

  • πŸ‡«πŸ‡·France andypost
    1. +++ b/core/lib/Drupal/Core/PreWarm/CachePreWarmer.php
      @@ -11,6 +11,16 @@
      +   * @var array
      +   */
      +  protected array $calledServices = [];
      

      could use @var string[]

    2. +++ b/core/lib/Drupal/Core/PreWarm/CachePreWarmer.php
      @@ -20,9 +30,13 @@ public function prewarmCaches() {
      +    if ($candidates) {
      +      $key = array_rand($candidates);
      

      needs a comment why random service is used?
      This way it can use to run some tasks twice

  • πŸ‡¬πŸ‡§United Kingdom catch

    So #27 shows that we can take a cache rebuild that is usually nested instead the views data rebuild, prewarm it earlier in the request, and then save that time later in the request when we actually build the views data cache. That shows this working in principle.

    I think there might be a way to show this working in practice, but it's going to be fairly hard because it relies on concurrent requests to do its work.

    Something like this:

    1. Add PreWarmableInterface to 4-5 other services, ideally we want more that are 50-100ms+. We could also find one more earlyish cache miss to add a Fiber::suspend() to.

    2. Use the tool which I'd normally tell people not to use for core performance testing - ab. What we want is just to hit the site with about 10 requests like ab -c10 -n1 immediately after a drush cr. In the 'before' case they will all individually build all of the caches, in the after case, they should divide some of the cache building up between them. If we manage to find 500ms of cache building to divide up, we could see as much as 400ms improvement. This might be a big enough change to see consistently with ab. We'd need to run the before and after tests 3 or more times each to get a baseline. We can't increase -n because all the subsequent requests will be cache hits and see no changes, but running multiple times should be enough hopefully. If it's not visible, we can artificially slow down one or two of the rebuilds with sleep(1) to simulate a 200+ module site instead of a 20+ one.

    #36 feedback is good - I'll add a longer comment. Need to add a lot of high level docs in a few places still too.

  • πŸ‡ΊπŸ‡ΈUnited States neclimdul Houston, TX
    1. +++ b/core/lib/Drupal/Core/DrupalKernel.php
      @@ -701,7 +701,18 @@ public function handle(Request $request, $type = self::MAIN_REQUEST, $catch = TR
      +        $this->container->get('cache_prewarmer')->preWarmCaches();
      
      @@ -717,6 +728,20 @@ public function handle(Request $request, $type = self::MAIN_REQUEST, $catch = TR
      +  /**
      +   * Prewarms caches for services that support it.
      +   *
      +   * When handling a request with cold caches, some services lock expensive
      +   * operations to prevent two processes handling them at the same time. When
      +   * this happens, further requests may go into a LockInterface::wait() pattern
      +   * which will usually sleep until the lock is released, in the hope that the
      +   * other request has cached in the meantime ready for it to use.  This method
      +   * allows us to do something else during that wait time instead of sleeping.
      +   */
      +  protected function prewarmCaches(): void {
      +    $this->container->get('cache_prewarmer')->prewarmCaches();
      +  }
      +
      

      Did you mean to call this new helper in the request changes?

    2. +++ b/core/lib/Drupal/Core/Lock/LockBackendAbstract.php
      @@ -37,18 +37,32 @@ public function wait($name, $delay = 30) {
      +    // If executing inside a fiber, then suspending the fiber implicitly waits
      +    // for whatever the parent process does before it is resumed again. Try
      +    // that so that other tasks can continue, before sleeping only if necessary.
      +    // @see Drupal\Core\DrupalKernel::prewarmCaches() for an example which takes
      +    // explicit advantage of this behaviour.
      +    if (\Fiber::getCurrent() !== NULL) {
      +      $sleep = 0;
      +      \Fiber::suspend();
      +    }
      +    else {
      +      // Begin sleeping at 25ms.
      +      $sleep = 25000;
      +    }
      +
      

      Would this be a little less complex if it was directly in the loop or looked more like this?

        $wait = function () use ($sleep) {
          if (\Fiber::getCurrent()) {
            \Fiber::suspend();
          }
          else {
            usleep($sleep);
          }
        };
      
        // In loop
        $wait();
      
    3. +++ b/core/lib/Drupal/Core/Lock/LockBackendAbstract.php
      @@ -37,18 +37,32 @@ public function wait($name, $delay = 30) {
      -    // Begin sleeping at 25ms.
      -    $sleep = 25000;
           while ($delay > 0) {
             // This function should only be called by a request that failed to get a
      -      // lock, so we sleep first to give the parallel request a chance to finish
      -      // and release the lock.
      -      usleep($sleep);
      +      // lock, so if  we haven't already suspended a fiber, sleep first to give
      +      // the parallel request a chance to finish and release the lock.
      +      if ($sleep > 0) {
      +        usleep($sleep);
      +      }
             // After each sleep, increase the value of $sleep until it reaches
             // 500ms, to reduce the potential for a lock stampede.
             $delay = $delay - $sleep;
      

      If sleep is 0, is there a possibility this becomes a hot loop waiting for the lock to release?

  • πŸ‡¬πŸ‡§United Kingdom catch

    Did you mean to call this new helper in the request changes?

    I actually meant to get rid of the helper because it's a one-liner now there's a service.

    Would this be a little less complex if it was directly in the loop or looked more like this?

    Not sure the end point but probably yes. I think it should be in the loop, and we could probably do something like call the prewarmer when we start the loop, then check if the lock is available, then sleep, every time so that if we're waiting for a long time, more prewarming can happen. That would then get rid of all the $sleep = 0 logic altogether.

  • last update about 2 years ago
    29,953 pass
  • πŸ‡¬πŸ‡§United Kingdom catch

    Hopefully addressing #36 and #38 and also adding a lot of docs.

  • πŸ‡ΊπŸ‡ΈUnited States neclimdul Houston, TX

    Great! Making a lot more sense.

    +++ b/core/lib/Drupal/Core/Lock/LockBackendAbstract.php
    @@ -45,16 +45,27 @@ public function wait($name, $delay = 30) {
    +      if (\Fiber::getCurrent() !== NULL) {
    +        \Fiber::suspend();
    +      }
    

    Because it keeps popping into my head I'm going to throw this out there. Are these methods being static(global) going to be a problem? We won't know where this is called so we don't know if we're triggering the pre-warm or some other fiber. Does it matter? Could this trigger un-intended behaviors in modules down the line if they use fibers?

  • last update about 2 years ago
    29,953 pass
  • πŸ‡¬πŸ‡§United Kingdom catch

    Actually one more change.

    Moving this out of RouteProvider::preload() into RoutePreLoader::onRequest() for two reasons:

    1. Because there are two cache gets and sets involved, we can Fiber::suspend() immediately after the first (extremely cheap to build because it just copies over from what's in state) cache is set, that way there is no checking cache again, just suspend and move on. This conflicts a bit with πŸ“Œ [PP-1] Use cache collector for state Needs work but we could always move it back to RouteProvider::preload() once that's in, or somewhere else I haven't thought of.

    2. The event listener is much more closely tied to where this happens during request execution, and because it's an event listener it's more swappable/modifiable.

  • πŸ‡¬πŸ‡§United Kingdom catch

    We won't know where this is called so we don't know if we're triggering the pre-warm or some other fiber. Does it matter? Could this trigger un-intended behaviors in modules down the line if they use fibers?

    It both matters a lot and shouldn't matter at all!

    Since we already have another implementation in the works, πŸ“Œ Add PHP Fibers support to BigPipe RTBC , this is a real consideration, but they're complementary with each other.

    The DrupalKernel::handleRequest() call is acting as a 'top level' Fiber, it's giving anything that might be running in the critical path, especially early request handling, the opportunity to call Fiber::suspend() when it thinks the request might be better off doing something else instead of continuing. And the role of the cache prewarmer is to provide that 'something else' in the absence of anything else.

    DrupalKernel::handleRequest() doesn't execute the actual prewarming API in a Fiber though, so anything calling Fiber::getCurrent() will get NULL and just keep going instead of suspending (or Fibers will throw an exception if they don't check Fiber::getCurrent() properly but that'd be a straightforward bug).

    This means DrupalKernel::handleRequest() is always going to get a suspension, try to prewarm, resume the fiber, and maybe it'll try to do the same again, with a different PreWarmableInterface implementation if it gets suspended later in the request, but it'll always resume once it's done it's thing (unless there's an exception thrown or fatal error) and allow the rest to continue as normal. This is why the while() loop.

    The BigPipe implementation is much later in the request and will be happening a level underneath the DrupalKernel one, and it is looping over multiple different callbacks (bigpipe placeholders) and resuming them if they suspend, until they're all done. So for the BigPipe case there is an array of Fibers involved, all at the same level of execution, moving from one to the other, unlike DrupalKernel which is only dealing with one at a time.

    It is very possible that some code that thinks it's likely to be called by the Drupal::Kernel::handleRequest() Fiber (which would include anything we add here until the other issue lands) might actually end up being called by the BigPipe Fibers once they're both in core.

    However, this should still work nicely!

    For example, let's say we have two placeholders,

    Placeholder A is a views listing block, so it has to execute the listing query first (async, when that's added), and calls Fiber::suspend() before waiting for the query to be returned.

    Placeholder B just gets the current user and renders their username.

    Either the username placeholder or the views listing block could be called first depending on the page layout/block configuration.

    Username first:

    If the theme registry is empty when we go to render the username, then code added here could end up calling Fiber::suspend() with the expectation that the cache prewarming API will be called.

    Now while the username placeholder is suspended, BigPipe would move onto the views listing block, and execute the async query, and suspend itself before waiting for the async query to come back.

    Then, BigPipe moves back to the username placeholder. The theme registry checks the property and cache get again when it's resumed, just in case it got built while it was suspended. If it doesn't do this, there was no point in suspending because it would end up doing the work anyway.

    There are two possibilities when we get back to the username placeholder:

    1. While we were in Views, some other process built the theme registry cache for us. Now we didn't have to build the theme registry cache at all, even though it was empty when we tried to get it, although the dedicated cache prewarming API wasn't involved here (at least in this request), it ended up being a byproduct of the BigPipe implementation instead. But broadly the same thing we wanted to happen happened.

    2. It's still a cache miss for the theme registry (because actually no other requests are happening, or they didn't finish while we were in Views), so we build it ourselves. But!!! we're now building the theme registry cache after the views async query was executed, and before views checks if it returned, so it's filling in that time that would otherwise be spent waiting for the query to execute.

    Now whether #1 or #2 was true, when we go back to Views, it will now wait for the async query to return (minus the time we spent rendering the username), then when it does, or if it already has, it'll render (and the theme registry cache is warm by this point too).

    When the views listing block is first, it's slightly different, but ends up in the same place:

    1. Views executes the async query and suspends.
    2. Username placeholder ends up in a theme registry cache miss and suspends.
    3. Views is resumed and waits for the async query to come back and renders. Now because Views is rendering a template, it gets the theme registry cache miss itself. This is a separate call, so the theme registry suspends here too, in the same place. This is the second suspension from the views placeholder via different code paths.
    4. BigPipe resumes the username placeholder, it already suspended from the theme registry before, so it resumes from there, and either gets a cache hit this time or builds it. Then the username gets rendered and it's done.
    5. Bigpipe resumes the views placeholder for a second time, it gets a cache hit for the theme registry this time, and it renders.

    In both cases, the views async query ends up getting fired before the theme registry is built, and the theme registry is only built once. We do get two or three cache misses for the theme registry before we actually try to build it, but cache gets are cheap (and cheaper than locking which is our other option) and they'll only happen on a miss.

    Now let's say there's a custom module that knows nothing about any of the above, it has a list of RSS feeds, and it gets the most recent item from each feed, and renders it in a block. Because BigPipe is installed, this gets executed in a BigPipe placeholder, which is now inside a Fiber, which is itself inside the DrupalKernel Fiber.

    The custom feeds block module also wants to use concurrency, so it uses Guzzle async to fire off requests to all of the RSS feeds at once, and then it keeps resuming (and suspending) Fibers. When any async RSS feed response is returned from a fiber, it parses the RSS and puts it into an array, then resumes the other Fibers to see if another RSS feed as come back, until they're all done. This is now three levels of Fibers - the RSS block Fiber, inside the BigPipe Fiber, inside the DrupalKernel::handleRequest() Fiber.

    Because the RSS block is both executing the Fibers and also controls the code that is suspending those Fibers, all the calls should stay within that context this time - i.e. it's a closed loop of Fiber execution and suspension. It will request all the RSS feeds, eventually get all the RSS feeds, then finish. Then BigPipe will move onto the next placeholder and it has no idea that all of this has happened.

    So it should all work, but there are various considerations:

    1. Code that is executing Fibers and doesn't control all the the call chain below it, needs to make sure that no matter how many times the fiber is suspended, that it eventually gets resumed (if it cares about it finishing at all).

    while ($fiber::isSuspended())
    $fiber->resume()
    }

    2. Code that is suspending a Fiber needs to make sure it will eventually stop suspending, - i.e. we need to be very careful in lock wait that we don't end up in an infinite loop.

    (these following two are specific to the implementation here

    3. Code that is suspending a fibre after a cache miss in the hope that something else will build it, needs to check the cache again when it's resumed, so it actually benefits when it's been built, and doesn't just build it again regardless.

    4. Implementations of PrewarmableInterface need to make sure they're safe whether routing has happened or not, whether there's a session or not, whether there's a request or not, because these especially have no idea where they'll get called from - but this is not particularly specific to Fibers at all but to the general problem of trying to find something to do at any arbitrary point in the request.

  • last update about 2 years ago
    Build Successful
  • πŸ‡¬πŸ‡§United Kingdom catch

    Adding element info and all of the views plugin managers. They do not take long individually, 2-3ms each, although it would be longer with more modules.

    : views:sort
    set: views:sort time: 2.6
    
    get: views:filter
    set: views:filter time: 2.54
    
    get: views_data:views:en
    set: views_data:views:en time: 0.79
    get: views:area
    set: views:area time: 1.75
    
    get: views_data:node:en
    set: views_data:node:en time: 0.8
    get: views:pager
    set: views:pager time: 1.54
    get: views:query
    set: views:query time: 2.46
    
    get: views:style
    set: views:style time: 2.4
    

    However cumulatively, as noted above, views_data is about 300ms with the standard profile and they are all nested inside there. So by taking them out, we're dividing up a 300ms-to-build cache item into over a dozen independently prewarmable items. We've already split out 100ms from entity_field_map, this might be another 50ms or so.

    More advantages to this:

    1. The more prewarmable things that the prewarmer can potentially call out to, the less chance of duplicates.
    2. The more heterogeneity of prewarmable things, the more variation in how long they each take to build there is, and the more likelihood that requests stop trying to do the same thing at once. So adding quick-to-build caches alongside some slow-to-build ones will mix things up a bit.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Another one: ViewsData itself, however this is a tricky one because it stores $this->currentLanguage() in state. So we'd need to refactor that a bit to make it pre-routing safe. One option would be to not assign the language to a property, but get it from the language manager each time.

    Also in case it comes up, it doesn't matter if we tag one thing as cache_prewarmable, then we also tag something it calls as cache_prewarmable - the main thing here is building these caches out of the regular execution order.

  • last update about 2 years ago
    Custom Commands Failed
  • πŸ‡¬πŸ‡§United Kingdom catch

    Fixing some of those test failures.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Started looking at ViewsData and realised it shouldn't be caching by current language at all. Not a massive issue unless you have several languages installed, but still a waste, opened πŸ› ViewsData should not cache by language Needs work . Once that lands it'd be a very straightforward addition here.

  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¬πŸ‡§United Kingdom catch
  • last update about 2 years ago
    29,946 pass, 2 fail
  • πŸ‡¬πŸ‡§United Kingdom catch

    Fixing the unused use statement.

    Writing up the issue summary more, another thought occurred. So far I could see drush implementing cache prewarming as a post drush cr task - but drush cr is only one of the ways that caches get cleared, a very common one is running updates, installing a module via the UI etc.

    So what if we did the following:

    1. drupal_flush_all_caches() calls CachePrewarmer::setCachesCleared() - this just changes a boolean property on CachePreWarmer to TRUE
    2. CachePreWarmer registers itself as a post-response task
    3. If the boolean is set, instead of trying to randomly prewarm one thing, in this listener it tries to prewarm everything (still at random).
    4. Because post response is properly non-blocking after πŸ› Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed , the next page will still get requested as quickly as before, but now with whatever caches warmed that the post response task has managed to warm up for it before it requests them.

    This would speed up a lot of local development tasks, possibly also user experience in the installer, if it works.

    It should a small amount of code to write this, then some manual testing to see if it works (and also see whether it causes DrupalCI to blow up).

  • last update about 2 years ago
    29,958 pass
  • Status changed to RTBC about 2 years ago
  • πŸ‡§πŸ‡ͺBelgium borisson_ Mechelen, πŸ‡§πŸ‡ͺ

    That is a known random, back to rtbc.

  • Status changed to Needs review about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    RTBC would be nice but it was at needs review ;)

  • πŸ‡¬πŸ‡§United Kingdom catch
  • last update about 2 years ago
    29,958 pass
  • πŸ‡¬πŸ‡§United Kingdom catch

    This implements DestructableInterface and adds a couple of extra methods.

  • πŸ‡ΊπŸ‡ΈUnited States neclimdul Houston, TX
    +++ b/core/lib/Drupal/Core/Render/ElementInfoManager.php
    @@ -82,6 +83,13 @@ public function __construct(\Traversable $namespaces, CacheBackendInterface $cac
    +  public function preWarm(): void {
    +    $this->getDefinitions();
    
    +++ b/core/modules/views/src/Plugin/ViewsHandlerManager.php
    @@ -133,4 +134,11 @@ public function getFallbackPluginId($plugin_id, array $configuration = []) {
    +  public function preWarm(): void {
    +    $this->getDefinitions();
    
    +++ b/core/modules/views/src/Plugin/ViewsPluginManager.php
    @@ -41,4 +42,11 @@ public function __construct($type, \Traversable $namespaces, CacheBackendInterfa
    +  public function preWarm(): void {
    +    $this->getDefinitions();
    

    I see a trait forming :-D

    Also, probably not RTBC with the text dump in the database backend...

  • πŸ‡¨πŸ‡³China g089h515r806

    Great work.

    I have a samll question:

    +      if (in_array($this->bin, ['cache_bootstrap', 'cache_data', 'cache_default', 'cache_discovery'])) {
    +        file_put_contents('/tmp/cache.txt', 'get: ' . $cid . "\n", FILE_APPEND);
    +      }
    +      Timer::start($cid);
    

    For '/tmp/cache.txt' file directory.
    Does it support windows OS?

  • πŸ‡ΊπŸ‡ΈUnited States neclimdul Houston, TX

    The point is kinda moot because that is there to debug cache triggering. See the last line of my previous comment.

    That said for reference both yes and no. The directory separators likely would work. The bigger problem would be the existence and writ-ability of /tmp which assumes a UNIX FHS that would probably fail. If this was intended for real usage we'd use \Drupal::service('file_system')->getTempDirectory();.

  • last update about 2 years ago
    29,958 pass
  • πŸ‡¬πŸ‡§United Kingdom catch

    Did some testing with the destructable changes to make sure it's doing what it's supposed to.

    I applied 'debug.txt' - it's similar to the cache logging debug that keeps creeping into the patch above , but adds $_SERVER['REQUEST_METHOD'] and skips the timer.

    Then with Umami installed, I submitted the 'clear all caches' button on admin/config/development/performance (after deleting cache.txt to make sure it starts empty).

    Two resulting file attached - before.txt is with the debug and HEAD, after.txt is with the latest patch applied.

    With before.txt, you can see that the POST request already builds some caches - this is from route rebuilding mainly I think, then there's a hard switch to the GET request which then handles all the cache misses from then on.

    With the after, it is working! You can see that even after the GET request starts, the POST request is still cache prewarming.

    Then if you find something like 'entity_field_map', in before.txt it only appears as a cache get/set in the GET request. In after.txt', the get/set is in the POST request and the the GET request only has a cache get - so it's successfully prewarmed before the GET request needs it.

    I did some quick experimentation with ab, and at least on my machine, at anything more than ab -c2 -n2, I start getting page cache hits back. So the idea about hitting it with -c2 -n10 to get a proper stampede situation won't work with unmodified core. We could probably do something like artificially extend the time it takes to build entity_field_map or similar by adding usleep(500) on a cache miss, didn't try that yet.

    Re-uploading the #59 patch just without the cruft in it too.

    Here's a snapshot of after.txt, you can see the mix of POST and GET before the cache sets and gets.

    POST set: entity_bundle_field_definitions:node:article:en
    POST set: last_write_timestamp_cache_discovery
    POST get: entity_bundle_field_definitions:node:page:en
    POST set: entity_bundle_field_definitions:node:page:en
    POST set: last_write_timestamp_cache_discovery
    POST get: entity_bundle_field_definitions:node:recipe:en
    POST set: entity_bundle_field_definitions:node:recipe:en
    POST set: last_write_timestamp_cache_discovery
    POST set: views_data:en
    GET get: entity_type
    GET get: core.extension.list.theme
    POST set: views_data:block_content:en
    POST get: views_data:content_moderation_state:en
    GET get: theme.active_theme.claro
    POST set: views_data:content_moderation_state:en
    POST get: views_data:file_managed:en
    POST set: views_data:file_managed:en
    GET get: element_info_build:claro
    GET get: element_info
    POST get: views_data:media:en
    POST set: views_data:media:en
    GET set: element_info
    GET set: last_write_timestamp_cache_discovery
    GET set: element_info_build:claro
    GET set: last_write_timestamp_cache_discovery
    GET get: route_provider.route_load:        
    

    Unlike the early bootstrap stampede logic, this doesn't require fibers at all because there's no risk of recursion, so we could have done this years ago as soon as we added DestructableInterface.

    The trait is a good idea, will look at adding that with next round of changes. It'd be usable in contrib too.

  • Status changed to Needs work about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Marked πŸ“Œ Centralize/optimize stampede protection/locking (aka work while we sleep) Closed: duplicate as duplicate and crediting Fabianx - much of this is the same as what we came up with eight years ago!

    Untagging for needs manual testing since #63 demonstrates this working.

    1. +++ b/core/lib/Drupal/Core/Lock/LockBackendAbstract.php
      @@ -45,16 +45,27 @@ public function wait($name, $delay = 30) {
      +      // Check if we're executing inside a Fiber. If so, before sleeping,
      +      // suspend the fiber in case some other code can run in the meantime. By
      +      // the time that code has finished running, the lock may already be
      +      // available.
      +      // @see Drupal\Core\Prewarm\CachePrewarmer
      +      if (\Fiber::getCurrent() !== NULL) {
      +        \Fiber::suspend();
      +      }
      +      if ($this->lockMayBeAvailable($name)) {
      +        // No longer need to wait.
      +        return FALSE;
      +      }
      +      // If the lock is still not available, it's possible that the parent
      +      // process immediately resumed the Fiber we're running in, so sleep
      +      // to avoid a lock stampede.
      

      Self review - do we need to calculate how long we were suspended for and then do $delay - $time_suspended here?

    2. +++ b/core/lib/Drupal/Core/PreWarm/CachePreWarmer.php
      @@ -0,0 +1,88 @@
      +    $candidates = $this->serviceIds;
      +    while ($candidates) {
      +      $key = array_rand($candidates);
      +      unset($candidates[$key]);
      

      We should move the randomization inline comment from preWarmOneCache() to the class overview so it covers both methods.

    3. +++ b/core/lib/Drupal/Core/PreWarm/PreWarmableInterface.php
      @@ -0,0 +1,32 @@
      +/**
      + * Interface for cache prewarmers.
      + *
      

      This is wrong, it's an interface for services with prewarmable caches.

    4. +++ b/core/modules/views/src/Plugin/ViewsHandlerManager.php
      @@ -133,4 +134,11 @@ public function getFallbackPluginId($plugin_id, array $configuration = []) {
       
      +  /**
      +   * {@inheritdoc}
      +   */
      +  public function preWarm(): void {
      +    $this->getDefinitions();
      +  }
      +
      

      Still need the trait.

  • Status changed to Needs review about 2 years ago
  • last update about 2 years ago
    Custom Commands Failed
  • πŸ‡¬πŸ‡§United Kingdom catch

    This should address everything except #65.1

    We might be able to address #65.1 by using the Timer class, but it adds a dependency to the trait.

  • last update about 2 years ago
    30,052 pass, 2 fail
  • πŸ‡¬πŸ‡§United Kingdom catch

    Making cspell happy.

  • last update about 2 years ago
    30,056 pass
  • πŸ‡¬πŸ‡§United Kingdom catch

    Random test failure.

    #65.1 I am still thinking about and might have an idea, but that's not related to the new API here, so reviews of everything else would be good.

  • Status changed to Needs work about 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Just RTBC πŸ“Œ Add PHP Fibers support to BigPipe RTBC which led me here. Think it would be nice to get this in before 10.2 launches. Can the change record be written? Then could mark RTBC.

  • πŸ‡ΊπŸ‡ΈUnited States bradjones1 Digital Nomad Life

    Is a CR the only thing that remains here? @catch marked this needs architectural review?

  • Status changed to Needs review about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Added a change record. Yeah I think given this adds an API it could use some review of whether the API is OK, @neclimdul started this but hasn't confirmed he's happy with it yet.

  • Status changed to RTBC about 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Have pinged @ neclimdul on slack, but to keep the issue from stalling going to mark. CR reads well, added 2 code blocks to just show some examples.

  • last update about 2 years ago
    30,208 pass
  • last update about 2 years ago
    30,362 pass
  • last update about 2 years ago
    30,360 pass
  • last update about 2 years ago
    Patch Failed to Apply
  • Status changed to Needs work about 2 years ago
  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    This looks great, thanks for all the detailed information in the issue summary and comments.
    Took a while to read, but I learnt a lot, very appreciated.

    Looking at the patch, its a lot simpler than I was expecting after reading the issue, so that is always a great sign.

    I've really only got some style suggestions.

    I'm going to remove the tag.

    1. +++ b/core/lib/Drupal/Core/DrupalKernel.php
      @@ -701,7 +701,20 @@ public function handle(Request $request, $type = self::MAIN_REQUEST, $catch = TR
      +      $fiber = new \Fiber(function () use ($request, $type, $catch) {
      +        return $this->getHttpKernel()->handle($request, $type, $catch);
      ...
      +      $fiber->start();
      

      we can use an arrow function here to avoid the need for use

      
      $fiber = new \Fiber(fn () => $this->getHttpKernel()->handle($request, $type, $catch));
      
      
    2. +++ b/core/lib/Drupal/Core/PreWarm/CachePreWarmer.php
      @@ -0,0 +1,106 @@
      +    while ($candidates) {
      +      $key = array_rand($candidates);
      

      micro-optimisation, but if we used shuffle instead, we'd not need to call array_rand in each iteration

      $candidates = $this->serviceIds;
      shuffle($candidates);
      foreach ($candidates as $serviceId) {
        $service = $this->classResolver->getInstanceFromDefinitions($serviceId);
        $service->preWarm();
      }
      
      
    3. +++ b/core/lib/Drupal/Core/PreWarm/CachePreWarmerInterface.php
      @@ -0,0 +1,72 @@
      + * to be served faster, we want to divide up different cache building between
      

      %s/we want to divide up different cache building between/we want to divide up cache building between different
      ?

    4. +++ b/core/lib/Drupal/Core/PreWarm/CachePreWarmerInterface.php
      @@ -0,0 +1,72 @@
      + * where any service can define itself as prewarmable with a common method to
      

      nit: I think we can split this with a full stop after PreWarmableInterface then Any and remove the where

    5. +++ b/core/lib/Drupal/Core/PreWarm/CachePreWarmerInterface.php
      @@ -0,0 +1,72 @@
      +  public function preWarmOneCache();
      ...
      +  public function preWarmAllCaches();
      

      we should probably add the :void here too

    6. +++ b/core/lib/Drupal/Core/Routing/RouteProvider.php
      @@ -225,14 +225,38 @@ public function preLoadRoutes($names) {
      +        if (\Fiber::getCurrent() !== NULL) {
      +          \Fiber::suspend();
      +        }
      +        // Check for the cache item again in case it was set while we
      +        // were suspended.
      

      we're getting some pretty deep nesting here.

      The only thing that happens outside the if code is

      $this->serializedRoutes += $routes;
      So perhaps we can bring that into some of the if clauses, return early and avoid the nesting?

  • last update about 2 years ago
    30,368 pass, 6 fail
  • Status changed to Needs review about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Moved to an MR - it also needed a re-roll for dictionary.txt changes.

    Review points all look good so have incorporated into the MR.

  • Status changed to Needs work about 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Have not reviewed but MR seems to have some test failures.

  • last update about 2 years ago
    Custom Commands Failed
  • last update about 2 years ago
    30,368 pass, 6 fail
  • last update about 2 years ago
    30,379 pass
  • Status changed to Needs review about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch
  • Status changed to RTBC about 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Appears points in #74 have been addressed.

  • Status changed to Needs work about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    I've added some questions / points to the MR to address.

  • last update about 2 years ago
    Build Successful
  • Status changed to Needs review about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Moved the post-response stuff to a follow-up, addressed some points, answered others.

  • πŸ‡¬πŸ‡§United Kingdom catch
  • last update about 2 years ago
    30,377 pass
  • Status changed to Needs work about 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Seems to have 1 thread open.

  • last update about 2 years ago
    30,391 pass, 3 fail
  • Status changed to Needs review about 2 years ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Removed the rebase/merge cruft.

  • Status changed to Needs work about 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Was going to check the threads but seems to have test failures. Reran too.

  • last update about 2 years ago
    30,405 pass, 3 fail
  • πŸ‡¬πŸ‡§United Kingdom catch

    Just seen @donquixote's feedback four months later, wish d.o would update issues when there are comments on merge requests, but I guess we just need the gitlab issues migration to happen.

    I've repled to a couple, also going to postpone this on 🌱 Adopt the Revolt event loop for async task orchestration Active - it is not really hard-blocked, but if we're going to do that, it makes sense to build this on top of that API and I know @KingDutch is already looking at doing that.

  • First commit to issue fork.
  • Pipeline finished with Failed
    over 1 year ago
    Total: 179s
    #118485
  • Pipeline finished with Failed
    over 1 year ago
    Total: 3781s
    #118533
  • Pipeline finished with Failed
    about 1 year ago
    Total: 458s
    #319188
  • Pipeline finished with Canceled
    about 1 year ago
    Total: 599s
    #319194
  • πŸ‡¬πŸ‡§United Kingdom catch

    I've revived the original MR here but without any Fibers or revolt support - just adding the prewarm API itself and the implementations.

    I think we should then move the revolt implementation into a new issue - it would depend on this issue and the revolt library, but be a much smaller MR to review.

    Applied a couple of suggestions on the MR but haven't addressed all the feedback yet.

  • πŸ‡¬πŸ‡§United Kingdom catch
  • Pipeline finished with Success
    about 1 year ago
    Total: 454s
    #319210
  • πŸ‡¬πŸ‡§United Kingdom catch

    I've opened πŸ“Œ Use revolt to prewarm caches during lock waits Active for the revolt implementation. And rebased the original MR here to neither use raw Fibers nor revolt - it now just adds the basic API and prewarm implementations without trying to integrate with anything. Hoping that will make each issue a bit easier to review.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Another reason things to split things here - we may want to add the new interface, and maybe the implementations, to 10.4, so that contrib can implement its own prewarming with the interface in place, but we might not want to backport all of the actual async work to 10.4 given the main consumer will be core for quite a while.

  • Pipeline finished with Canceled
    about 1 year ago
    Total: 2485s
    #319707
  • Pipeline finished with Success
    about 1 year ago
    Total: 462s
    #319741
  • Pipeline finished with Failed
    about 1 year ago
    Total: 1033s
    #319759
  • πŸ‡ΊπŸ‡ΈUnited States bradjones1 Digital Nomad Life

    I'm a little confused about your note in #92 about fibers - the request handling is still wrapped in a Fiber, but this is only to provide some back-up functionality if there is a suspended fiber in the code path? You mentioned removing Fiber-related code so just trying to figure out where this landed.

  • Pipeline finished with Failed
    about 1 year ago
    #320403
  • πŸ‡¬πŸ‡§United Kingdom catch

    re #93, that was an oversight., removed that hunk too - similar logic will be handled by the revolt event loop issue.

  • Pipeline finished with Success
    about 1 year ago
    Total: 1200s
    #320540
  • Pipeline finished with Canceled
    10 months ago
    Total: 650s
    #383092
  • Pipeline finished with Success
    10 months ago
    Total: 1212s
    #383100
  • Pipeline finished with Canceled
    10 months ago
    Total: 204s
    #383109
  • Status changed to Needs review 10 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Tried to address all the feedback on the MR and added a kernel test.

  • πŸ‡¬πŸ‡§United Kingdom catch

    catch β†’ changed the visibility of the branch 3257725-revolt-cache-prewarm to hidden.

  • Pipeline finished with Success
    10 months ago
    Total: 1293s
    #383112
  • Pipeline finished with Success
    10 months ago
    Total: 483s
    #383190
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    MR generally looks good to me, but after reading the interface docs, I'm still not sure if people will fully know what to do when they need prewarmable dependencies in their prewarmable service. Maybe a code example could help here?

    E.g. Service A is required by service B, both can be prewarmed. Due to the randomization, service B is requested first. You mention Fibers in the docs, perhaps showing people how to properly call for A from B using Fibers could be a good code example?

  • πŸ‡¬πŸ‡§United Kingdom catch

    E.g. Service A is required by service B, both can be prewarmed. Due to the randomization, service B is requested first.

    In this case, service A would call service B, and warm the caches of both - unless another request has already warmed the cache for service B before it gets there, in which case it will get a cache hit. There's nothing special to do.

    Fibers won't help here unless something was happening async, but even then it's not strictly relevant to the double-cache-warm situation.

    Not sure how best to reflect this in the docs.

  • Pipeline finished with Success
    10 months ago
    Total: 596s
    #388653
  • The Needs Review Queue Bot β†’ tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Merged in 11.x (rebase was not happy at all).

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    To me this is good to go but let's give it 48 more hours for @johnwebdev to reply? I can see their point that randomizing the services in the constructor means we cannot know what the order is from within a test. So, for instance, we can't test whether the preWarmOneCache() method really grabs the last one.

    The question is, do we have to?

  • πŸ‡¨πŸ‡¦Canada gapple

    I think the order the provided services are warmed is an internal detail that shouldn't be tested, just that one or all are warmed as expected, and no caches are warmed twice.

  • Pipeline finished with Failed
    8 months ago
    Total: 159s
    #434065
  • Pipeline finished with Failed
    8 months ago
    Total: 725s
    #434076
  • Pipeline finished with Failed
    8 months ago
    Total: 123622s
    #434078
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Running tests again as something seems to have gone wrong. If they go green I'm fine to RTBC. The final two threads on the MR are more discussion rather than requested changes.

  • Pipeline finished with Failed
    8 months ago
    Total: 561s
    #435520
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    1) /builds/issue/drupal-3257725/core/tests/Drupal/Tests/Core/PreWarm/CachePreWarmerTest.php:59
    Creation of dynamic property MockObject_PreWarmableInterface_db43a4f0::$warmed is deprecated

  • πŸ‡¬πŸ‡§United Kingdom catch

    Can't think of a way to workaround the dynamic property issue when using a mock of the interface, but we could define a real class and use that in the test instead?

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Agreed, a test class with a public property would work here.

  • Pipeline finished with Success
    8 months ago
    Total: 2081s
    #440669
  • πŸ‡¨πŸ‡¦Canada gapple

    I changed to use SplObjectStorage for tracking calls to each mock service.

  • Pipeline finished with Success
    8 months ago
    Total: 491s
    #441331
  • The Needs Review Queue Bot β†’ tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • πŸ‡¨πŸ‡¦Canada gapple

    Merged 11.x

  • Pipeline finished with Failed
    8 months ago
    Total: 470s
    #442532
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Seems to fail on random stuff, running tests again.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Seems to fail on random stuff, running tests again.

    We have to be careful on this one because doesn't this introduce some randomness into how caches are rebuilt?

  • πŸ‡¬πŸ‡§United Kingdom catch

    We have to be careful on this one because doesn't this introduce some randomness into how caches are rebuilt?

    It will eventually, but the current MR only adds the API and not any actual cache prewarming anywhere.

    πŸ“Œ Use revolt to prewarm caches during lock waits Active is where all the indeterminacy will come in. πŸ“Œ [PP-1] Add post-response cache prewarming after full cache clears Needs work would also add some.

  • πŸ‡³πŸ‡±Netherlands daffie

    I have read the IS and most of the comments on this issue.
    We are only adding the CachePreWarming service, but we are not going to use it.
    The solution in the MR looks like the right one. Only as we are not going to use it we cannot be sure that it is the right solution.
    When we are going to use the service we might find that this solution is not the right one and we need to change it.
    Therefore it would be sensible to mark the new service as @internal. Changing the status to NW for this.

    Everything else is to me RTBC.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Earlier iterations of this MR did have usages of the new system but I split them out to their own issues to try to keep this one reviewable.

    I've added @internal to both the service class and CachePreWarmerInterface.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    LGTM and I agree with @daffie that we want to preserve the right to make changes as we get more metrics. Test changes also make sense.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Committed 9525aa6 and pushed to 11.x. Thanks!

  • πŸ‡·πŸ‡ΊRussia Chi

    It worth noting that prewarming data cached in APCu backend does not make mush sense.

  • πŸ‡¬πŸ‡§United Kingdom catch

    @chi almost nothing is stored purely in the apcu backend, it is mostly in chained fast which combines apcu and a consistent backend (database, redis, memcache). The one exception is the file cache but that is a level below all the things that can be prewarmed as a result of this MR.

  • πŸ‡·πŸ‡ΊRussia Chi

    Right. That note was mostly for custom projects that override cache backends like follows.

    $settings['cache']['bins']['bootstrap'] = 'cache.backend.apcu';
    $settings['cache']['bins']['config'] = 'cache.backend.apcu';
    $settings['cache']['bins']['container'] = 'cache.backend.apcu';
    $settings['cache']['bins']['discovery'] = 'cache.backend.apcu';
    $settings['cache']['bins']['menu'] = 'cache.backend.apcu';
    $settings['cache']['bins']['toolbar'] = 'cache.backend.apcu';
    
  • πŸ‡¨πŸ‡¦Canada gapple

    Would prewarming eventually be a replacement for hook_rebuild() when it's no longer @internal, or will they have different use cases?

  • πŸ‡¬πŸ‡§United Kingdom catch

    @gapple if hook_rebuild() is being used for pure cache pre-warming (e.g. something that will still be populated on a cache miss) then yes it could switch to this.

    The differences will be the drush parallel cache warming, and if there's a stampede, the ability to warm different caches at random during lock wait or while waiting for async database queries to come back. Whereas hook_rebuild() has to go linerlarly through whatever is put in it.

    None of these are implemented yet but see related issues.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024