Problem/Motivation
Drupal's performance with completely cold caches has several problems:
- We have multiple registries/collections that have to be rebuilt from either YAML or annotation parsing, plus hooks/events, before any HTML request (and most REST requests) can be served successfully - for example the container, the router, the theme registry, element info, and plugins. These can each take hundreds of milliseconds, or in some cases several seconds.
- On actual sites, there will be common page elements like menus, footers etc. which at least one page request must build from scratch before that page can be served and the response sent. Asset aggregates have the same problem (see the related issue 'Stampedes and cold cache performance issues with css/js aggregation', since fixed).
In Drupal 6/7 this has two manifestations:
- In earlier releases of core, and in many contrib modules, we would get stampedes, with multiple requests rebuilding exactly the same information.
- Due to this, we added the lock API, so that one process builds the expensive cache item while the rest sleep() until it's there (the classic pattern is sketched below).
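For reference, here is a minimal sketch of that classic Drupal 7 pattern; the cache ID, lock name and example_rebuild_expensive_data() are hypothetical placeholders, not real core code:
<?php
function example_get_expensive_data() {
  // Serve from cache when possible.
  if ($cache = cache_get('example_expensive_data')) {
    return $cache->data;
  }
  if (lock_acquire('example_expensive_rebuild')) {
    // This process won the lock: rebuild and cache the data for everyone else.
    $data = example_rebuild_expensive_data();
    cache_set('example_expensive_data', $data);
    lock_release('example_expensive_rebuild');
    return $data;
  }
  // Another process is already rebuilding: sleep until it finishes, then read
  // whatever it cached (or rebuild anyway as a last resort).
  lock_wait('example_expensive_rebuild');
  $cache = cache_get('example_expensive_data');
  return $cache ? $cache->data : example_rebuild_expensive_data();
}
?>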
However, it is still very easy for sites to run into situations such as the following:
- On low traffic sites, operations like enabling a module, changing the default theme, or submitting a view can take multiple seconds to complete.
- On high traffic sites, in addition to the above, events such as code deployments can result in the site becoming unresponsive for up to a minute, as every incoming request is held waiting for 5-6 expensive cache items to be rebuilt sequentially, and may then have to build expensive page elements on top of that. While this is going on, Apache clients build up, since none of them can send a response and close the connection. Drupal's high memory usage means the number of Apache clients generally needs to be kept quite low per server, so reaching MaxClients (or having to configure Varnish to limit connections so it doesn't get reached) is very common.
Proposed resolution
When I spoke to Fabianx yesterday, he had an idea about parallel processing of blocks using PHP 5.5 generators and APC locks: get a list of blocks; if a block is cached, serve it; if not, try to acquire a lock (in APCu) and rebuild it; if the lock can't be acquired, move on to the next block and come back for it later, on the assumption that another process might have built it by then. We didn't discuss this approach for other kinds of caches, but I think it's equally or more applicable here. I then discussed cold cache performance with effulgentsia and whether there was a way to make things more robust while individual cache items remain expensive to build, and thought of this.
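A rough sketch of that block idea, assuming PHP 5.5 generators and apcu_add() as a cheap non-blocking lock; example_block_cache_get(), example_block_build() and example_block_cache_set() are hypothetical placeholders, not an existing API:
<?php
function example_render_blocks_lazily(array $block_ids) {
  $pending = $block_ids;
  $passes = 0;
  while ($pending && $passes++ < 10) {
    $retry = array();
    foreach ($pending as $id) {
      if ($html = example_block_cache_get($id)) {
        // Someone (possibly another request) already built and cached it.
        yield $id => $html;
      }
      elseif (apcu_add("example_block_lock:$id", 1, 30)) {
        // We got the lock: build it ourselves, cache it, release the lock.
        $html = example_block_build($id);
        example_block_cache_set($id, $html);
        apcu_delete("example_block_lock:$id");
        yield $id => $html;
      }
      else {
        // Another process is building it: skip it for now, come back later.
        $retry[] = $id;
      }
    }
    if (count($retry) === count($pending)) {
      // No progress on this pass; give the other processes a moment.
      usleep(50000);
    }
    $pending = $retry;
  }
  // Fallback: build anything still missing ourselves, lock or no lock.
  foreach ($pending as $id) {
    yield $id => example_block_build($id);
  }
}
?>
The caller just consumes the generator - foreach (example_render_blocks_lazily($block_ids) as $id => $html) - and gets each block as soon as it is available, in whatever order the locks allow.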
For the central cache items, we'd use the regular lock API (since we know these are global caches and the number of locks will be very small), and there's no need for generators - we just need a list of services to iterate over.
While the router, theme registry and element info cache generally get accessed in a particular sequential order (at least on the same site), there are not really interdependencies between them - so the order they get built in shouldn't matter, except where our bootstrap ordering imposes one, or where the version of a cache depends on previous steps in the request.
The idea would be:
1. Any 'important and expensive' service such as routing, the theme registry or the element info cache implements an interface and tags itself (or we add adapters for this purpose; an illustrative adapter is included in the sketch after this list).
Something like this:
<?php
interface StampedeRebuildInterface {

  public function isRebuildNeeded();

  public function acquireLock();

  public function doRebuild();

}
2. We add a stampede.protection.rebuild service (better names welcome) that iterates over the tagged services, checks whether each needs rebuilding, tries to acquire a lock, rebuilds if it can, and moves on to the next if it can't.
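A minimal sketch of what that could look like, assuming the tagged services get collected into the rebuilder via the container. The class names, lock names and the route builder methods are illustrative assumptions; only the lock backend interface is existing Drupal 8 API:
<?php
class StampedeProtectionRebuilder {

  /**
   * @var StampedeRebuildInterface[]
   */
  protected $rebuilders = array();

  public function addRebuilder(StampedeRebuildInterface $rebuilder) {
    $this->rebuilders[] = $rebuilder;
  }

  public function rebuildAll() {
    $pending = $this->rebuilders;
    // Two passes: anything locked by another process on the first pass gets
    // checked again afterwards instead of sleeping on the lock.
    for ($pass = 0; $pass < 2 && $pending; $pass++) {
      $retry = array();
      foreach ($pending as $rebuilder) {
        if (!$rebuilder->isRebuildNeeded()) {
          continue;
        }
        if ($rebuilder->acquireLock()) {
          $rebuilder->doRebuild();
        }
        else {
          // Another process is rebuilding this one: move on to the next item.
          $retry[] = $rebuilder;
        }
      }
      $pending = $retry;
    }
  }

}

// Illustrative adapter for the router. The isRebuildNeeded()/rebuild() calls
// on the route builder are assumptions for the sketch, not confirmed core API.
class RouterStampedeRebuild implements StampedeRebuildInterface {

  protected $routeBuilder;
  protected $lock;

  public function __construct($route_builder, \Drupal\Core\Lock\LockBackendInterface $lock) {
    $this->routeBuilder = $route_builder;
    $this->lock = $lock;
  }

  public function isRebuildNeeded() {
    return $this->routeBuilder->isRebuildNeeded();
  }

  public function acquireLock() {
    return $this->lock->acquire('stampede_rebuild:router');
  }

  public function doRebuild() {
    $this->routeBuilder->rebuild();
    $this->lock->release('stampede_rebuild:router');
  }

}
?>
The key difference from the current behaviour is the else branch: instead of sleeping on the lock, the process moves on to the next tagged service. Since the interface has no separate release method in this sketch, doRebuild() releases the lock itself once the fresh cache is in place.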
An example of how this would change things: let's say we have three items (router, theme registry, element info) and each takes 3 seconds to rebuild. In reality we have more of them, and they can take anywhere from 100ms to 10 seconds depending on the server/site.
Before:
Process 1:
lock_acquire('router') -> router_rebuild() -> lock_acquire('theme_registry') -> theme_registry_rebuild() -> lock_acquire('element_info') -> element_info_rebuild()
3 + 3 + 3 = 9 seconds
Process 2:
!lock_acquire('router') -> lock_wait('router') -> !lock_acquire('theme_registry') -> lock_wait('theme_registry') -> !lock_acquire('element_info') -> lock_wait('element_info')
3 + 3 + 3 = 9 seconds
Process 3:
lock_wait('router') -> lock_wait('theme_registry') -> lock_wait('element_info')
3 + 3 + 3 = 9 seconds
After:
Process 1:
lock_acquire('router') -> router_rebuild() -> HIT theme_registry -> HIT element_info
3 + 0 + 0 = 3 seconds
Process 2:
!lock_acquire('router') -> lock_acquire('theme_registry') -> theme_registry_rebuild() -> HIT router -> HIT element_info
0 + 3 + 0 = 3 seconds
Process 3:
!lock_acquire('router') -> !lock_acquire('theme_registry') -> lock_acquire('element_info') -> element_info_rebuild() -> HIT router -> HIT theme_registry
0 + 0 + 3 = 3 seconds
One limitation is that the services can't rely on the request. For example, with the theme registry we can't know the active theme until we have a route. However, we can build the theme registry for the default theme - and in the process a theme-independent cache item gets stored (theme_registry:build:modules), so it still lets us do the bulk of the work in a request-agnostic way. All we're doing is literally replacing time the process would otherwise spend sleeping with rebuilding caches that will be needed later in the same request, and in other requests that are coming in.
Remaining tasks
User interface changes
API changes
(It is possible to do this without API changes.)