Condition plugins should provide cache contexts AND cacheability metadata needs to be exposed

Created on 15 November 2014, over 10 years ago

Updated 2 February 2023, over 2 years ago

Problem/Motivation

This issue blocks critical #2495171: Block access results' cacheability metadata is not applied to the render arrays →

Short version: Because conditions do not indicate their cacheability during access, block access results are not cacheable, which continues to bubble up.

If something (a block, or generally: anything that can be rendered) is rendered, then we need to know its cacheability metadata. We need cache tags to know when the cached version of the renderable becomes stale. And we need cache contexts to know which variations of the renderable exist.
Perhaps most counter-intuitively: checking access is part of the rendering process, because an inaccessible thing is not rendered. Not being rendered is equivalent with rendering to the empty string. If the cache tags associated with the access check become invalidated (e.g. the node:77 cache tag, node 77 is published, which suddenly makes the block with the node:77 accessible, unlike before), or a different value for the cache context is specified (e.g. a different set of permissions, because permissions have changed for a role), then that may make previously inaccessible (== empty string) things accessible (== non-empty string).

In summary: if something may be rendered, then we always need the associated cacheability metadata.

Applied to the conditions/contexts system/concepts: if you have a block with a "node type" visibility using a context (as in \Drupal\Core\Plugin\Context\ContextInterface) that depends on some external information, then we always need to know about the cache context (as in \Drupal\Core\Cache\CacheContextInterface) associated with that external information. Even if the block is currently not visible.
The block not being visible is analogous to access checking above: it causes the empty string to be rendered (== invisible block), but we still need the cacheability metadata associated with that choice to not render the block.

Think about it this way: we need to know the cacheability of the information that was used to determine whether the block was visible or not; otherwise Drupal would be forced to assume that no external data was involved in coming to the decision to make the block invisible: the absence of cache tags and cache contexts means there are no dependencies on either data (cache tags) or request context (cache contexts) to come to that conclusion.

The added test expectations in NodeBlockFunctionalTest reflect those requirements exemplary: The node route context depends on the route. Some routes have a node object, some don't. Once we have a block with a node type visibility condition, which needs a node context, we need to know that whether that specific block is visible or not, which depends on that visibility condition, which depends on the node context, which depends on the route. However, if no block visiblity depends on it, then the cache context for the route should not exist.

Problems/Blockers

The primary problem is that many parties/objects are involved in the flow with a lot of indirection until we can create and build an access result object for a block. Looking at the specific example above in HEAD:

1. BlockPageVariant invokes an event and asks the world to return available contexts.
2. (Problem) NodeRouteContext *sometimes* returns a Context object for the node. If it doesn't, then we have no knowledge that it could exist. We also don't know based on what conditions it exists.
3 (Problem) We must only add cache metadata for contexts that are actually used (i.e. a block must be consuming the context, note that 1 returns all available contexts). The node route context is always available on node pages, but we don't want to vary by that when nobody uses it.
4. BlockPageVariant then sets the contexts on the block entity, for the purpose of being able to pass them through to the block access control handler and then calls access, which calls just that.
5. BlockAccessControlHandler::checkAccess() fetches the contexts again and loops over the visibility conditions of a block. Each visibility condition is then assigned their contexts.
6. (Problem) The first problem there is that we *only* pass the context value of the available context object to the internal context object of the condition plugin.
7. (Problem) ContextHandler uses exceptions to flag configured but missing conditions or conditions without a value. That immediately aborts the access checks, the exception is caught, block access is denied and the block access result is marked as uncacheable. That's a huge problem for SmartCache/BigPipe (or render caching in general) because those scenarios are common. The frontpage view doesn't have a node from the route, so the node type visibility condition can't be checked and SmartCache can't cache the frontpage.
8. (Problem) If a block has multiple conditions and the first one is missing a context, then we don't know what other conditions/contexts might be involved. Render cache can to a certain degree learn this incrementally, but it's more expensive to do so. If you have a block that should only be shown on article nodes for administrators, then we'd like to know about all the relevant cache contexts (user roles and route).
9. (Problem) Required contexts are completely broken. The required flag is force-checked in getContextValue() (by throwing an exception), there is no way to check if a context has a value. Since ContextHandler calls getContextValue() on the *provided* node route context object, we actually check the required flag of that, which is always required (since that is the default value). That means it's impossible to have an optional node context in HEAD.
10. (Problem) We have the current user context, and we have the user role condition. The current user context varies by the user, the user role condition only wants to vary by user roles. But we don't know based on what user roles. We could also have a user route context and check a block on user/1 by whether the use we're viewing has a certain role.

(As you can see, we have many problems... )

Proposed resolution

I'll try to expand this later, for now, summarizing on how the patch aims to solve this in general and address all those blockers/problems.

1. The general idea is to use the context object to transport cacheability metadata through the access check process until we can add it to the access result. That by itself was already pretty clear to everyone from the first patch on. Just not the amount of changes are required to actually get that information in the end, some of them quite controversial.

2. As explained above, we always need to know that there could be a context that is based on something. So NodeRouteContext is changed to always return a context object that sometimes doesn't have a value. This is conceptually common for cache contexts, but there's even more indirection involved than usual. If you're conditionally adding something to a render array, then you add e.g. the user.permission context, check access and then add (or not) your stuff. But here we have something that might be used, and if, then it will have to vary by our cache context (or a subset of it). See next point.
3. To solve the problem of only getting cache metadata of actually used contexts, we fetch their cacheability info *through* the block and condition plugins indirectly. Which means those need to have this information in their context objects.
6. But as explained in this point, their context objects are actually completely separated from the provided contexts that have the metadata. So we need to pass it along. *Always*, if a context could possibly be injected, even if there's no value. So ContextValueHandler is refactored to explicitly pass along the cacheability metadata from the provided context to the internal plugin context.
7. But not having a context here breaks our ability to do so, obviously. That's why we now always return a context object. The patch doesn't yet get rid of the exception code, however.
8. The patch attempt to solve the problem of multiple conditions and contexts by not aborting early after an exception but try to process all conditions and collect their cacheability metadata.
9. This is a standalone bug that could possibly be extracted into a separate issue, but it's blocking this. This patch changes the code to no longer throw exceptions on getContextValue(). That allows us to check if there is a value and still pass along the cache metadata. As mentioned above, the bigger problem is actually that the required flag is checked on the wrong context object. A provided context can't be required, that's meaningless. It shouldn't even be possible to define it, but they're required by default because they use the same context definition objects.
10. This problem is solved in the patch for this use case by making the user role condition check for the current user context and if that is given, it's limited to current_user.roles. That unfortunately requires some hardcoded assumptions but the context system isn't even close in understanding that the user role condition doesn't need a user but just a list of user roles (which would involve this logic, which aren't able to determine either). It can't be solved in a generic way, at least not here.

Remaining tasks

1. Agree on the approach.
2. Possibly extract some of the controversial changes/separate bugfixes like the point 9 (required stuff) in separate issues, to discuss them in detail. This also needs tests to demonstrate the problem and avoid it in the future.
3. Avoid to mark blocks as uncacheable when contexts are missing. This does not prevent the smartcache patch from becoming green, but it highly limits it as soon as blocks that rely on possibly missing contexts exist. It also needs tests to demonstrate it. Which either actually requires smartcache to exist or another way to check whether a page had max-age: 0 cache metadata.
4. In a perfect world, get rid of the mis-use of exceptions for flow-control of expectable scenarios that should be explicitly checked instead.

User interface changes

None.

API changes

Event subscribers that expose blocks based on external conditions need to always return a context object with the cacheability metadata attached.
Modules that use condition plugins and blocks to display content (which is always the case for blocks but doesn't have to be for conditions) are required to collect cacheability metadata from them and add it to the resulting render arrays.

🐛 Bug report

Status

Fixed

Version

8.0 ⚰️

Component

Block →

Last updated 3 days ago