VariationCache needs to be more defensive about cache context manipulation to avoid broken redirects

Created on 3 June 2024, 25 days ago
Updated 13 June 2024, 15 days ago

Problem/Motivation

If people get a render array or cacheable code flow where they can add to a parent and in their child specify cache contexts, those will accumulate and VariationCache will work by virtue of cache redirects.

However, I've recently seen core code where parent cache contexts were left out and/or manually optimized and by happy accident this hasn't caused any issues yet, but if a longstanding bugfix ( 📌 Cacheability information from route access checker access results are ignored by dynamic_page_cache Needs review ) gets in we might all of a sudden see bug reports and perhaps even security issues.

We should make VariationCache more defensive by checking for all previously set cache contexts every step of a redirect chain. If one is missing, we should immediately throw an exception and bail out. This should catch a lot of insecurely written code that doesn't properly set the right cache contexts by letting people know there is something wrong with their code.

Steps to reproduce

Try the following render array:

// This is the starting point.
$build['parent'] = [
  '#markup' => $user->hasPermission('foo') ? 'You can foo!' : 'You cannot foo',
  '#cache' => ['contexts' => ['user.permissions']],
];

// This is in a place where the original build can be altered.
if ($user->hasPermission('foo')) {
  $build['parent']['child']['#cache']['contexts'] = ['user'];
  $build['parent']['child']['#markup'] = $user->id() === 13 ? 'You are 13!' : 'You are not 13';
}

This is a horrible example, but it works fine. Now imagine if in that alter step they tried to be clever and said: "Oh hey, user and user.permissions fold into user, so let's do that here already".

if ($user->hasPermission('foo')) {
  // Trying to be smart here by optimizing parent.
  $build['parent']['#cache']['contexts'] = ['user'];
  $build['parent']['child']['#markup'] = $user->id() === 13 ? 'You are 13!' : 'You are not 13';
}

The above would cause a redirect to be written for 'user' a user with permission 'foo' came first and for 'user.permissions' if anyone else came first. Whereas the actual redirects should be: user.permissions for everyone + an extra redirect beyond the original one to user for users who have the 'foo' permission.

VariationCache would choke here because it tries to heal ['user'] vs ['user.permissions'] but they have nothing in common. ['user'] and ['user', 'user.permissions'] does and would properly apply self-healing and create the second redirect.

A more practical example can be found in UserAccessControlHandler::checkAccess() for the view operation. I have yet to file a bug report for that, but it applies the flawed logic above where it doesn't always add the user.permissions cache context even though it should.

        if ($account->hasPermission('access user profiles') && $entity->isActive()) {
          return AccessResult::allowed()->cachePerPermissions()->addCacheableDependency($entity);
        }
// AT THIS POINT EVERYTHING SHOULD VARY BY USER.PERMISSIONS, YET DOESN'T.
        // Users can view own profiles at all times.
        elseif ($account->id() == $entity->id()) {
          return AccessResult::allowed()->cachePerUser();
        }
// AT THIS POINT EVERYTHING SHOULD ALSO VARY BY USER, YET DOESN'T.
        else {
          return AccessResultNeutral::neutral("The 'access user profiles' permission is required and the user must be active.")->cachePerPermissions()->addCacheableDependency($entity);
        }

As explained in the issue where I found this, this could lead to a person with the same permissions as the profile owner to lock the owner out of their profile if said non-owner came first and got an access denied which then gets cached per permissions. If this access check were to trigger for the owner first and then for a non-owner, we'd get the broken redirects described above.

Proposed resolution

Add a sanity check to VariationCache to enforce that any previous cache contexts up until the point where we want to store something are present. This point can be right after the initial get, but equally after a few redirects. That entire chain must be represented by any subsequent CacheRedirect that is being stored.

Remaining tasks

Fix VariationCache and write tests

User interface changes

/

API changes

Broken code will now get an exception, which is good.

Data model changes

/

Release notes snippet

/

🐛 Bug report
Status

Needs review

Version

11.0 🔥

Component
Cache 

Last updated about 17 hours ago

Created by

🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @kristiaanvandeneynde
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Marked as major because we have a few sleeper bugs in core that this might uncover.

  • Pipeline finished with Failed
    25 days ago
    Total: 133s
    #189785
  • Pipeline finished with Failed
    25 days ago
    Total: 164s
    #189784
  • Pipeline finished with Success
    25 days ago
    Total: 602s
    #189786
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    MR needs work, first item in the chain is always the stored item or a redirect, so we can't have this "we already checked initially" stuff in there. Working on it.

  • Pipeline finished with Failed
    25 days ago
    Total: 538s
    #189839
  • Issue was unassigned.
  • Status changed to Needs review 25 days ago
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Okay tests have thorough coverage now. Let's see if this breaks core and why.

  • Pipeline finished with Failed
    25 days ago
    Total: 540s
    #189856
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Oof, I was afraid we'd see this many test fails. Let's start digging, I suppose.

  • Status changed to Needs work 25 days ago
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    CommentNonNodeTest message was:
    Trying to set a redirect at an address that already had a redirect with nothing in common, old redirect at address "languages:language_interface, theme, user.permissions" was pointing to "url.path.parent, url.path.is_front", new redirect at same address points to "route".

    Sucks how browser tests obscure these, so far I always put the following in these tests so I'd happily learn about a better way of debugging this:

        $this->assertEquals('foo', $this->getSession()->getPage()->getContent());
    
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Okay so figured out where these are coming from: CommentBreadcrumbBuilder::build() and PathBasedBreadcrumbBuilder::build(). It seems like either one or the other fires, but not both, leading to the exception we introduced here as the same cache ID now triggers a completely different redirect.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Yup, from CommentBreadcrumbBuilder:

      public function applies(RouteMatchInterface $route_match) {
        return $route_match->getRouteName() == 'comment.reply' && $route_match->getParameter('entity');
      }
    

    This conditional applies return value is never represented by the 'route' cache context when the result is FALSE.

    I'm sure we'll find plenty similar core bugs while combing over the test fails here.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Found an issue that reports this, moving on.

  • Pipeline finished with Failed
    24 days ago
    Total: 617s
    #190047
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Seems like the other bug is with jsonapi:

    Trying to set a redirect at an address that already had a redirect with nothing in common, old redirect at address "url.site, route, request_format" was pointing to "url.query_args", new redirect at same address points to "url.query_args:fields, url.query_args:include, user.permissions".
    

    The data it was trying to set:

    \Drupal\Core\Cache\CacheableResponse::__set_state(array(
       'headers' => 
      \Symfony\Component\HttpFoundation\ResponseHeaderBag::__set_state(array(
         'headers' => 
        array (
          'cache-control' => 
          array (
            0 => 'no-cache, private',
          ),
          'date' => 
          array (
            0 => 'Mon, 03 Jun 2024 16:26:31 GMT',
          ),
          'content-type' => 
          array (
            0 => 'application/vnd.api+json',
          ),
        ),
         'cacheControl' => 
        array (
        ),
         'computedCacheControl' => 
        array (
          'no-cache' => true,
          'private' => true,
        ),
         'cookies' => 
        array (
        ),
         'headerNames' => 
        array (
          'cache-control' => 'Cache-Control',
          'date' => 'Date',
          'content-type' => 'Content-Type',
        ),
      )),
       'content' => '{"jsonapi":{"version":"1.0","meta":{"links":{"self":{"href":"http:\\/\\/jsonapi.org\\/format\\/1.0\\/"}}}},"data":{"type":"entity_view_display--entity_view_display","id":"643f166a-a78e-4bad-941c-493d34d10dff","links":{"self":{"href":"http:\\/\\/web\\/jsonapi\\/entity_view_display\\/entity_view_display\\/643f166a-a78e-4bad-941c-493d34d10dff"}},"attributes":{"langcode":"en","status":true,"dependencies":{"config":["node.type.camelids"],"module":["layout_builder","user"]},"third_party_settings":{"layout_builder":{"enabled":true,"allow_custom":true}},"drupal_internal__id":"node.camelids.default","targetEntityType":"node","bundle":"camelids","mode":"default","content":{"links":{"settings":[],"third_party_settings":[],"weight":100,"region":"content"}},"hidden":{"layout_builder__layout":true}}},"links":{"self":{"href":"http:\\/\\/web\\/jsonapi\\/entity_view_display\\/entity_view_display\\/643f166a-a78e-4bad-941c-493d34d10dff"}}}',
       'version' => '1.0',
       'statusCode' => 200,
       'statusText' => 'OK',
       'charset' => NULL,
       'cacheabilityMetadata' => 
      \Drupal\Core\Cache\CacheableMetadata::__set_state(array(
         'cacheContexts' => 
        array (
          0 => 'url.query_args:fields',
          1 => 'url.query_args:include',
          2 => 'url.site',
          3 => 'user.permissions',
        ),
         'cacheTags' => 
        array (
          0 => 'config:core.entity_view_display.node.camelids.default',
        ),
         'cacheMaxAge' => -1,
      )),
    ))'

    Will try to debug that one further when I have time, but that might be Thursday at the earliest.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    These are coming from JsonApiRequestValidator::validateQueryParams() and EntityResource:buildWrappedResponse() in that order. So probably also some check not making it into the cache contexts.

  • 🇭🇺Hungary mxr576 Hungary

    ... defensive by checking for all previously set cache contexts every step of a redirect chain. If one is missing, we should immediately throw an exception and bail out. This should catch a lot of insecurely written code that doesn't properly set the right cache contexts by letting people know there is something wrong with their code.

    Could this be also achieved/enforced by PHPStan (Drupal) too? (cc @mglaman)

  • 🇬🇧United Kingdom joachim

    I think I've figured out (some of) the jsonapi module failures -- will work more on them tomorrow.

  • 🇬🇧United Kingdom joachim

    I got as far as Drupal\jsonapi\Access\EntityAccessChecker::checkEntityAccess() where we have this:

          $access = AccessResult::neutral()->addCacheContexts(['url.query_args:' . JsonApiSpec::VERSION_QUERY_PARAMETER])->orIf($access);
    

    with no user.roles context.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Current findings:

    1. DPC sets the initial cache contexts of 'route' and 'request_format'
    2. This ends up at a redirect to 'url.site', set by JsonApiDocumentTopLevelNormalizer::normalize()

    Then, the first time we try to store the return value of JsonApiRequestValidator::validateQueryParams(), leading to the creation of a second CacheRedirect, pointing to 'url.query_args'

    But the second time, we try to store the return value of EntityResource::buildWrappedResponse(), which adds 'url.query_args:fields' and 'url.query_args:include' and somewhere along the line 'user.permissions' also makes it in there. Probably an access check, but have to figure that one out.

    So the redirect at address response:[request_format]=api_json:[route]=jsonapi.entity_view_display--entity_view_display.individual_SOME_HASH_1:[url.site]=http://web used to point at 'url.query_args', but now points to those new cache contexts leading to the final cache ID of:

    response:[request_format]=api_json:[route]=jsonapi.entity_view_display--entity_view_display.individual_SOME_HASH_1:[url.query_args:fields]=:[url.query_args:include]=:[url.site]=http://web:[user.permissions]=_SOME_HASH_2

    This means that whatever code comes after JsonApiDocumentTopLevelNormalizer::normalize() is currently inconsistent in setting cache contexts.

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Fixed it. JsonApiRequestValidator::validateQueryParams() only sets url.query_args when validation fails, but obviously it also needs to set that when validation succeeds. Because if the query args change, so can the validity.

  • Pipeline finished with Failed
    17 days ago
    Total: 557s
    #196267
  • Pipeline finished with Failed
    16 days ago
    Total: 573s
    #196970
  • Pipeline finished with Failed
    16 days ago
    Total: 606s
    #196998
  • Status changed to Needs review 16 days ago
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Okay so this shows that there's only two detectable bugs in core right now:

    CommentBreadcrumbBuilder::applies(), I "fixed" this the wrong way here to make tests go green. This caused StandardPerformanceTest to fail as there are now 2 more cache gets, which makes sense because we added a cache context. It also made phpunit tests report an unknown failure.

    JsonApiRequestValidator::validateQueryParams(), I fixed this properly here but it needs a dedicated issue. Will create that one next.

    There might be more core bugs, such as the one we just fixed in 🐛 Insufficient cacheability information bubbled up by UserAccessControlHandler Active , but we don't always have tests that are extensive enough for this VC fix to throw exceptions there. I.e.: They are flying under the radar and this exception will over time throw an exception on sites that have warmed their caches enough to run into the buggy code.

  • 🇭🇺Hungary mxr576 Hungary

    @kristiaanvandeneynde some tests are still failing as I can see, needs work? :-/

    https://git.drupalcode.org/issue/drupal-3452181/-/pipelines/197002/test_...

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    some tests are still failing as I can see, needs work?

    Those fails stem from my "fix" for breadcrumbs. The actual fails caused by the VC changes are all green now. If we split off the breadcrumb and jsonapi fix everything will be green.

  • Status changed to Needs work 15 days ago
  • The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • Status changed to Needs review 15 days ago
  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    I know it doesn't apply, that's the whole point :)

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium
Production build 0.69.0 2024