Merging cache metadata is slow with thousands of tags added.

Created on 19 October 2023, over 1 year ago

Problem/Motivation

The situation where I see merging take a long time is huge GraphQL requests (thousands of entities, using https://www.drupal.org/project/graphql ).

When GraphQL resolves fields, resolvers can add cache metadata to the object that holds the metadata for the whole request, and many tags end up being added multiple times.
Every call to addCacheTags performs a merge of these tags, so once a few thousand tags have accumulated, each newly added batch of tags triggers array_unique and array_merge over thousands of values.

In one of my tests, Cache::mergeTags was called 71,503 times, with an average of 2,099 tags per call.

I would like to hear whether Drupal itself could benefit from optimizing the merge; I'm not sure if this is only a GraphQL issue, or just my edge case.

The attached patch will certainly fail tests; it is included only in case someone would like to do quick profiling.
In my case, it reduces this query https://pastebin.com/ULmCsr3L from 12-13s to 4-5s (without Xdebug/profiler; the profiler doesn't show the same relative difference).
In the patch I've also changed \Drupal\Core\Access\AccessResult::inheritCacheability: the Cache::mergeMaxAges call was redundant, since addCacheableDependency already performs it. That seems like a safe change and could be split into a separate issue, although it doesn't change much in terms of performance.

Steps to reproduce

  • Create a CacheableMetadata object
  • Keep adding tags to it
  • Profile

Proposed resolution

Change addCacheTags()/addCacheContexts() so that the merge is only performed when needed, for example only when getCacheTags()/getCacheContexts() is called; see the sketch below.
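
A minimal sketch of the idea, with hypothetical names (LazyCacheableMetadata and $tagsDirty are made up for illustration, not existing core API): adds are cheap appends, and array_unique only runs on read, when new tags have arrived since the last read.

    // Hypothetical object for illustration only; core's CacheableMetadata
    // and the cacheability traits would need equivalent changes.
    class LazyCacheableMetadata {

      /** @var string[] May contain duplicates between reads. */
      protected array $cacheTags = [];

      /** @var bool TRUE when tags were added since the last deduplication. */
      protected bool $tagsDirty = FALSE;

      public function addCacheTags(array $cache_tags): static {
        // Cheap append: no array_unique() on every call.
        array_push($this->cacheTags, ...$cache_tags);
        $this->tagsDirty = TRUE;
        return $this;
      }

      public function getCacheTags(): array {
        // Deduplicate once, on read, instead of on every merge.
        if ($this->tagsDirty) {
          $this->cacheTags = array_values(array_unique($this->cacheTags));
          $this->tagsDirty = FALSE;
        }
        return $this->cacheTags;
      }

    }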

Remaining tasks

User interface changes

None

API changes

TBD

Data model changes

-

Release notes snippet

-

📌 Task
Status

Active

Version

11.0 🔥

Component
Cache 

Last updated 1 day ago

Created by

@reinfate

Merge Requests

Comments & Activities

  • Issue created by @reinfate
  • last update over 1 year ago
    Custom Commands Failed
  • last update over 1 year ago
    Build Successful
  • Status changed to Needs review over 1 year ago
  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States smustgrave

    Seems to have caused test failures.

  • Pipeline finished with Failed
    2 months ago
    Total: 138s
    #342068
  • So, another attempt at this.

    Here is a simple reproduction:

        // Build 20,000 render array items, each with its own cache tag.
        $markups = array_fill(0, 20000, 'markups');
        $build = [];
        foreach ($markups as $i => $markup) {
          $build[] = [
            '#markup' => $markup,
            '#cache' => [
              'tags' => [(string) $i],
            ],
          ];
        }
        $render = \Drupal::service('renderer')->render($build);
        return new Response($render);
    

    Rendering this with #cache takes about 5-8 times longer on my machine than without it, mainly because array_unique is called 20,000 times on an array that grows from 1 tag to 20,000 tags.

    Changing how the public methods of Cache and the cacheable objects work (to avoid calling array_unique on every cache tag merge) will inevitably fail tests and could break some user code.
    That said, I've checked that every cache backend in core runs another array_unique before writing to the cache, so I would guess that even if we removed the array_unique call from the \Drupal\Core\Cache\Cache merge methods, everything should still work.

    To be safe, though, I've added additional mergeFast methods to the Cache-related classes and used them in the Renderer.
    The attached screenshot shows the profiler output isolated to the \Drupal::service('renderer')->render($build); call from the example.

    Let's see if this fails tests.
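
    A rough sketch of the mergeFast idea (my guess at the shape, not the actual MR code): merge without deduplicating, and let the backends' existing array_unique call strip duplicates at write time.

        // Hypothetical standalone version of such a merge, for illustration.
        function merge_tags_fast(array ...$cache_tags): array {
          // Plain array_merge(), no array_unique(): duplicates are tolerated
          // in memory and stripped once when the backend writes the item.
          return array_merge(...$cache_tags);
        }

        $tags = merge_tags_fast(['node:1', 'node:2'], ['node:2', 'user:3']);
        // ['node:1', 'node:2', 'node:2', 'user:3'] (duplicates kept)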

  • Pipeline finished with Failed
    2 months ago
    Total: 660s
    #342078
  • Pipeline finished with Failed
    2 months ago
    Total: 507s
    #342119
  • So it seems only \Drupal\Tests\views\Kernel\Handler\FieldRenderedEntityTest::testRenderedEntityWithoutAndWithField failed.
    The problem with this test is that it checks that the #cache tags and contexts are exact arrays, in \Drupal\Tests\views\Kernel\Handler\FieldRenderedEntityTest::assertCacheabilityMetadata. But these render arrays don't have #cache['keys'], so the updated renderer does not remove duplicates from the cache metadata (it assumes the element will not be cached anyway).
    So we can either modify the test, or modify Renderer/RenderContext to apply the cacheability to the element when the RenderContext size is 1 (which indicates it is the top-most element of the current render() call).

    Moving to needs review, since I would like feedback on this whole issue before proceeding further.

  • 🇨🇭Switzerland berdir Switzerland

    First, I don't think the cache tags system was ever meant to handle thousands of them. Merging is one thing, but it also implies that checking them has to do queries with thousands of placeholders, which generally doesn't seem like a great idea.

    At that scale, it might be better to just switch to list cache tags?
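
    For example, instead of one tag per listed entity, a listing can depend on the entity type's list cache tag (this sketch uses the real EntityTypeInterface::getListCacheTags() API; $metadata stands for any refinable cacheability object):

        // Instead of ['node:1', 'node:2', ...] with thousands of entries:
        $node_type = \Drupal::entityTypeManager()->getDefinition('node');
        // Typically ['node_list']: one tag, invalidated whenever any node is
        // created, updated or deleted, at the cost of coarser invalidation.
        $metadata->addCacheTags($node_type->getListCacheTags());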

    That's even more true for cache contexts: there should really only be a handful of those. I was just thinking about the time I once accidentally implemented a cache context with high variation. The problem with contexts is that they are sticky: if you, for example, add a cache context parametrized by node ID to the breadcrumb block, over time you will accumulate thousands of unique cache contexts there.

    That said, I've definitely seen considerable overhead from merging cacheability too, but it also tends to be exaggerated by xhprof/blackfire, as those tools add a lot of overhead to code with many small but fast function calls, for example on the permissions page. So looking into improving this is definitely worth a try.

  • 🇬🇧United Kingdom catch

    That's even more true for cache contexts: there should really only be a handful of those. I was just thinking about the time I once accidentally implemented a cache context with high variation. The problem with contexts is that they are sticky: if you, for example, add a cache context parametrized by node ID to the breadcrumb block, over time you will accumulate thousands of unique cache contexts there.

    I think this is fixed in VariationCache now, although I can't remember exactly how it was done...

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    Okay, so let's start with the positives. I like how you:

    1. Managed to profile this all the way down to the merging of cache tags
    2. Considered that a render array without cache keys won't be cached, so we can skip some processing there

    The second point deserves separate consideration, to see if we can't optimize further for the parts of render arrays that won't be cached.

    Having said that, there's some more light to be shed on the issue at hand. Let's get the obvious out of the way first: If you ever have a render array or CacheRedirect with thousands of cache contexts, you are doing something wrong as your whole page would not only become uncacheable, but actually slower.

    Likewise, having thousands of cache tags is not going to be doing you any favors either and at that point you're probably better off thinking of other approaches. So it makes absolute sense for the current Cache class to call array_unique the way it does, because the expectation was never that it would be called thousands of times on large data sets during a single request.

    Now, this leaves one use case we might want to address where you still run into the issue you have here: You are adding only a handful of cache tags, but you're adding them over and over and over again. To me, this is a red flag and something that probably doesn't happen all that often.

    I'm inclined to say we shouldn't fix this in core, but rather have GraphQL account for the fact that it manipulates cacheable metadata thousands of times during a request, and perhaps optimize for that. This could be as simple as a custom CacheableMetadata object that incorporates the logic from the MR and, at the very end of "adding fields", adds that object's metadata to the actual response once.

    So the MR in its current shape doesn't quite cut it for me, personally. I'd rather see us revisit what we do with cacheable metadata during the render process, as demonstrated in the MR, but I'm not sure it would be such a performance gain. A great first step would be to use the PerformanceTestTrait to gather metrics on what currently happens, and then see how much it improves if we change the renderer.

    Keep in mind that the renderer is a crucial part of Drupal, so we shouldn't be changing it lightly without enough evidence to back up the necessity of the change. We should also make sure we have ample tests covering us when we do make changes.

    Finally, we could also consider changing RefinableCacheableDependencyTrait and CacheableDependencyTrait so that array_unique is only called on a get, and only when new tags have been added since the last get. That would probably be a net gain across the board, but it requires refactoring how the two traits interact, as they would need to share some state.

    This avenue is worth exploring and I would definitely support that change, but not until we have the profiling described above in place, to see if it even makes sense to carry out.

    The code we currently have in StandardPerformanceTest looks like this:

        $recorded_queries = $performance_data->getQueries();
        $this->assertSame($expected_queries, $recorded_queries);
        $this->assertSame(34, $performance_data->getQueryCount());
        $this->assertSame(124, $performance_data->getCacheGetCount());
        $this->assertSame(45, $performance_data->getCacheSetCount());
        $this->assertSame(0, $performance_data->getCacheDeleteCount());
        $this->assertSame(36, $performance_data->getCacheTagChecksumCount());
        $this->assertSame(43, $performance_data->getCacheTagIsValidCount());
        $this->assertSame(0, $performance_data->getCacheTagInvalidationCount());
    

    We should probably add some tracking to the traits mentioned above and gather metrics on those too. It might not cover every cache tag manipulation, since we're not guaranteed that everyone uses those traits, but most classes do.
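
    One throwaway way to get those numbers (purely illustrative, not core code) is a global counter that the trait methods increment while profiling:

        // Illustrative debugging hack: counters the trait methods could
        // bump temporarily, then dump at the end of the request.
        final class CacheMergeStats {

          public static int $addCacheTagsCalls = 0;
          public static int $arrayUniqueCalls = 0;

          public static function report(): string {
            return sprintf('addCacheTags(): %d calls, array_unique(): %d calls',
              self::$addCacheTagsCalls, self::$arrayUniqueCalls);
          }

        }

        // Temporarily, inside RefinableCacheableDependencyTrait::addCacheTags():
        //   CacheMergeStats::$addCacheTagsCalls++;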

  • 🇧🇪Belgium kristiaanvandeneynde Antwerp, Belgium

    To clarify: I don't think we need permanent coverage of the adding and getting of cache tags; that would be rather pointless and hard to pull off, as we can't decorate a trait. But for the sake of this issue, we could hack into the traits, add some coverage, and see how many times we call array_unique there.

    Then we can create a second MR with both the coverage and the fix to see the difference, and finally an MR with just the fix.
