Page cache creates vast amounts of unneeded cache data

Created on 5 November 2018, over 5 years ago
Updated 19 January 2024, 5 months ago

We noticed on several of our sites that the cache_page table was steadily growing to hundreds of megabytes and thousands of rows. We didn't understand why, as these sites are not very content-heavy and should be easy to cache. The example site we used for debugging contains 589 pages and has no authenticated users, so we expected to see about 589 page cache entries, perhaps a few more caused by Views pagers, exposed filters and so on, but certainly no more than 1,000.

Problem 1:
After some investigation we noticed that a new page cache entry is created for every unique URL, regardless of whether that URL actually produces different output. Adding arbitrary query parameters to a URL is enough to trigger this.
As an example, I wrote a simple bash script that requests URLs with a variable id, http://example.tld/?id=x, looping over x from 1 to 10,000. Sure enough, the page cache generated 10,000 cache entries, resulting in 1.5 GB of MySQL data in the cache_page table.
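
Drupal core's internal page cache (the PageCache middleware of the page_cache module) builds its cache ID from the scheme, host and full request URI, query string included, plus the request format; cache contexts are not part of that ID. The sketch below approximates this keying in plain PHP so it runs without Drupal. `page_cache_cid()` is a hypothetical name, and the exact ID format may differ between core versions (compare with `PageCache::getCacheId()` in your codebase).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical approximation of the internal page cache ID:
 * scheme + host + full request URI (path AND query string) + request format.
 */
function page_cache_cid(string $url, ?string $format = null): string
{
    $parts = parse_url($url);
    $scheme_and_host = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $request_uri = ($parts['path'] ?? '/')
        . (isset($parts['query']) ? '?' . $parts['query'] : '');

    // The query string is part of the key as-is, whether or not the
    // rendered page actually depends on it.
    return implode(':', [$scheme_and_host . $request_uri, (string) $format]);
}

// Simulate the bash loop: 10,000 requests that differ only in ?id=x.
$cids = [];
for ($x = 1; $x <= 10000; $x++) {
    $cids[page_cache_cid("http://example.tld/?id=$x", 'html')] = true;
}

echo count($cids), " distinct cache IDs\n";
```

Run as a CLI script, this reports one distinct ID per `?id=x` value, which matches the 10,000 entries observed above.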

This should not happen, because the cache contexts of the response for this page do not contain the "url.query_args" context. The cache system should therefore know that the "id" query parameter makes no difference to the output, and it should not cause more than one entry in cache_page.
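
What is expected here is closer to a cache ID that keeps only the query arguments the response actually varies by. The plain-PHP sketch below illustrates that idea using the token syntax of Drupal cache contexts: `url.query_args` varies by the whole query string, `url.query_args:foo` only by `foo`. `context_aware_cid()` is hypothetical, and it assumes the response's cache contexts are available when the ID is computed; on a cache lookup they are not yet known, so a real implementation would have to store them alongside the entry.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical cache ID that only includes query arguments named in the
 * response's cache contexts:
 * - 'url.query_args'      => keep the whole query string;
 * - 'url.query_args:NAME' => keep only the NAME argument;
 * - neither               => drop the query string entirely.
 */
function context_aware_cid(string $path, array $query, array $contexts): string
{
    if (in_array('url.query_args', $contexts, true)) {
        $kept = $query;
    } else {
        $kept = [];
        foreach ($contexts as $context) {
            if (str_starts_with($context, 'url.query_args:')) {
                $name = substr($context, strlen('url.query_args:'));
                if (array_key_exists($name, $query)) {
                    $kept[$name] = $query[$name];
                }
            }
        }
    }

    // Sort so that ?a=1&b=2 and ?b=2&a=1 map to the same entry.
    ksort($kept);

    return $kept === [] ? $path : $path . '?' . http_build_query($kept);
}
```

With only `url.path` among the contexts, all 10,000 `?id=x` requests would collapse into a single entry, while a pager declaring `url.query_args:page` would still get one entry per page.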
Is this normal behaviour? I found some related issues, but neither a clear explanation of why this happens nor a solution:
Dynamic cache does not respect query parameters (Closed: works as designed)
#2062463: allow page cache cid to be alterable
#2662196: Cache route by Uri and not just Query+Path

Could this potentially be exploited to overload websites, or even to crash the MySQL database or other caching software?

Problem 2:
Even if problem 1 were fixed, a second problem remains. Our example site has some modules enabled, some configuration, some blocks and some real content. In particular, a search block is placed in the header on all pages. That block renders a form, and the form apparently adds "url.query_args" to the cache contexts of every page. So even if the page cache ID only took into account the query arguments named in the cache contexts, this behaviour would still include all query parameters, and every distinct query string would still generate a new cache_page entry.
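
Cache contexts bubble: each rendered element's contexts are merged into those of the page that contains it, so a single form in the header can make every page vary by the full query string. The plain-PHP sketch below mimics that merge. `merge_contexts()` and `optimize_contexts()` are hypothetical helpers that only approximate core's `Cache::mergeContexts()` and the cache contexts manager's token optimization, in which a parent token such as `url.query_args` makes `url.query_args:page` redundant; the context values are illustrative.

```php
<?php

declare(strict_types=1);

/** Hypothetical stand-in for merging two sets of cache context tokens. */
function merge_contexts(array $a, array $b): array
{
    $merged = array_values(array_unique(array_merge($a, $b)));
    sort($merged);
    return $merged;
}

/**
 * Hypothetical, simplified optimization: drop a token when one of its
 * ancestors is also present ('url.query_args' implies 'url.query_args:page').
 */
function optimize_contexts(array $tokens): array
{
    $result = [];
    foreach ($tokens as $token) {
        // Treat ':' like '.', so 'a.b:c' has the ancestors 'a.b' and 'a'.
        $ancestor = str_replace(':', '.', $token);
        $implied = false;
        while (!$implied && str_contains($ancestor, '.')) {
            $ancestor = substr($ancestor, 0, strrpos($ancestor, '.'));
            $implied = in_array($ancestor, $tokens, true);
        }
        if (!$implied) {
            $result[] = $token;
        }
    }
    return $result;
}

// Illustrative contexts of two parts of the same page.
$main_content = ['url.path', 'url.query_args:page'];
$search_block_form = ['url.path', 'url.query_args']; // e.g. the form action

$page = optimize_contexts(merge_contexts($main_content, $search_block_form));
print_r($page);
```

The page ends up with `url.path` and `url.query_args`: the narrower `url.query_args:page` of the main content is absorbed by the form's broader context.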
So we searched and found that Drupal core adds "url.query_args" in a few places where it should not be needed at all. The listing below shows what I could find.

Why is this implemented like this?
Presumably lots of contrib modules already "rely" on this "bazooka" behaviour?

core/lib/Drupal/Core/Form/FormBuilder.php:

  /**
   * #lazy_builder callback; renders a form action URL.
   *
   * @return array
   *   A renderable array representing the form action.
   */
  public function renderPlaceholderFormAction() {
    return [
      '#type' => 'markup',
      '#markup' => $this->buildFormAction(),
      '#cache' => ['contexts' => ['url.path', 'url.query_args']],
    ];
  }
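
The context here seems to exist because the rendered action URL echoes the current query string. As far as I can tell, `FormBuilder::buildFormAction()` takes the current request URI, removes only the internal `ajax_form` and `_wrapper_format` arguments and keeps everything else, so `?id=1` and `?id=2` do produce different markup. The sketch below approximates that in plain PHP; `build_form_action()` is a hypothetical name and omits core's extra safety checks (such as protocol filtering).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical approximation of deriving a form action from the current
 * request URI: keep the path and every query argument except the internal
 * 'ajax_form' and '_wrapper_format' ones.
 */
function build_form_action(string $request_uri): string
{
    $path = parse_url($request_uri, PHP_URL_PATH) ?? '/';
    $query_string = parse_url($request_uri, PHP_URL_QUERY) ?? '';

    parse_str($query_string, $query);
    unset($query['ajax_form'], $query['_wrapper_format']);

    return $path . ($query ? '?' . http_build_query($query) : '');
}

echo build_form_action('/node/1?id=1'), "\n"; // /node/1?id=1
echo build_form_action('/node/1?id=2'), "\n"; // /node/1?id=2
```

Because the search block's form is rendered on every page, its action attribute changes with every query string, which is presumably why the placeholder declares `url.query_args`.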

modules/user/src/Form/UserPasswordForm.php:

  public function buildForm(array $form, FormStateInterface $form_state) {
    ...
  
    $form['#cache']['contexts'][] = 'url.query_args';

    return $form;
  }

modules/views/views.theme.inc:

function template_preprocess_views_mini_pager(&$variables) {
  ...

  // This is based on the entire current query string. We need to ensure
  // cacheability is affected accordingly.
  $variables['#cache']['contexts'][] = 'url.query_args';
  ...
}
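
The comment in this preprocess function refers to the pager links: to move to another page they keep every current query argument (exposed filters, sorts, anything else in the URL) and only change `page`. The plain-PHP sketch below shows why the rendered pager differs for each query string; `mini_pager_link()` is a hypothetical helper, not core's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: build a pager link that preserves all current query
 * arguments and only replaces 'page' (0-based, as in Drupal pager URLs).
 */
function mini_pager_link(string $path, array $current_query, int $target_page): string
{
    // Start from everything currently in the query string...
    $query = $current_query;
    // ...and only replace the page index.
    $query['page'] = $target_page;

    return $path . '?' . http_build_query($query);
}

$query = ['type' => 'article', 'page' => '1'];
echo mini_pager_link('/news', $query, 2), "\n"; // /news?type=article&page=2
echo mini_pager_link('/news', $query, 0), "\n"; // /news?type=article&page=0
```

Any unrelated argument such as `?id=x` would also be copied into both links, so the pager markup, and the page containing it, varies with it.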

modules/views/src/Plugin/views/style/Table.php:

  public function getCacheContexts() {
    $contexts = [];

    foreach ($this->options['info'] as $field_id => $info) {
      if (!empty($info['sortable'])) {
        // The rendered link needs to play well with any other query parameter
        // used on the page, like pager and exposed filter.
        $contexts[] = 'url.query_args';
        break;
      }
    }

    return $contexts;
  }
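
Sortable table headers follow a similar pattern: each header link keeps the other query arguments (pager, exposed filters) and sets `order` and `sort`, typically toggling the direction when the table is already sorted by that column. A plain-PHP sketch, with `table_sort_link()` as a hypothetical helper rather than the Views implementation (which also honours per-column default sort orders):

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: link for a sortable column header. Keeps all other
 * query arguments, sets 'order' to the column and toggles 'sort' when the
 * table is already ordered by that column.
 */
function table_sort_link(string $path, array $current_query, string $field): string
{
    $query = $current_query;
    $already_sorted = ($current_query['order'] ?? null) === $field;
    $current_sort = $current_query['sort'] ?? 'asc';

    $query['order'] = $field;
    $query['sort'] = ($already_sorted && $current_sort === 'asc') ? 'desc' : 'asc';

    return $path . '?' . http_build_query($query);
}

$query = ['status' => '1', 'order' => 'created', 'sort' => 'asc'];
echo table_sort_link('/admin/content', $query, 'created'), "\n"; // /admin/content?status=1&order=created&sort=desc
echo table_sort_link('/admin/content', $query, 'title'), "\n";   // /admin/content?status=1&order=title&sort=asc
```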

Category: Bug report
Status: Active
Version: 11.0
Component: Cache
Last updated about 23 hours ago
Created by: weseze (Belgium)

Tags: Security



Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • ressa (Copenhagen, Denmark)

    A workaround that "fixed" the issue for me is disabling the cache for views with large amounts of data.

    I tried this (I am using the Facets module, which requires it), but the view is still getting cached, and the cache_page table still receives entries for the view, which it should not.

    The only way to stop the view from being cached is to disable caching for everything under Performance (/admin/config/development/performance) by setting "Caching | Browser and proxy cache maximum age" to <no caching>, which is less than ideal.

    I tried searching for drupal views "Caching:None" ignored, but found no hints.
