We noticed on several of our sites that the page_cache table was steadily growing to hundreds of megabytes, containing thousands of rows. We didn't understand why, since these sites are not very content-heavy and should be easy to cache. The example site we used for debugging contains 589 pages and has no authenticated users, so we expected to see about 589 page_cache entries. Perhaps a few more caused by Views pagers, exposed filters and so on, but certainly no more than 1,000.
Problem 1:
After some investigation we noticed that a new page_cache entry is created for every unique URL, regardless of whether that URL actually produces different output. Appending any query parameter to a URL is enough to create a new entry.
As an example, I wrote a simple bash script that requests URLs with a variable id, http://example.tld/?id=x, looping x from 1 to 10,000. Sure enough, the page cache stored 10,000 entries, resulting in 1.5 GB of MySQL data in the page_cache table.
This should not happen: the cache contexts of the response on this page do not contain the "url.query_args" context, so the cache system should know that the "id" query parameter makes no difference to the output and should not cause more than one entry in cache_page.
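To make the expected behaviour concrete, here is a minimal PHP sketch that simulates our bash loop. It compares a cache ID built from the full request URI, which as far as we understand is roughly what the internal page cache uses (it looks responses up before routing, so before any cache contexts are known), with a hypothetical cache ID that only keeps the query arguments named by the response's cache contexts. `cidFromFullUri()` and `cidFromContexts()` are our own illustrative names, not Drupal APIs; only the context names `url.query_args` and `url.query_args:<key>` come from core, and the example context list is made up.

```php
<?php
// Hypothetical sketch: two ways of deriving a page cache ID.

// Cache ID from the full URI, query string included (roughly what we observe).
function cidFromFullUri(string $uri): string {
  return $uri;
}

// Hypothetical cache ID that only keeps the query arguments covered by the
// response's cache contexts. Expects an absolute URI.
function cidFromContexts(string $uri, array $contexts): string {
  $parts = parse_url($uri);
  $query = [];
  parse_str($parts['query'] ?? '', $query);

  if (in_array('url.query_args', $contexts, true)) {
    // The response varies by every query argument: keep them all.
    $kept = $query;
  }
  else {
    // Keep only the arguments named by 'url.query_args:<key>' contexts.
    $kept = [];
    foreach ($contexts as $context) {
      if (strpos($context, 'url.query_args:') === 0) {
        $key = substr($context, strlen('url.query_args:'));
        if (array_key_exists($key, $query)) {
          $kept[$key] = $query[$key];
        }
      }
    }
  }
  ksort($kept);
  $base = ($parts['scheme'] ?? 'http') . '://' . ($parts['host'] ?? '') . ($parts['path'] ?? '/');
  return $kept ? $base . '?' . http_build_query($kept) : $base;
}

// Simulate requesting http://example.tld/?id=1 ... ?id=10000 on a page whose
// (made-up) contexts do not include url.query_args.
$contexts = ['url.path', 'languages:language_interface', 'theme'];
$fullUriCids = [];
$contextCids = [];
for ($x = 1; $x <= 10000; $x++) {
  $uri = "http://example.tld/?id=$x";
  $fullUriCids[cidFromFullUri($uri)] = TRUE;
  $contextCids[cidFromContexts($uri, $contexts)] = TRUE;
}

echo count($fullUriCids), " entries keyed by full URI\n";
echo count($contextCids), " entry keyed by cache contexts\n";
```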
Is this normal behaviour? I found some related issues, but no clear description of why this happens, nor a solution:
- Dynamic cache does not respect query parameters (Closed: works as designed)
- #2062463: allow page cache cid to be alterable
- #2662196: Cache route by Uri and not just Query+Path
Couldn't this be exploited to overload websites, or even to crash the MySQL database or another cache backend?
Problem 2:
Even if problem 1 were fixed, a second problem remains. Our example site has some modules enabled, some configuration, some blocks and some real content. In particular, a search block is placed in the header on all pages. That block renders a form, and this somehow seems to add "url.query_args" to the cache contexts of every page. So even if the page cache ID only took into account the query arguments covered by the cache contexts, it would still include all query parameters, and every new query parameter would still generate a new page_cache entry.
So we did some searching and found that Drupal core adds "url.query_args" in a few places where it should not be needed at all. The listing below shows what I could find.
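What we seem to be seeing is cache context bubbling: the cache contexts of every element rendered on a page, including a form inside a block, end up merged into the contexts of the whole response. The sketch below uses a hypothetical `mergeCacheContexts()` helper as a simplified stand-in for core's own merging, with made-up context lists per element. It only illustrates how a single search form is enough to put `url.query_args` on every page.

```php
<?php
// Hypothetical stand-in for context merging: union, de-duplicated, sorted.
function mergeCacheContexts(array ...$sets): array {
  $merged = array_values(array_unique(array_merge([], ...$sets)));
  sort($merged);
  return $merged;
}

// Made-up cache contexts of the elements rendered on one page.
$mainContent = ['url.path', 'languages:language_interface'];
$menuBlock = ['route.menu_active_trails:main'];
// The search form's action placeholder declares url.path and url.query_args.
$searchFormAction = ['url.path', 'url.query_args'];

$pageWithoutForm = mergeCacheContexts($mainContent, $menuBlock);
$pageWithForm = mergeCacheContexts($mainContent, $menuBlock, $searchFormAction);

var_dump(in_array('url.query_args', $pageWithoutForm, true)); // bool(false)
var_dump(in_array('url.query_args', $pageWithForm, true));    // bool(true)
```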
Why is this implemented like this?
Presumably lots of contrib modules already "rely" on this "bazooka" behaviour?
core/lib/Drupal/Core/Form/FormBuilder.php:
```php
  /**
   * #lazy_builder callback; renders a form action URL.
   *
   * @return array
   *   A renderable array representing the form action.
   */
  public function renderPlaceholderFormAction() {
    return [
      '#type' => 'markup',
      '#markup' => $this->buildFormAction(),
      '#cache' => ['contexts' => ['url.path', 'url.query_args']],
    ];
  }
```
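A possible explanation for this particular context: as far as we can tell, `buildFormAction()` derives the action from the current request URI, query string included, after removing a few internal parameters (we believe `ajax_form` and `_wrapper_format`). If so, the rendered `action` attribute really differs per query string. `buildFormActionSketch()` below is our own simplification in plain PHP, not the core implementation.

```php
<?php
// Hypothetical simplification: derive a form action from the request URI,
// keeping path and query but dropping assumed internal parameters.
function buildFormActionSketch(string $requestUri): string {
  $parts = parse_url($requestUri);
  $query = [];
  parse_str($parts['query'] ?? '', $query);
  // Assumed internal parameters that should not leak into the action.
  unset($query['ajax_form'], $query['_wrapper_format']);
  $path = $parts['path'] ?? '/';
  return $query ? $path . '?' . http_build_query($query) : $path;
}

echo buildFormActionSketch('/node/1'), "\n";                           // /node/1
echo buildFormActionSketch('/node/1?id=1'), "\n";                      // /node/1?id=1
echo buildFormActionSketch('/node/1?id=2&_wrapper_format=html'), "\n"; // /node/1?id=2
```

Because the search block renders this placeholder on every page, this is the context that ends up on all of our pages.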
modules/user/src/Form/UserPasswordForm.php:
```php
  public function buildForm(array $form, FormStateInterface $form_state) {
    // ...
    $form['#cache']['contexts'][] = 'url.query_args';
    return $form;
  }
```
modules/views/views.theme.inc:
```php
function template_preprocess_views_mini_pager(&$variables) {
  // ...
  // This is based on the entire current query string. We need to ensure
  // cacheability is affected accordingly.
  $variables['#cache']['contexts'][] = 'url.query_args';
  // ...
}
```
modules/views/src/Plugin/views/style/Table.php:
```php
  public function getCacheContexts() {
    $contexts = [];
    foreach ($this->options['info'] as $field_id => $info) {
      if (!empty($info['sortable'])) {
        // The rendered link needs to play well with any other query parameter
        // used on the page, like pager and exposed filter.
        $contexts[] = 'url.query_args';
        break;
      }
    }
    return $contexts;
  }
```
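The two Views comments describe the same pattern: a pager or sort link is built by copying the current query string and overriding one or two keys, so its `href` depends on every query argument on the page. A plain PHP sketch with a hypothetical `linkWithQuery()` helper; `page`, `order` and `sort` are the usual Drupal pager and table sort parameters, while `id` and `type` are made-up stand-ins for an unrelated argument and an exposed filter.

```php
<?php
// Hypothetical helper: build a link that keeps the current query arguments
// and overrides some of them, as pager and table sort links do.
function linkWithQuery(string $path, array $currentQuery, array $overrides): string {
  $query = array_merge($currentQuery, $overrides);
  // Sorted only to make the output deterministic in this sketch.
  ksort($query);
  return $path . '?' . http_build_query($query);
}

// "Next page" link on /news?id=1 vs /news?id=2.
echo linkWithQuery('/news', ['id' => '1'], ['page' => 1]), "\n"; // /news?id=1&page=1
echo linkWithQuery('/news', ['id' => '2'], ['page' => 1]), "\n"; // /news?id=2&page=1

// Sort link on a table that already has a pager and an exposed filter.
echo linkWithQuery('/news', ['page' => 2, 'type' => 'article'], ['order' => 'title', 'sort' => 'asc']), "\n";
// /news?order=title&page=2&sort=asc&type=article
```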