Module allows catch-all cacheing of responses based off of initial Accept-Content header

Created on 28 October 2019, about 5 years ago
Updated 25 October 2023, about 1 year ago

Before gzipping a response, this module looks to $_SERVER['HTTP_ACCEPT_ENCODING'], to check if the requesting client actually supports gzip according to the Accept-Encoding header. This is good, makes sure the few clients out there that don't support gzip won't be left out in the cold.

However, this logic does not consider the caching mechanisms of Drupal itself and of CDNs Drupal may be placed behind. If a response altered by gzip_html is cacheable in the page cache, currently, that cache entry will be used for all subsequent requests regardless of later values of the Accept-Encoding header.

That means the following scenario can take place, at the level of Drupal or at the level of a CDN: The first client to request a not-yet-cached page supports gzip. The response is cached, in the gzipped form that this module alters it to. A later client that does not support gzip requests the same page. That later client gets a gzipped response anyway.

The reverse holds true too: if the client that triggers generation of the page cache entry does not support gzip, everyone who gets the cached page afterward will get the cached un-gzipped version. Not as critical, but the performance/bandwidth that could have been gained is lost.

πŸ› Bug report
Status

Active

Version

1.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States bvoynick

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    > BUT I believe this code introduces something of a Denial of Service vulnerability and should not be committed as-is.

    Drive by review/comment:

    * You don't need anything as fancy as this. Regular page cache always caches on the full query string, any unique combination of query parameters will always at least bypass internal/anonymous page cache. Dynamic page cache will only vary by certain or all query parameters if instructed to do so via cache contexts, but even then, all it needs is a page with a pager, and you can count to infinity with ?page=99999 or whatever.

    * Cache context actually don't work on the internal/anonymous page cache as you found out. To make it work with page cache, you would need to either customize \Drupal\page_cache\StackMiddleware\PageCache::getCacheId (with a subclass and service provider alter) or run this as as a middleware after/before page cache, so that page cache always caches the non-compressed version and then you either compress or not. Of course that would mean that you'd then need to do that on every cache hit. A third option would be to have a response policy that disables caching on requests that don't support gzip, how many there are depends on your site.

Production build 0.71.5 2024