PageCache caching uncacheable responses (violating HTTP/1.0 spec) + D8 intentionally disabling HTTP/1.0 proxies = WTF

Created on 13 December 2016, over 8 years ago
Updated 19 January 2023, about 2 years ago

Problem/Motivation

PageCache determines which responses to cache by inspecting the Expires header. (This should be updated to use the Cache-Control header instead, but that's out of scope here.)

Responses that have Expires: Sun, 19 Nov 1978 05:00:00 GMT are meant to not be cached, because it's an expiration date in the past. Yet PageCache caches them permanently:

$expire = ($date > $request_time) ? $date : Cache::PERMANENT;

This has been the case for a very long time, even predating #2527126: Only send X-Drupal-Cache-Tags and -Contexts headers when developer explicitly enables them . Quite possibly the rationale was .

Now, to make matters worse, even when you configure a non-zero "max age" at /admin/config/development/performance, \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setResponseCacheable() will still cause that same Expires header to be set? Why? Because when \Drupal\Core\EventSubscriber\FinishResponseSubscriber::onRespond() calls \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setResponseCacheable(), the latter contains this code:

    // HTTP/1.0 proxies do not support the Vary header, so prevent any caching
    // by sending an Expires date in the past. HTTP/1.1 clients ignore the
    // Expires header if a Cache-Control: max-age directive is specified (see
    // RFC 2616, section 14.9.3).
    if (!$response->headers->has('Expires')) {
      $this->setExpiresNoCache($response);
    }

In other words:

  1. Drupal 8 intentionally disables HTTP/1.0 proxies
  2. Drupal 8 ships with a HTTP/1.0 reverse proxy that is breaking the spec: Page Cache

Proposed resolution

Make \Drupal\page_cache\StackMiddleware\PageCache a HTTP/1.1 proxy: make it inspect Cache-Control rather than Expires

Remaining tasks

TBD

Remaining tasks

TBD

User interface changes

None.

API changes

None.

Data model changes

None.

🐛 Bug report
Status

Needs review

Version

10.1

Component
Page cache 

Last updated about 2 months ago

Created by

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

Live updates comments and jobs are added and updated live.
  • DrupalWTF

    Worse Than Failure. Approximates the unpleasant remark made by Drupal developers when they first encounter a particular (mis)feature.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Status changed to Needs work about 2 years ago
  • 🇺🇸United States smustgrave

    #13 had many failures and was not ready for review.
    Which show this will need it's own test case and work for BC.

  • First commit to issue fork.
  • Pipeline finished with Failed
    about 1 year ago
    Total: 625s
    #136089
  • Pipeline finished with Failed
    about 1 year ago
    Total: 570s
    #137178
  • First commit to issue fork.
  • 🇪🇸Spain vidorado Logroño (La Rioja)

    vidorado changed the visibility of the branch 11.x to hidden.

  • Pipeline finished with Failed
    about 2 months ago
    Total: 580s
    #420211
  • Pipeline finished with Success
    about 2 months ago
    Total: 932s
    #420265
  • 🇪🇸Spain vidorado Logroño (La Rioja)

    This was a tough one, but I believe I've managed to sort things out.

    Some unexpected behaviors were occurring, such as the FinishResponseSubscriber setting an Expires header, which made the response non-cacheable—inside the setResponseCacheable() method, which should theoretically do the opposite.

    Additionally, some functional tests in PageCacheTest were incorrectly expecting the response to be non-cacheable when it actually should have been. As a result, they were effectively masking the issue with cacheability headers.

    A more in-deep review is necessary, as this is a complex scenario with multiple edge cases involved, and a thorough understanding of the HTTP cacheability headers specification is required to do so.

    Also, I haven't been able to mark the MR as "Ready" I'm not sure if someone with higher privileges could do it, so I don't have to create a new one.

  • 🇳🇴Norway steinmb

    Tagging with Performance. I think @catch have the habit of checking that tag, might have some valuable feedback on the MR. I have none.

  • 🇳🇱Netherlands johnv

    I tested the patch, hoping it would solve my "Anonymous never get fresh pages" problem for a contrib field module.
    Reading the summary closely, it will not, since my use case does not set the Expire, only max-age (there is no guidance how to apply setExpire() by contrib either.)
    Anyway, the Summary lacks a test script. I added one. Please correct it if necessary.

  • 🇪🇸Spain vidorado Logroño (La Rioja)

    After digging into the code, I realized I made a mistake. To prevent multiple tests from failing, I mistakenly kept the very piece of code that was wrong in the first place:

      $expire = ($date > $request_time) ? $date : Cache::PERMANENT;
    

    I also confirmed that this issue is only related to the Expires header and not to PageCache’s behavior of ignoring max-age. For that second issue, please refer to 🐛 [pp-3] Bubbling of elements' max-age to the page's headers and the page cache Postponed @johnv.

  • Pipeline finished with Failed
    about 2 months ago
    Total: 440s
    #424140
  • Pipeline finished with Success
    about 2 months ago
    Total: 2136s
    #424256
  • 🇪🇸Spain vidorado Logroño (La Rioja)

    I've finally fixed the caching behavior and all tests.

    Things I'm concerned about:

    • I had to remove the weak (W/) prefix from the ETag header, as it was sometimes being added, causing multiple EntityResourceTestBase tests to fail. I was able to reproduce the issue, but I'm not entirely sure if it's due to an Nginx configuration or something else.
    • Now that system.performance.cache.page.max_age is being properly honored, I had to set a max_age value in several tests expecting responses to be cacheable. I'm not sure if this is the correct approach or if we should configure it globally in BrowserTestBase.
    • I've assumed that having two Vary headers in the response is correct, but I haven't determined why they are being set separately. An additional Accept-Encoding value is added in a second Vary header instead of being merged into the existing Vary header containing Cookie and Origin values. To account for this, I modified CorsIntegrationTest to check for "Origin" across all possible Vary headers.
  • 🇪🇸Spain vidorado Logroño (La Rioja)

    By the way, now that this fix ensures the system.performance.cache.page.max_age config value is properly honored by PageCache, I'm not sure how it will overlap with 🐛 [pp-3] Bubbling of elements' max-age to the page's headers and the page cache Postponed , but it's certainly related.

  • 🇨🇭Switzerland znerol

    Sorry for the late answer, commenting on the issue summary and the two points raised by @Wim Leers:

    1. Drupal 8 intentionally disables HTTP/1.0 proxies

    This has been the case since Drupal 7 (and also Pressflow - Drupal 6 optimized for high performance content oriented websites). It has been necessary for any web application which uses cookies to disable HTTP/1.0 caches. There is no way in HTTP/1.0 to maintain different page variations under the same URL. Everything which influences the result (like desired language, session and login state) has to be encoded in the URL. Yes, session identifiers were passed via request parameters back then.

    When HTTP/1.1 introduced cookies and cache control, it was specified that the Cache-Control: max-age directive takes precedence over the HTTP/1.0 Expires header.

    "Modern" web applications which serve multiple variants of a page under the same URL (depending on the application state maintained in the cookie header) can be made compatible with both HTTP protocol versions by serving a suitable Cache-Control and Expires header. The way to do this is to serve an appropriate Cache-Control header to permit storage in HTTP/1.1 caches (if appropriate) and serve an Expires header which is set into the past in order to disable caching in HTTP/1.0 caches. Remember, HTTP/1.0 caches assume that the only thing which can influence the response is the URL.

    This pattern is not specific to Drupal. Everybody does this.

    Drupal 8 ships with a HTTP/1.0 reverse proxy that is breaking the spec: Page Cache

    Responses that have Expires: Sun, 19 Nov 1978 05:00:00 GMT are meant to not be cached, because it's an expiration date in the past. Yet PageCache caches them permanently:

    The page cache module is not a HTTP/1.0 reverse proxy. It is neither a HTTP/1.1 reverse proxy. The caching logic has its roots in Pressflow and is tailored to content oriented websites.

    In unmodified HTTP/1.1+ reverse proxies and browser caches, the only way to get rid of cached pages is to limit their validity to a certain time frame (i.e., max-age or s-max-age). The system.performance.cache.page.max_age config value is the way to specify that value for those downstream caches which are otherwise out-of-reach for Drupal.

    With the built-in cache the situation is different. It is possible to remove cached pages whenever a change is made on the website. Stock Drupal 7 did nuke the whole page cache whenever any content was changed. Some contrib modules rectified that heavy handed approach and made it possible to define rules which pages were purged depending on which content was changed.

    D8+ introduced cache tags and that removed all the guess work. No site builder has to think about which pages to purge from the cache nowadays.

    For this reason the page cache module keeps rendered pages in the cache for an unspecified amount of time. This has been the way it worked since Pressflow.

    The following remark in the issue summary is indeed pointing to a problem point in the page caching module:

    PageCache determines which responses to cache by inspecting the Expires header. (This should be updated to use the Cache-Control header instead, but that's out of scope here.)

    The page cache module really shouldn't inspect the Expires header. That logic needs to be deprecated indeed.

    Updating issue status to Postponed (maintainer needs more info). I will wait a couple of weeks and then set it to Closed (works as designed) unless there is significant new (and valid) information added to the issue summary.

  • 🇪🇸Spain vidorado Logroño (La Rioja)

    Thanks @znerol for a detailed response. It makes me feel better after have worked in vain, due to misunderstanding the Page Cache role in the whole picture.

    I think that the only change this issue fixes, then, is setting an Expires header which honors system_performance.cache.page.max_age, which would be possibly beneficial for HTTP 1.0 proxies down the line of a response.

    Before:

     protected function setResponseCacheable(Response $response, Request $request) {
        // HTTP/1.0 proxies do not support the Vary header, so prevent any caching
        // by sending an Expires date in the past. HTTP/1.1 clients ignore the
        // Expires header if a Cache-Control: max-age directive is specified (see
        // RFC 2616, section 14.9.3).
        if (!$response->headers->has('Expires')) {
          $this->setExpiresNoCache($response);
        }
    
        $max_age = $this->config->get('cache.page.max_age');
        $response->headers->set('Cache-Control', 'public, max-age=' . $max_age);
    

    After:

     protected function setResponseCacheable(Response $response, Request $request) {
        $max_age = $this->config->get('cache.page.max_age');
        $response->headers->set('Cache-Control', 'public, max-age=' . $max_age);
    
        // If the response does not already have an Expires header, set one.
        if (is_int($max_age) && !$response->headers->has('Expires')) {
          $response->setExpires((new \DateTime())->setTimestamp($this->time->getRequestTime() + $max_age));
        }
    

    Which I believe is not in the scope of the original issue and could fit better as an additional task to 🐛 [pp-3] Bubbling of elements' max-age to the page's headers and the page cache Postponed .

    But, as you said, a problem remains unsolved:

    PageCache determines which responses to cache by inspecting the Expires header. (This should be updated to use the Cache-Control header instead, but that's out of scope here.)

    So, what's next? I'm not sure at all...

    1. Bring the discussion of max_age config parameter honoring to 🐛 [pp-3] Bubbling of elements' max-age to the page's headers and the page cache Postponed
    2. Close this issue as "Works as designed" or leave only the fix that fixes the Expires header and go on to Fixed? I'm not sure if it's beneficial for HTTP 1.0 proxies or if they really exist yet nowadays.
    3. Create another issue to fix the mentioned remaining problem with PageCache inspectiong Expires header instead of Cache-Control, so this one don't get "noisy"?

    Am I right? :)

  • 🇨🇭Switzerland znerol

    I think that the only change this issue fixes, then, is setting an Expires header which honors system_performance.cache.page.max_age, which would be possibly beneficial for HTTP 1.0 proxies down the line of a response.

    No, that wouldn't be desirable. If Drupal did this, then that would lead to potential information leak via HTTP/1.0 caches. This is because HTTP/1.0 caches would store/serve every page, regardless of whether there were cookies on the request or not. HTTP/1.0 caches are unable to differentiate between authenticated requests and anonymous ones simply because they do not know anything about cookies.

    Bring the discussion of max_age config parameter honoring to 🐛 [pp-3] Bubbling of elements' max-age to the page's headers and the page cache Postponed

    Depends. If you mean that no page should be cached in the page cache longer than system_performance.cache.page.max_age, then no. If you mean that page cache should honor the max-age directive collected from render metadata, then yes, that is the correct issue.

    Create another issue to fix the mentioned remaining problem with PageCache inspectiong Expires header instead of Cache-Control, so this one don't get "noisy"?

    No need for that, this is covered by 🐛 [pp-3] Bubbling of elements' max-age to the page's headers and the page cache Postponed .

  • 🇪🇸Spain vidorado Logroño (La Rioja)

    Oh, my bad—I mixed things up again...

    Now it’s clear to me: HTTP 1.0 proxies must always be skipped using a past Expires header. So, there’s nothing to fix. Thanks for rephrasing the explanation—it helped me finally grasp the full picture.

    Since the third point is covered by 🐛 [pp-3] Bubbling of elements' max-age to the page's headers and the page cache Postponed , there's nothing else to address. I agree on closing this as "Works as designed" and leaving it as is. 🙂

  • 🇺🇸United States smustgrave

    Thanks all!

Production build 0.71.5 2024