Browser language detection is not cache aware

Created on 20 February 2015, about 10 years ago
Updated 8 February 2024, about 1 year ago

Problem/Motivation

#2368987: Move internal page caching to a module to avoid relying on config get on runtime β†’ moved the page caching to its own module and removed the caching setting instead (so caching is dependent on the module being enabled or not). Before that, the browser detection only worked if page caching was turned off. That setting was removed, so browser language negotiation now runs even with page caching enabled, but is not cached. It uses the kill switch instead:

\Drupal::service('page_cache_kill_switch')->trigger();

To make it cacheable, we need to overcome a few problems.

A. If browser language negotiation is before URL language negotiation, you may get different language negotiated for the same path. We should ensure browser negotiation is after URL negotiation.
B. Once we ensure that browser negotiation is after URL language detection, we need to ensure that browser negotiation redirects to the correct language specific URL.
C. You may use browser negotiation without URL language detection. We need to somehow ensure that you only use browser language negotiation if we can redirect to a concrete URL that is explicit for the negotiated language (contrib negotiation methods may be involved, session negotiation may be fed with an URL parameter to set this, etc).
D. Even if you use browser negotiation and URL negotiation, if one of the languages has an empty path prefix, it will fall through URL detection, so may still have different languages or is not redirectable to. Needs to be ensured that if browser language negotiation and URL negotiation is used, all languages have a prefix.

Proposed resolution

Problem B and C (implemented in #65)

Implement language redirect:
- Implement a new method onKernelRequestLanguageRedirect() on the LanguageRequestSubscriber.
- This method is called after current language is detected and routes are initialized.
- The method first ensures that redirect is allowed: request method is GET, there is no "destination" parameter in the current request, etc.
- The method constructs URL to the current route. The URL is processed with URL outbound processors, so it may contain language path prefix or other language identifier.
- The method compares the result URL with the requested URL. If the differs, it assumes that the language identifier was added and redirects to the result URL.

Here is a behavior example for the case when a) URL negotiation is enabled and language prefixes are set up properly, b) browser negotiation is enabled and goes after the URL negotiation.

Before patch:
- "domain.com/" URL accessed
- "domain.com/" page cache is killed
- all links rendered on the page leads to language prefixed URLs

After patch:
- "domain.com/" URL accessed
- "domain.com/" page cache is killed
- user is redirected to language prefixed URL ("domain.com/en") which can be cached

So, the patch does not avoid cache kill, instead it introduces language redirect. (Which in D7 was usually performed by globalredirect module.)

Problem A and D

Implement hook_requirements() that warns user about the possible cache issues
- if there is more that 1 language and the path prefix is not set for some language
- if browser negotiation is the only enabled method
- if URL negotiation is set after the browser negotiation

Alternatively, the language_negotiation_url_prefixes_update() function can be updated to always add language path prefix to all languages. But this requires some tests update, see #66 β†’ for details.

Alternative resolution

(implemented in #116)

- Introduce new response policy `Vary`, that will be less disruptive than `KillSwitch`.
- Modify PageCache to respect Vary headers coming from response.

Latter is a good thing in general, going beyond the scope of language detection.
Currently PageCache distinguishes cached pages only by predefined "tags": host, URL, and requested format. The only supported Vary header is cookie, and it's handled in ad-hoc manner.

Since page cache occurs on the early stage of request processing, the fastest/easiest way to get the list of vary headers is to fetch them from "generally" cached response object, and then check if there are any headers in request object that match vary criteria.

Regarding redirect and related problems A-D, I believe they should be handled in ✨ Route normalizer: Global Redirect in core Needs work , and this issue should take care of cache awareness of browser language detection.

Remaining tasks

+ Fix tests
+ Write specific tests
- Review
- Maybe address Problem A and D.

User interface changes

Possibly errors for unsupported configurations.

API changes

Not likely. Additions needed.

πŸ› Bug report
Status

Needs work

Version

11.0 πŸ”₯

Component
Language systemΒ  β†’

Last updated 1 day ago

  • Maintained by
  • πŸ‡©πŸ‡ͺGermany @sun
Created by

πŸ‡­πŸ‡ΊHungary GΓ‘bor Hojtsy Hungary

Live updates comments and jobs are added and updated live.
  • D8MI

    (Drupal 8 Multilingual Initiative) is the tag used by the multilingual initiative to mark core issues (and some contributed module issues). For versions other than Drupal 8, use the i18n (Internationalization) tag on issues which involve or affect multilingual / multinational support. That is preferred over Translation.

  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¨πŸ‡­Switzerland handkerchief

    Any news on that?

  • πŸ‡ΊπŸ‡¦Ukraine Vadym.Tseiko

    Hi everyone, I have a similar problem, my project is configured via detection and selection URL by domains method and I need to switch language to the correct domain accordingly to the user's browser preferences. But in my case, when using detection and selection via browser I'm constantly switched to my preferred language without the ability to switch language only once(what I'm achieved with custom event subscriber and redirect). I added a custom code that redirects user but now I'm having a problem with SEO as Google search bots is redirected too, and all machine services return 302 status code besides the basic English one as those bots I guess have English as a preference. Is there a way for redirect to not mess with bots or to detect them and not redirect if that not human-users, none of those patches fixed behavior when the user is constantly redirected and none of them set the right domain in the URL as a result I getting always URL from with browser switch to the preferred language? Has anyone solved this problem somehow?

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Environment: PHP 8.1 & MariaDB 10.3.22
    last update over 1 year ago
    29,721 pass, 2 fail
  • First commit to issue fork.
  • Pipeline finished with Failed
    about 1 year ago
    Total: 259s
    #90636
  • πŸ‡§πŸ‡ͺBelgium matthijs

    Since the cache ID is actually cached, the $vary_headers passed to \Drupal\page_cache\StackMiddleware\PageCache::getCacheId() actually did nothing.

    I created a merge request from #2430335-194: Browser language detection is not cache aware β†’ and did some changes to cache the cache ID by $vary_headers.

  • Pipeline finished with Failed
    about 1 year ago
    #90786
  • πŸ‡©πŸ‡ͺGermany sleitner

    I fixed the visibility of the modules variable of the test. Drupal\Tests\language\Functional\LanguageBrowserDetectionAcceptLanguageTest still fails.

  • Pipeline finished with Canceled
    about 1 year ago
    Total: 181s
    #91687
  • Pipeline finished with Success
    about 1 year ago
    Total: 569s
    #91689
  • Pipeline finished with Success
    about 1 year ago
    Total: 493s
    #95242
  • Status changed to Needs review about 1 year ago
  • πŸ‡©πŸ‡ͺGermany sleitner

    X-Drupal-Cache is now always present. It is MISS for the first access and HIT for further accesses.

  • Status changed to Needs work about 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Left some comments on MR

    I don't know if adding a CR from D9 is correct anymore but will leave as is.

  • Pipeline finished with Canceled
    about 1 year ago
    #97607
  • Pipeline finished with Canceled
    about 1 year ago
    Total: 28s
    #97608
  • Pipeline finished with Canceled
    about 1 year ago
    Total: 7s
    #97609
  • Pipeline finished with Canceled
    about 1 year ago
    #97610
  • Pipeline finished with Success
    about 1 year ago
    Total: 574s
    #97611
  • Pipeline finished with Failed
    about 1 year ago
    #97623
  • Pipeline finished with Success
    about 1 year ago
    Total: 560s
    #97626
  • Status changed to Needs review about 1 year ago
  • πŸ‡«πŸ‡·France andypost

    added 2 questions to MR

  • Pipeline finished with Canceled
    about 1 year ago
    Total: 73s
    #97643
  • Pipeline finished with Success
    about 1 year ago
    Total: 486s
    #97644
  • πŸ‡©πŸ‡ͺGermany sleitner

    I don't know why this is not a local variable. But since it only reads the request, a clone is not necessary.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Believe all feedback has been addressed around MR but don't really follow the issue summary, specifically the proposed solution section. Will leave in review if anyone else wants to pick up.

  • Status changed to Needs work about 1 year ago
  • πŸ‡¨πŸ‡­Switzerland znerol

    This includes #3023104. As one of the authors of the cache policy mechanism, I'd be disappointed if this approach would make it into core. Citing from #3023104-4: Introduce "Vary" page cache response policy β†’ :

    Regrettably this approach clearly violates the ResponsePolicyInterface contract. While not stated explicitly in the documentation, the check() method is certainly not expected to actually change the response.

  • πŸ‡¨πŸ‡­Switzerland znerol

    I suggest to move the logic into a response event subscriber instead.

  • πŸ‡¨πŸ‡­Switzerland cola

    cola β†’ changed the visibility of the branch 2430335-browser-language-detection to hidden.

  • πŸ‡³πŸ‡±Netherlands bbrala Netherlands

    bbrala β†’ changed the visibility of the branch 2430335-browser-language-detection to active.

  • Merge request !8878Implementation using a subscriber β†’ (Open) created by bbrala
  • πŸ‡³πŸ‡±Netherlands bbrala Netherlands

    I played around with this and moved the implementation to a eventsubscriber as asked for by @znerol. I see one things that is very weird though as per comment in the code:

        /**
         *  Either Drupal\Tests\page_cache\Functional\PageCacheVaryTest::testPageCacheWithVary
         *  fails or Drupal\Tests\language\Functional\LanguageBrowserDetectionAcceptLanguageTest::testAcceptLanguageEmptyDefault
         *  fails if you remove one of those. Not sure why
         */
        $events[KernelEvents::RESPONSE][] = ['onRespond', -100];
        $events[KernelEvents::RESPONSE][] = ['onRespond', 100];
        return $events;
    

    Not sure why, those priorities sometimes confuse me a little.

  • Pipeline finished with Failed
    8 months ago
    Total: 161s
    #231249
  • Pipeline finished with Failed
    8 months ago
    Total: 593s
    #231263
  • Pipeline finished with Success
    8 months ago
    Total: 903s
    #231273
  • Status changed to Needs review 8 months ago
  • πŸ‡³πŸ‡±Netherlands bbrala Netherlands

    Well at least everything is green. So thats good. But it some weirdness with the event subscriber. Not sure how that should work. I didn't investigate the redirects and such. Mostly just refactored the current recent patch to work with an event subscriber.

  • Pipeline finished with Failed
    8 months ago
    Total: 460s
    #231296
  • Status changed to Needs work 7 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Left some comments in the MR.

  • Pipeline finished with Failed
    5 months ago
    Total: 183s
    #320556
  • πŸ‡³πŸ‡±Netherlands bbrala Netherlands

    I recently discussed thie with catch.

    Using the header and vary on that is very dangerous for cache purposes. It will make very very many variations, which results in bad caching and performance. Not to mention possible memroy issues in caches or high cache trashing.

    We did think about how we could possible work around that.

    What if;

    We normalize the prermutation of the language into the resulting language, and cache that normalization and the resulting language.

    So perhaps a site with en_EN and fr_FR. A browser comes along and tells the site "Hi i'm en-GB,en-US;q=0.9,en;q=0.8". The site would find out what this will end up as so

    en-GB,en-US;q=0.9,en;q=0.8 magic => en_EN, we cache this and threat serve cache asif en_EN. This would mean (at least on the drupal side) only the records for inLang => outLang exist, which are pretty small and we dont end up with loads of difference permutations.

    Not entirely sure how we would implement that, but feels like a way to have drupal cache be happy.

    The little devil on my shoulder also has a voice though. Which does tell me this is kinda a. can of worms as this could also easily break caching on the caches in front of Drupal, be it varnisch or maybe smoething like cloudflare. That is kinda the scary part. Perhaps contrib is better to have this live, instead of trying to get this into core.

  • πŸ‡§πŸ‡ͺBelgium borisson_ Mechelen, πŸ‡§πŸ‡ͺ

    Using the header and vary on that is very dangerous for cache purposes. It will make very very many variations, which results in bad caching and performance. Not to mention possible memroy issues in caches or high cache trashing.

    While this is true for a web-based environment, it is not true for JSON:API.
    Quoting @GabeSullice from https://www.drupal.org/project/drupal/issues/2794431#comment-12806367 🌱 [META] Formalize translations support Needs work

    However, JSON API doesn't need to be very concerned about that because JSON API requests are not typically made directly via a browser. Instead, they're most frequently made via JavaScript or some other system where the Accept-Language header can be tightly controlled by a developer.

  • πŸ‡¨πŸ‡­Switzerland znerol

    Re #21: Fair.

    Maybe we need a way to create two (or more) groups of routes. One group is accessed by browsers, others by special clients. Then apply different sets of rules for request processing, content/language negotiation, authentication and caching to each of them.

    However, page cache is only effective if it runs before request routing. So maybe the matching needs to happen on a path prefix.

Production build 0.71.5 2024