Normalize the incoming path with urldecode directly in RouteProvider

Created on 21 June 2024, 3 months ago
Updated 10 September 2024, 18 days ago

Problem/Motivation

After updating Drupal from version 10.2.7 to version 10.30, a 404 error occurred!

Bilingual website pl/en

the https//mydomain/en version works fine
the https//my_domain version (by default pl as the domain's primary language) shows a 404 error after clearing the cache.

Warning: Undefined array key "route:[language]=pl:[query_parameters]=:/ " in Drupal\Core\Cache\DatabaseBackend->getMultiple() (line 155 of core/lib/Drupal/Core/Cache/DatabaseBackend.php).
Drupal\Core\Cache\DatabaseBackend->getMultiple(Array, ) (Line: 126)
Drupal\Core\Cache\DatabaseBackend->get('route:[language]=pl:[query_parameters]=:/') (Line: 169)
Drupal\Core\Routing\RouteProvider->getRouteCollectionForRequest(Object) (Line: 252)
Drupal\Core\Routing\Router->getInitialRouteCollection(Object) (Line: 142)
Drupal\Core\Routing\Router->matchRequest(Object) (Line: 90)
Drupal\Core\Routing\AccessAwareRouter->matchRequest(Object) (Line: 105)
Symfony\Component\HttpKernel\EventListener\RouterListener->onKernelRequest(Object, 'kernel.request', Object)
call_user_func(Array, Object, 'kernel.request', Object) (Line: 111)
Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher->dispatch(Object, 'kernel.request') (Line: 157)
Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object, 1) (Line: 76)
Symfony\Component\HttpKernel\HttpKernel->handle(Object, 1, 1) (Line: 53)
Drupal\Core\StackMiddleware\Session->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object, 1, 1) (Line: 28)
Drupal\Core\StackMiddleware\ContentLength->handle(Object, 1, 1) (Line: 32)
Drupal\big_pipe\StackMiddleware\ContentLength->handle(Object, 1, 1) (Line: 106)
Drupal\page_cache\StackMiddleware\PageCache->pass(Object, 1, 1) (Line: 85)
Drupal\page_cache\StackMiddleware\PageCache->handle(Object, 1, 1) (Line: 50)
Drupal\ban\BanMiddleware->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object, 1, 1) (Line: 36)
Drupal\Core\StackMiddleware\AjaxPageState->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\StackedHttpKernel->handle(Object, 1, 1) (Line: 741)
Drupal\Core\DrupalKernel->handle(Object) (Line: 19)

P.S.
There is no cache problem with a single-language website.

I think the problem is that garbage collection corrupts the path to the home-page
does not allow null values ​​in the main language.
After entering the language prefix it works fine.
But I guess that's not the point, because it destroys the links already provided.

Steps to reproduce

Install the standard profile.

TRUNCATE cache_data;

Visit https://drupal-dev.ddev.site/

Then https://drupal-dev.ddev.site/:%20

Without the MR, you will get a 404 and see the following three cache entries:

MariaDB [db]> SELECT cid FROM cache_data WHERE cid LIKE 'route:%';
+-----------------------------------------------------+
| cid                                                 |
+-----------------------------------------------------+
| route:[language]=en:[query_parameters]=:/           |
| route:[language]=en:[query_parameters]=:/%20        |
| route:[language]=en:[query_parameters]=:/system/404 |
+-----------------------------------------------------+

With the MR you will get a 200 on the path with %20 at the end, and there'll be a single cache entry:

MariaDB [db]> SELECT cid FROM cache_data WHERE cid LIKE 'route:%';
+-------------------------------------------+
| cid                                       |
+-------------------------------------------+
| route:[language]=en:[query_parameters]=:/ |
+-------------------------------------------+
1 row in set (0.001 sec)

Proposed resolution

While there are not steps to reproduce yet, route lookup caching is inconsistent for urlencoded URLs, due to the way PathProcessorDecode happens during routing after the cache ID is created.

Instead of a path processor, we need to normalize the incoming path with urldecode directly in RouteProvider.

This means that https://example.com/%20 will consistently return a 200 (for the front page) instead of a 404. We might want a follow-up to redirect trailing urlencoded spaces in .htaccess

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

πŸ› Bug report
Status

Needs review

Version

11.0 πŸ”₯

Component
CacheΒ  β†’

Last updated about 19 hours ago

Created by

πŸ‡΅πŸ‡±Poland pingwin_cracow

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @pingwin_cracow
  • πŸ‡ΊπŸ‡ΈUnited States rbomhof New York, NY

    We just updated to 10.3 and sporadically (so far once a day) the english version of our main homepage serves a 404 until we resave it, very frustrating. Did you find any better way to address this?

  • Status changed to Postponed: needs info 3 months ago
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    I assume this only happens on multilingual sites but are there concrete steps to reproduce? e.g. if you have a multilingual site and clear cache then visit the homepage, does it happen then? If not, can you work out the sequence of events that cause it?

    Downgrading to "major" as this is not a site-down event and it appears from reports the site mostly works, it is only the homepage that is affected and even then not all the time.

  • πŸ‡ΊπŸ‡ΈUnited States rbomhof New York, NY

    Unfortunately we have yet to be able to reproduce this by taking any manual steps, and didn't notice anything in lower environments. So far once per day or so our multilingual site is marked down, and the default english version of the homepage serves a 404 until we resave that node, at which point it pops back in and loads as normal. The error we see in the logs is similar to what is mentioned above:

    Warning: Undefined array key "route:[language]=en:[query_parameters]=:/ " in Drupal\Core\Cache\DatabaseBackend->getMultiple() (line 155 of /code/web/core/lib/Drupal/Core/Cache/DatabaseBackend.php)

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    I encountered this issue today as well. I also have a bilingual website with English as the primary language. The home page would go to a 404 page - "The requested page could not be found". It resolved after I cleared the cache. I am on Drupal version 10.3.1.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    We are encountering this problem as well. We have a client with a very high-profile humanitarian aid site that is being affected by this. I think this is critical. For our client, the homepage being down is as bad as the entire site being down. And the homepage will only answer up again if a human intervenes (in the middle of the night).

    This site is hosted on Pantheon - curious if the other reported incidents are also on Pantheon?

    I plan to open a support ticket with them. I will report back anything I learn.

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    @banoodle Our site is on AWS Linux, not Pantheon.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    Thanks @fkildoo. Interesting.

    We are now looking into adding the $settings['state_cache'] = TRUE; to settings.php mentioned in this change order β†’ .

    Not sure this is our problem, but we are hoping it might help.

  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    It should not be required to enable state caching - that is an opt-in feature added in Drupal 10.3 and enabled by default in Drupal 11. However if this does make a noticeable difference here, that is an interesting data point that we should look into further.

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    Enabling state caching didn't make a difference for me. I just received the same error in the log I received earlier. However, my home page is still ok.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    @fkildoo after running the suggested queries in the change record β†’ , we have concluded we might benefit from setting $settings['state_cache'] to FALSE, so we are giving that a try. Our incidents of this problem are infrequent (2 in 5 days), so it will be difficult to assess the success or failure of this setting.

    One bandaid we may try for now is to set up a ping to hit https://oursite.com/en because @pingwin_cracow mentioned that working to resolve their issue once it occurs. At least if it does happen again, we're thinking the ping might resolve the situation quickly. We are considering pinging the site every 3 minutes.

    In our case, saving the homepage node resolved the issue.

  • πŸ‡¬πŸ‡§United Kingdom catch

    The same undefined array key warning shows up in a similar, old, bug report for redirect module #3257559: Redirection issue in multilingual setup β†’ , suggests we might be hitting similar conditions.

    If the state caching situation is not a red herring, then this might be related to the changes we made to Drupal\Core\Routing\RoutePreloader in 10.3.0 which removed the separate cache item in front of the state API.

    For people experiencing this:

    1. Do you have redirect module enabled
    2.Please report back if state caching makes a difference, and also if it doesn't, once you've had a chance to try it out.

    Also it sounds from reports like this might be an intermittent issue - e.g. clearing cache temporarily fixes it for a day or so, then it re-occurs?

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    1. Yes, I have the redirect module enabled

    2. I have state caching enabled but I don't think it makes any difference. I received the error and the 404 on my home page before I enabled it. After I enabled it, I received the error once again but not a 404 error on the home page.

    Today was the first I became aware of the issue. I checked my logs back to 7/6 and there were no other errors.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    Yes, we are using the Redirect module.

    Today, we set state cache to FALSE (because we had a high number of states in the key_value table, and the change record β†’ suggested this).

    So far, it is too early to tell if this will help, but we will let you know if the problem recurs.

    We have only seen the failure 2 times in the 5 days since we upgraded to 10.3. The client has monitoring enabled on the homepage, so we are pretty sure it has only been the 2 times.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    Oh and we aren't sure if clearing caches will fix the issue. So far, our team has just saved the english translation of the homepage node to correct the problem when it happened. We've asked everyone to try flushing caches first if it recurs, because we are curious about that.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    We had another occurrence of the 404 error on the homepage about 2 hours ago.

    So it looks like the state change had no affect on this issue.

    We were able to correct things by flushing/rebuilding cache.

  • πŸ‡¬πŸ‡§United Kingdom catch

    #1491748: Crawlers hitting /%20 can replace the home page with a 404 β†’ is an old IIS specific issue but shows a 404 request essentially replacing the front page cache.

    Another question for people hitting this - do you have browser language detection enabled? In which case it would be πŸ› Browser language detection is not cache aware Needs work

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    We aren't using Browser detection.

    That first issue doesn't seem related (d7). In any case, I tried the repo steps here β†’ and couldn't create the problem that way.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    One thing all 3 of our incidents have had in common, is they occurred when the site was relatively lightly visited. The site gets a lot of traffic, but this occurred when it wasn't getting much traffic.

    Could just be a coincidence, I suppose.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA
  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    I got a hold of a copy of the database taken shortly before the latest homepage 404 happened.

    In the cache data table, I see the culprit...

    Once caches are flushed, this bad actor is eliminated. We still can't figure out how this bad actor gets added to cache in the first place.

    The :/%20 entry beneath the top :/ entry does seem to support @Catch's suggestion of this issue β†’ being related, but even if I enter myulr.com/%20 on my site, I can't make the problem happen.

    In case it is helpful, here is the bad actor's blob data:

    a:3:{s:4:"path";s:2:"/ ";s:5:"query";a:0:{}s:6:"routes";O:41:"Symfony\Component\Routing\RouteCollection":4:{s:49:"Symfony\Component\Routing\RouteCollectionroutes";a:4:{s:8:"<button>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:7:"<front>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:1:{s:6:"_title";s:4:"Home";}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:3:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:8:"<nolink>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:6:"<none>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}}s:50:"Symfony\Component\Routing\RouteCollectionaliases";a:0:{}s:52:"Symfony\Component\Routing\RouteCollectionresources";a:0:{}s:53:"Symfony\Component\Routing\RouteCollectionpriorities";a:0:{}}}

    The following is the blob data from the :%20 entry:

    a:3:{s:4:"path";s:2:"/ ";s:5:"query";a:0:{}s:6:"routes";O:41:"Symfony\Component\Routing\RouteCollection":4:{s:49:"Symfony\Component\Routing\RouteCollectionroutes";a:4:{s:8:"<button>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:7:"<front>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:1:{s:6:"_title";s:4:"Home";}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:3:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:8:"<nolink>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:6:"<none>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}}s:50:"Symfony\Component\Routing\RouteCollectionaliases";a:0:{}s:52:"Symfony\Component\Routing\RouteCollectionresources";a:0:{}s:53:"Symfony\Component\Routing\RouteCollectionpriorities";a:0:{}}}

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    Our issue happened once again at 6 PM on Sunday. The same error and 404 occurred on the home page. Flushing the cache brings the page back. I did notice in the logs the automated cron was run right before it occurred.

    We do not have browser language detection enabled.

  • πŸ‡¬πŸ‡§United Kingdom catch

    @banoodle can you double check the state of the database when there's no 404 - i.e. that you definitely don't have the %20 cache entry, and that it's the only difference from the working state.

    It would help if other people on this issue could take the same steps banoodle has taken too.

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    I ran the same query @banoodle did (below) but it returned an empty result because I ran it after I flushed the cache. I know, not very helpful.

    SELECT * FROM `cache_data` WHERE `cid` LIKE '%route"[language]=en:[query_parameters]=:/' ORDER BY `cache_data`.`cid` ASC

    If I enter myulr.com/%20 on my site, I get the same error in my log, but no 404 on the home page.

    I have a dev site with almost the same setup and the issue is not occurring there. Caching and CSS/JS aggregation are enabled on both. I am trying to identify all the differences. One difference (which I don't think is preventing the issue) is my dev site is using the Search 404 module which will send the user to my custom Search page (using Solr Search) when a 404 occurs.

    This is hard to reproduce and I'm not great at debugging. I have read through the related issues to see if I can get any useful info.

    For now, my non-ideal temporary fix (fingers crossed) is to set up a cron job to clear the Drupal cache every 6 hours, 5 minutes after the automated Drupal cron is scheduled.

    cd /var/www/site/html && /var/www/site/html/vendor/bin/drush --uri=https://siteurl cr

    Other temporary fixes I've considered:

    -- Add a rewrite rule to strip out trailing spaces (and trailing %20)

    -- Exclude caching for the front page using the Cache Exclude module β†’ or another way

  • Status changed to Needs work 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    I can't reproduce the bug as such, but had a look at some things.

    First I had a quick look at what it would take to sanitize the incoming URL to remove the trailing %20before we do any routing.

    Symfony doesn't let you replace the request URI in the Request object, which would mean either replacing the entire object in a middleware, or in index.php before the request object is created. I had a look at doing the latter, and it's possible, but it means that instead of example.com/%20 being a 404 (which it should be), it just gets treated the same as the front page (which makes sense because we'd be normalizing it to the same thing), but that doesn't seem like what we want at all.

    I checked an 11.x Umami install, and route lookup does get cached for invalid paths, so the caching itself is not the problem, but %20 does return something for the route collection differently to %20abcde - even if that doesn't lead to a 404 on the front page most of the time, it's a bug either way.

    Then I had a look at the cache_data tables (this is with Umami installed, so a multilingual site).

    e.g.

    | route:[language]=en:[query_parameters]=:/%20        | a:3:{s:4:"path";s:2:"/ ";s:5:"query";a:0:{}s:6:"routes";O:41:"Symfony\Component\Routing\RouteCollection":4:{s:49:" Symfony\Component\Routing\RouteCollection routes";a:4:{s:8:"<button>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:7:"<front>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:1:{s:6:"_title";s:4:"Home";}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:3:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:8:"<nolink>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}s:6:"<none>";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:1:"/";s:4:"host";s:0:"";s:8:"defaults";a:0:{}s:12:"requirements";a:1:{s:7:"_access";s:4:"TRUE";}s:7:"options";a:4:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:8:"_no_path";b:1;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:20:"access_check.default";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:8:"{^/$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:1:"/";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:1:"/";s:8:"numParts";i:1;}}}s:50:" Symfony\Component\Routing\RouteCollection aliases";a:0:{}s:52:" Symfony\Component\Routing\RouteCollection resources";a:0:{}s:53:" Symfony\Component\Routing\RouteCollection priorities";a:0:{}}} |
    
    vs.
    
    <code>
    
    | route:[language]=en:[query_parameters]=:/%20abc     | a:3:{s:4:"path";s:5:"/ abc";s:5:"query";a:0:{}s:6:"routes";O:41:"Symfony\Component\Routing\RouteCollection":4:{s:49:" Symfony\Component\Routing\RouteCollection routes";a:0:{}s:50:" Symfony\Component\Routing\RouteCollection aliases";a:0:{}s:52:" Symfony\Component\Routing\RouteCollection resources";a:0:{}s:53:" Symfony\Component\Routing\RouteCollection priorities";a:0:{}}} 
    

    I then noticed we have the PathProcessorDecode class in core, this will decode %20 to a space, which then gets rtrimmed by RouteProvider.

          $path = $this->pathProcessor->processInbound($path, $request);
          $this->currentPath->setPath($path, $request);
          // Incoming path processors may also set query parameters.
          $query_parameters = $request->query->all();
          $routes = $this->getRoutesByPath(rtrim($path, '/'));
    
    

    That rtrim leads to a mismatch between $this->currentPath (which will have the decoded space) and the path passed to ::getRoutesByPath().

    If we trim in PathProcessDecode, then routing finds the front page - this seems wrong, but would probably workaround the issue here. I've added an MR for this, first to see whether we have any test coverage it would break, secondly for people to try out if they want.

    | route:[language]=en:[query_parameters]=:/%20 | a:3:{s:4:"path";s:5:"/node";s:5:"query";a:0:{}s:6:"routes";O:41:"Symfony\Component\Routing\RouteCollection":4:{s:49:" Symfony\Component\Routing\RouteCollection routes";a:1:{s:21:"view.frontpage.page_1";O:31:"Symfony\Component\Routing\Route":9:{s:4:"path";s:5:"/node";s:4:"host";s:0:"";s:8:"defaults";a:5:{s:11:"_controller";s:47:"Drupal\views\Routing\ViewPageController::handle";s:15:"_title_callback";s:49:"Drupal\views\Routing\ViewPageController::getTitle";s:7:"view_id";s:9:"frontpage";s:10:"display_id";s:6:"page_1";s:30:"_view_display_show_admin_links";b:1;}s:12:"requirements";a:2:{s:11:"_permission";s:14:"access content";s:7:"_format";s:4:"html";}s:7:"options";a:8:{s:14:"compiler_class";s:33:"Drupal\Core\Routing\RouteCompiler";s:18:"_view_argument_map";a:0:{}s:23:"_view_display_plugin_id";s:4:"page";s:26:"_view_display_plugin_class";s:38:"Drupal\views\Plugin\views\display\Page";s:30:"_view_display_show_admin_links";b:1;s:16:"returns_response";b:0;s:4:"utf8";b:1;s:14:"_access_checks";a:1:{i:0;s:23:"access_check.permission";}}s:7:"schemes";a:0:{}s:7:"methods";a:2:{i:0;s:3:"GET";i:1;s:4:"POST";}s:9:"condition";s:0:"";s:8:"compiled";O:33:"Drupal\Core\Routing\CompiledRoute":11:{s:4:"vars";a:0:{}s:11:"path_prefix";s:0:"";s:10:"path_regex";s:12:"{^/node$}sDu";s:11:"path_tokens";a:1:{i:0;a:2:{i:0;s:4:"text";i:1;s:5:"/node";}}s:9:"path_vars";a:0:{}s:10:"host_regex";N;s:11:"host_tokens";a:0:{}s:9:"host_vars";a:0:{}s:3:"fit";i:1;s:14:"patternOutline";s:5:"/node";s:8:"numParts";i:1;}}}s:50:" Symfony\Component\Routing\RouteCollection aliases";a:0:{}s:52:" Symfony\Component\Routing\RouteCollection resources";a:0:{}s:53:" Symfony\Component\Routing\RouteCollection priorities";a:0:{}}} |     -1 | 1721118866.701 |          1 | route_match | 17       |
    

    What I think we should probably do is not decode the path at all, putting up an MR for that to see what happens to test coverage. If tests pass with that, it would also be useful to see if people could try applying that change as well.

    Full steps to reproduce from a clean install would be really useful here, but I think we've got enough to go on here to make some progress.

  • Merge request !8787#3456244: trim the path after decoding. β†’ (Open) created by catch
  • Pipeline finished with Failed
    2 months ago
    Total: 177s
    #225377
  • πŸ‡¬πŸ‡§United Kingdom catch

    I didn't realise this was against 10.3.x so have put up two 10.3 MRs..

    Neither were fully done anyway, just to get a test run with different approaches, so I've add an additional 11.x MR which includes a deprecation in 10.4.x: https://git.drupalcode.org/project/drupal/-/merge_requests/8789

    If that really doesn't break anything, we could potentially backport to 10.3 (without the deprecation).

    If we only want to handle the front page in 10.3 and save the behaviour change for a minor release, we might want to keep exploring both approaches.

  • Pipeline finished with Success
    2 months ago
    Total: 784s
    #225365
  • Pipeline finished with Failed
    2 months ago
    Total: 864s
    #225367
  • Merge request !8790Trim the path after decoding. β†’ (Open) created by catch
  • Pipeline finished with Failed
    2 months ago
    Total: 173s
    #225385
  • πŸ‡¬πŸ‡§United Kingdom catch

    OK not decoding fails a lot of tests, so https://git.drupalcode.org/project/drupal/-/merge_requests/8790 is the one to look at - just adding a trim.

  • Pipeline finished with Success
    2 months ago
    Total: 470s
    #225389
  • πŸ‡¬πŸ‡§United Kingdom catch

    catch β†’ changed the visibility of the branch 3456244-after-updating-drupal to hidden.

  • πŸ‡¬πŸ‡§United Kingdom catch

    catch β†’ changed the visibility of the branch 3456244-no-decode to hidden.

  • πŸ‡¬πŸ‡§United Kingdom catch

    catch β†’ changed the visibility of the branch 3456244-trim-path to hidden.

  • Status changed to Needs review 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch
  • Pipeline finished with Failed
    2 months ago
    #226045
  • πŸ‡¬πŸ‡§United Kingdom catch
  • Pipeline finished with Success
    2 months ago
    Total: 1167s
    #226049
  • Status changed to Needs work 2 months ago
  • πŸ‡³πŸ‡ΏNew Zealand quietone

    Maybe a change of title to "Normalize the incoming path with urldecode directly in RouteProvider"

    I left a question in the MR, so setting to NW.

  • Status changed to Needs review 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    New title is good.

    I also tried to address the MR feedback.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Has anyone been able to test the MR on a site experiencing this issue?

  • Status changed to Needs work 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Added some code comments, I think the code could be cleaner and there is an open question about the cache key.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Accepted the suggestion and I'm not sure about the open question because we need to backport this to a patch release.

  • Pipeline finished with Success
    2 months ago
    Total: 590s
    #228832
  • Status changed to Needs review 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Back to needs review because I'm not sure what to do about #43.

    I think we could add the extra parameter in 11.1 in a separate MR, and then the existing approach (probably with forward slashes trimmed too) for the backport branches.

    Or go ahead here and open a follow-up to add the extra parameter in 11.1

  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Maybe we just copy the code to both methods for now, and consider unifying it later?

  • πŸ‡¬πŸ‡§United Kingdom catch

    Copied the code more properly, added a todo and a follow-up issue at πŸ“Œ Add path as a parameter to getRouteCollectionCacheId() Active .

  • Pipeline finished with Success
    2 months ago
    Total: 478s
    #229061
  • Status changed to RTBC 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Looks good to me, plus gives a tiny performance improvement in every page load given we are one event listener lighter.

    It would be really helpful if one of the reporters in this issue can test the patch but in the meantime this is RTBC.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    @fkildoo, @rbomhof or @banoodle - any chance you could try the RTBC MR for a few days and report back if it fixes your issue?

    Updating issue credits

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    @larowlan - Sorry, you all lost me a while ago! I don't have any experience with or knowledge of testing a merge request. I know how to implement a patch with Composer so I'm currently reading up on how to download and use a patch file from a merge request β†’ . If successful, I will test it and report back.

    On another note, this error has occurred 18 times starting on 7/12. Below are the frequency and times it has occurred, if that is helpful.

    2024-07-22 13:45:26.248
    2024-07-22 11:24:41.247
    2024-07-20 11:48:51.248
    2024-07-15 13:15:26.248
    2024-07-14 23:12:35.248
    2024-07-14 23:12:30.253
    2024-07-14 23:12:15.248
    2024-07-14 23:10:22.248
    2024-07-14 23:00:52.248
    2024-07-14 22:00:51.248
    2024-07-12 19:15:05.441
    2024-07-12 11:32:16.441
    2024-07-12 11:28:39.149
    2024-07-12 11:28:34.934
    2024-07-12 11:24:59.982
    2024-07-12 11:20:46.981
    2024-07-12 11:00:48.981
    2024-07-12 10:00:46.982

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    @larowlan we can try it, but ever since we set state cache to true last Saturday morning (7/13/24), we haven't had a recurrence of the problem (so it has been over a week since the last incident).

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    Thanks @banoodle

    @fkildoo - you can turn an MR into a patch by adding `.diff` to the URL - for this issue the patch is at https://git.drupalcode.org/project/drupal/-/merge_requests/8790.diff - however you shouldn't reference that URL directly (as anyone can get access to the branch). Instead you should download the file into e.g. a PATCHES folder inside your codebase, and then with composer patches reference it using a file URL (e.g. "./PATCHES/name-of-the-saved-file.patch". Thanks again πŸ’ͺ

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    @larowlan Thanks, I figured it out and tested it on my dev site. Will apply to production tonight and report back if I get the error again.

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    I applied the patch to my production site. So far so good. I've visited all the URLs below (with the site URL in front) and they all redirect to my home page without a 404 or an error in my log. I'll keep monitoring my logs for the error.

    /%20
    /index.php/
    /

  • πŸ‡¬πŸ‡§United Kingdom John_B London (UK), Worthing (UK), Innsbruck (Tirol)

    I have seen the issue on a site which never was bilingual, and uses language undefined. Not sure what triggered it.

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    @larowlan looks like I spoke too soon: we had an incident last night. This was with state_cache set to true. We had gone 10 days since the last incident.

    We have applied this issue's patch and will let you know if we have another incident.

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    Just an update... 3 days since applying the patch and there have been no errors. I assume the patch did the trick, but the last time there were 5 days between errors, so I can't be 100% sure yet. @banoodle went over 7 days between errors. If we both go over 8 days without seeing the error, I'd say it's probably good. I'll report back in a few days.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    Thanks

  • Status changed to Needs work about 2 months ago
  • πŸ‡ΊπŸ‡ΈUnited States xjm

    Posted a comment on the MR. The deprecation for this should be targeting 11.1. If it gets backported to Drupal 10, that should happen without any added deprecations. No?

  • Status changed to Needs review about 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡΅πŸ‡°Pakistan Allahnoor Turab

    Thanks @larowlan, I applied the patch #51 on multilangual production site yesterday, so for gives positive result.

  • πŸ‡΅πŸ‡°Pakistan Allahnoor Turab

    Thanks @larowlan, I applied the patch #51 on multilangual production site yesterday, so for gives positive result.

  • Status changed to RTBC about 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    Since the deprecation change was minimal, moving back to RTBC.

  • πŸ‡¬πŸ‡§United Kingdom John_B London (UK), Worthing (UK), Innsbruck (Tirol)

    Shouild be "Critical". Our monolingual site is randomly returning 404s for key landing pages. We will be grateful for a fix in D10, where patch does not apply cleanly.

  • Pipeline finished with Success
    about 2 months ago
    #238917
  • Status changed to Needs work about 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Unfortunately I'm not sure that this fix is going to resolve the underlying issue as reported initially here. I think the problem is due to case-insensitive queries in MySQL - see πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active - that issue also suggests a work around but we need to fix the underlying issue.

  • Status changed to Needs review about 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    I think we should work on πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active separately to the bug discovered here. We don't know exactly what caused the 10.3.0 issue, and both predate 10.3.0.

    For clarification, the bug I found here is that a request to https://example.com/%20 matches routes like this even though it ends up serving a 404 response:

    '<none>':
      path: ''
      options:
        _no_path: TRUE
      requirements:
        _access: 'TRUE'
    
    '<nolink>':
      path: ''
      options:
        _no_path: TRUE
      requirements:
        _access: 'TRUE'
    
    '<button>':
      path: ''
      options:
        _no_path: TRUE
      requirements:
        _access: 'TRUE'
    
    

    Whereas a request to https://example.com/ matches the front page, and a request to https://example.com/%20abc is a proper 404. But it will probably be a week or so before we can confirm whether it fixes the bugs reported here or not due to the bug itself being so intermittent.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Can people who have the problem originally reported here confirm the following details:

    1. What database you using - including version - you can get this information on admin/reports/status
    2. If you connect to your database run the query select cid from cache_page limit 1; and then use the cid returned to a select where you have changed the case of the cid select cid from cache_page where cid = 'cid_case_changed';
    3. The output of the SQL statement show create table cache_page; (mysql) or the CLI command pg_dump --table cache_page --schema-only (postgres)

    Thanks for helping! Eventually this will help the other issue.

  • πŸ‡¬πŸ‡§United Kingdom catch

    MR up on πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active now, quick summary of the differences between two issues:

    1. The bug I found here (which may or may not fix the reported issue) is that the routing system returns different results for example.com, example.com%20 and example.com%20abcd, and the result for example.com/%20 is completely invalid - the client gets a 404, but the routes returned include 'fake' routes defined with an empty string for things like <button>. This is partly at least because the routing system inconsistently handles trailing whitespace in URLs.

    2. πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active is for when a literal trailing space is passed to the database cache backend as part of the cache ID, .e.g. |foo |. MySQL conditions on varchars can (but not always, e.g. LIKE doesn't) treat |foo| and |foo | as the same, meaning that |foo | returns a cache item when it shouldn't, but then breaks the cache backend logic that reconciles the PHP cache IDs to what's stored in the database. The MR there ensures that cids with trailing spaces get hashed so that they're treated differently.

    These are somewhat related in that they both deal with trailing spaces, but separate bugs in between the routing and cache systems.

    It seems like there is some kind of potential scenario where a URL like |example.com%20| gets URL encoded to |example.com | which then gets passed to the database cache backend, but I can't yet see how that would happen in the routing system at least.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Very happy to see all the new information and understanding. Feel like we're getting to a decent understanding. I guess the question we should ask here is whether or not we should be rtrim'ing paths. Is this expected behaviour? Also I'm pondering about path column in the router table and how lookup with trailing spaces should work on that. Ah interesting. It's sensitive to trailing spaces - this is because it is varchar... the cid column is varchar binary - it's the binary part that makes it case sensitive but not right padding sensitive. FUN.

    I'm a bit conflicted with the fix here though. I think returning a 404 for http://example.com/%20 is better than normalising. But that's a gut feel rather than anything backed by hard evidence. This change would normalise it to homepage.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    I've tested a few sites... wordpress seems to normalise %20 and rtrim... see https://www.nasa.gov/%20 and other wordpress sites... but other big websites like google, amazon all 404.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ
  • πŸ‡ΊπŸ‡ΈUnited States thejimbirch Cape Cod, Massachusetts

    Ug.

    I’ve made my team aware of the request in #65. banoodle or rbomhof will post an update.

  • πŸ‡¬πŸ‡§United Kingdom catch

    I'm a bit conflicted with the fix here though. I think returning a 404 for http://example.com/%20 is better than normalising. But that's a gut feel rather than anything backed by hard evidence. This change would normalise it to homepage.

    I have a slight preference for this too, but technically doing it the opposite way around opens up a massive can of worms.

    The URL decode in urldecode() in PathProcessorDecode::processInbound() naturally strips the encoded space from the end of the URL.

    We would need to explicitly detect %20 at the end of a URL, and then either add it back after url decoding (which is not really url decoded then), or set the current path to the path with whitespace at the end, pass it into ::getRoutesByPath() etc. all of which is not a valid URL at all - i.e. you can't have an actual URL with a literal space at the end of it, that's invalid.

    That means the only real alternative is to explicitly add back the %20 after URL decoding:

    diff --git a/core/lib/Drupal/Core/PathProcessor/PathProcessorDecode.php b/core/lib/Drupal/Core/PathProcessor/PathProcessorDecode.php
    index a74b1461b2..1ff7cd9896 100644
    --- a/core/lib/Drupal/Core/PathProcessor/PathProcessorDecode.php
    +++ b/core/lib/Drupal/Core/PathProcessor/PathProcessorDecode.php
    @@ -23,6 +23,12 @@ class PathProcessorDecode implements InboundPathProcessorInterface {
        * {@inheritdoc}
        */
       public function processInbound($path, Request $request) {
    +    if (str_ends_with($path, '%20')) {
    +      // URLs can't end with a trailing whitespace, but also we don't want to
    +      // normalize the URL to be identical to a URL with the whitespace trimmed,
    +      // so keep the literal '%20' at the end.
    +      return urldecode($path) . '%20';
    +    }
         return urldecode($path);
       }
    
    

    I guess I can make an extra MR that does that instead, and see what the pipeline looks like.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Yeah so the problem if we do that, is that we then immediately run into πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active - i.e. if you hit example.com/%20 first, you get a 404, and the cache key that gets stored is for example.com/, so that gets a 404 too. But cane make it work if we merge πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active in. I updated the new tests here, but deliberately didn't run any other tests so we can get a clear idea of what else breaks, if anything.

  • Merge request !9046Don't trim trailing whitespace β†’ (Open) created by catch
  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    @alexpott in response to #65:

    1. My database is MySQL v8.0.39
    2. Running the query "select cid from cache_page limit 1;" returns 1 result (https://www.example.com//blog:), but I'm unclear on the rest of your request -- "...use the cid returned to a select where you have changed the case of the cid select cid from cache_page where cid = 'cid_case_changed';"
    3. ?

    Since applying the patch I haven't received the error. My Drupal version is 10.3.1.

  • πŸ‡¬πŸ‡§United Kingdom catch

    https://git.drupalcode.org/project/drupal/-/merge_requests/9046 is green and is less ugly than I thought - we just have to restore the %20 at the end of the URL after urldecode() and incorporate πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active .

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    @alexpott thank you for working on this. Here are the requested details from our site:

    Database: 10.4.25-MariaDB-log

    The first query returns a cid of: https://www.rescue.org/eu/search-eu/site/Child%20protection?page=16:

    If I change the case and run this query, it returns no results:

    select cid from cache_page where cid = 'https://www.rescue.org/eu/search-eu/site/child%20protection?page=16:';

    Here is the output from show create table cache_page;:

  • πŸ‡ΊπŸ‡ΈUnited States fkildoo

    Add on to my comment #74:

    2. I ran the query: "select cid from cache_page where cid = 'https://www.example.com//blog:';" which returned the same result as the first part.
    3. Output of show create table cache_page:

    mysql> show create table cache_page;
    +------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Table      | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
    +------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | cache_page | CREATE TABLE `cache_page` (
      `cid` varchar(255) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '' COMMENT 'Primary Key: Unique cache ID.',
      `data` longblob COMMENT 'A collection of data to cache.',
      `expire` bigint NOT NULL DEFAULT '0' COMMENT 'A Unix timestamp indicating when the cache entry should expire, or -1 for never.',
      `created` decimal(14,3) NOT NULL DEFAULT '0.000' COMMENT 'A timestamp with millisecond precision indicating when the cache entry was created.',
      `serialized` smallint NOT NULL DEFAULT '0' COMMENT 'A flag to indicate whether content is serialized (1) or not (0).',
      `tags` longtext COMMENT 'Space-separated list of cache tags for this entry.',
      `checksum` varchar(255) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL COMMENT 'The tag invalidation checksum when this entry was saved.',
      PRIMARY KEY (`cid`),
      KEY `expire` (`expire`),
      KEY `created` (`created`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Storage for the cache API.' |
    +------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.00 sec)
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    @banoodle and @fkildoo thanks for the info - we found the bug in the cache layer and it has been fixed and will released in the next non-security releases of Drupal 10 and 11. See πŸ› "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get Active . I'm pretty sure that's going to fix the issue that was originally reported here. This issue will now address the bug that @catch found when doing http://example.com/%20

  • Status changed to Postponed: needs info about 2 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Can anyone confirmed if the issue mentioned in #78 addressed the issue here please?

  • πŸ‡ΊπŸ‡ΈUnited States banoodle San Francisco, CA

    @smustgrave I'm not sure how we could confirm it. Since applying the MR on this issue over 2 weeks ago, there haven't been any further incidents.

    We have never been able to reproduce the 404 on the homepage on-demand, so I think the only way we could validate this, would be to rollback the MR fix, apply the patch for the other issue and then wait to see if it happens again. The longest error-free interval we had when we still had the problem was 10 days.

    Also we are on 10.3.1 and it looks like they haven't provided a patch for that yet (but from the latest comment it looks like one may be forthcoming soon). If it is, would we want to apply that on top of the MR from this issue?

  • Status changed to Needs review about 2 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    @smustgrave 'needs more info' isn't particularly helpful here, we know there's a bug in the current logic by reading it and the contents of the router cache with the steps to reproduce in the test coverage. What we don't know is whether this issue or the other bug report recently fixed definitively fixes the original 404 issue people have been seeing, but since it's so hard to reproduce, I personally think we should go ahead with the issues individually, and then ask people to report back if they ever see the bug again. If they don't, then great we fixed it, if they do, then we might have yet a third bug to track down, but the two MRs definitely fix bugs, even if we can't guarantee it's what fixes the original 404 report.

    @banoodle https://www.drupal.org/project/drupal/releases/10.3.2 β†’ includes the database cache backend fix for the other issue. If you update to that, and never get the problem again, that issue by itself may have been enough to fix it.

    The diff from the MR here should apply to 10.3.x I think, but would probably need to wait and see whether you get the issue again on 10.3.2 first, then apply the change from here, then wait again to see if it re-occurs.

  • I just upgraded to 10.3.2 and still see this issue. My site is English, not bilingual. It seems to affect one set of content page types, but not others, but they're identical except for the block layout.

  • πŸ‡¬πŸ‡§United Kingdom catch

    It seems to affect one set of content page types, but not others, but they're identical except for the block layout.

    The bug has so far only been reported for the front page on multi-lingual sites - are you getting 404s on other pages? Could use a bit more information.

  • Yes, I've got one content type set to display author bio pages. It's a simple node with the following field types:

    image, markup, body (text formatted, long, with summmary), 2 text (plain), and link field

    It does have a views EVA footer, though, so I'm looking now to see if that might be the cause.

  • Hmm, so the error disappeared in production. I cloned production back into dev (I was overdue for a cleanup reset anyway), and it cleared. I'm going to assume it persisted due to a legacy dev experiment that no longer exists.

    But, yes, the issue did affect pages for me other than the home page. I also had a view last week that suddenly failed and triggered the same warning.

    I'll keep an eye on things and will post if I see it again.

  • Aaaand, it's back. Again, this is on a node page, not the front page. I was building the page directly from the node content. When I rebuilt it in Layout with view blocks and a contextual filter instead, the error cleared. Since this is a routing problem, I'm guessing it has something to do with passing the parameter from the URL to pull the correct content directly from the node.

    I tried stripping out the EVA and anything other non-standard fields. Same issue.

    For what it's worth, I'm also on Pantheon.

  • πŸ‡¬πŸ‡§United Kingdom John_B London (UK), Worthing (UK), Innsbruck (Tirol)

    @catch

    I have seen the bug on a monolingual site, on pages with content type='home' which are not front page. However I do not know if it was a white space bug or a %20 bug. I am not aware of any occurrence since applying the white space patch.

    The problem node had langcode 'und'. Nodes migated from D5/6/7 have langcode 'und'. Nodes created post-migration have langcode 'en'. In some cases nodes with langcode 'und' redirect to nodes with langcode 'en'.

  • Status changed to Needs work 18 days ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Can Steps to reproduce section be filled out please

  • πŸ‡¬πŸ‡§United Kingdom catch
  • Status changed to Needs review 18 days ago
  • πŸ‡¬πŸ‡§United Kingdom catch
Production build 0.71.5 2024