Feature or Bug? - The path-alias preloading is not used if the "Enforce clean and canonical URLs" option is enabled

Created on 10 February 2023, about 2 years ago

Steps to reproduce

I used a drupal 9.5 installation, installed the path_alias and redirect modules, added two nodes and two aliases foo and bar for the nodes.

In the cache_data bin I noticed that if the "Enforce clean and canonical URLs" is activated that there are no cache items for the aliased paths.

with Enforce clean and canonical URLs the cache_data bin looks like:

w/o Enforce clean and canonical URLs it looks like (ignore the admin paths but the /node items definitely appear only in this scenario):

Question

Is this behavior intentional, and if yes why?

🐛 Bug report
Status

Active

Version

1.0

Component

Code

Created by

🇩🇪Germany Harlor Berlin

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @Harlor
  • 🇬🇧United Kingdom catch

    This seems major to me.

  • First commit to issue fork.
  • 🇨🇭Switzerland berdir Switzerland

    At first I thought I couldn't reproduce, because I did see some preload-path entries, for example from the frontpage.

    But it does (not) happen on other most regular pages. The problem seems to be the \Drupal\Core\Routing\UrlGeneratorInterface::generateFromRoute() call that happens in \Drupal\redirect\EventSubscriber\RouteNormalizerRequestSubscriber::onKernelRequestRedirect(), this calls into \Drupal\path_alias\AliasManager::getAliasByPath before setCacheKey() is being called. Which appens from PathAliasSubscriber, a Controller event, while we current run on request.

    This results in no preload happening, and $this->cacheNeedsWriting is not set to TRUE. Unlike other things, we don't update caches here, so we never set it to TRUE then afterwards.

    I'm unsure if PathAliasSubscriber could run earlier, but I guess the assumption is that it doesn't have to.

    So either we set the cacheKey ourself and replicate that, or we move the request normalizer event subscriber to after PathAliasSubscriber.

    Looking at the events, we are currently before dynamic page cache, so we'll need to make sure this doesn't cause any weird caching issues. We do things like removing index.php from the URL, so we do need to account for that. But if it works, it would have the additional benefit of being able to cache the checks that we are doing.

  • Merge request !130Set alias manager cache key → (Merged) created by berdir
  • 🇨🇭Switzerland berdir Switzerland

    Can't use controller event because that doesn't allow to set a response. Would need some weird trickery through a controller.

    Went with explicitly setting the cache key myself. Also weird that setCacheKey() isn't actually on the interface but is documented as such.

    One thing I'm wondering is how beneficial the preload path cache actually is these days, with things like render cache and dynamic page cache. I guess if you have many variations of dynamic page and render cache entries for the same URL? But if you for example have dynamic page cache variations and most things are a render cache hit, you likely load a ton of aliases that aren't actually needed? Or a different scenario, since the path alias cache is only set for 24h, you might run into a cache miss on such a partial cache hit and then only put a small amount of the paths into cache for another 24h?

    Seems like something that might vary depending on the site and can probably only be answered with database query monitoring on a production environment before/after this change. To see how much it reduces path alias lookups vs. the cost of cache lookup and the preload query.

    Pushed a MR, mostly untested and tests will need to be adjusted.

  • 🇬🇧United Kingdom catch

    One thing I'm wondering is how beneficial the preload path cache actually is these days, with things like render cache and dynamic page cache. I guess if you have many variations of dynamic page and render cache entries for the same URL? But if you for example have dynamic page cache variations and most things are a render cache hit, you likely load a ton of aliases that aren't actually needed? Or a different scenario, since the path alias cache is only set for 24h, you might run into a cache miss on such a partial cache hit and then only put a small amount of the paths into cache for another 24h?

    The preload cache item is only loaded when a path is requested (in PathAliasManager::getAliasByPath()), so if you have a dynamic page cache hit and there are no paths to render, it won't be loaded at all. Obviously this ceases to be the case as soon as one alias gets requested.

    It's definitely less useful with render caching but not sure we're there yet. The extreme case would be a similar case to d.o issues where you have a lot of updates to the same content like an issue - then the preload cache item will last across multiple render cache clears. On a site where it's going to be all dynamic page cache hits for 24 hours, then it's not adding anything extra.

  • 🇨🇭Switzerland berdir Switzerland

    Tested this a bit. In my case, with all render caching disabled, this replaces 15 alias lookup queries with the preload query (+ cache get). How often that happens like that is hard to say, but it's true that render cache can have quite a few variations, e.g. different permissions, languages and so on.

    Updated unit tests, but since this was conditional, no existing tests failed on this, now the expected call is defined.

  • 🇬🇧United Kingdom catch

    Opened 📌 Try to replace path alias preloading with lazy generation Active based on this discussion - might be away to keep the benefits of the preloading without the actual caching.

  • Pipeline finished with Skipped
    about 2 months ago
    #383850
    • berdir committed 192d972d on 8.x-1.x
      Issue #3340999 by berdir, harlor, catch: The path-alias preloading is...
  • 🇨🇭Switzerland berdir Switzerland

    Decided to merge this. The fix is a bit ugly, but costs us little and it should save a decent amount of queries in many cases. It's not up to redirect.module to decide whether or not this feature is beneficial. At worst it will need to be adjusted in the future if core decide to change how it works.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024