Try to replace path alias preloading with lazy generation

Created on 28 December 2024, 9 days ago

Problem/Motivation

Inspired by @berdir in 🐛 Feature or Bug? - The path-alias preloading is not used if the "Enforce clean and canonical URLs" option is enabled Active .

Depends on 🌱 Adopt the Revolt event loop for async task orchestration Active .

We currently have a path preload cache, added by me and others in about 2008. It records all the system paths found on a page when the cache is cold, writes them to a per-URL cache object, then when that page is requested again, if a path alias is requested, uses that cache item to preload all the path aliases.

It saves a lot of queries but is less effective in 11.x than it was in 7.x because we now have render caching and dynamic_page_cache, meaning the amount of times we're generating the same path aliases on the same page will be lower (if still possibly more than once every 24 hours). Also it takes up quite a lot of cache space.

In 🌱 Adopt the Revolt event loop for async task orchestration Active @berdir suggested looking at potentially removing it - this would be fine on sites with good render cache hit rates, but sites with lots of content changes (comment posting on the same URL, node list cache tag constantly invalidated) might still do a fair bit of extra queries.

I think we might potentially have a way to do at least some of the multiple loading, without the caching - if so it would be best of both worlds.

If we look at an xhprof run on a cold cache page request, we can see that all/most of the alias lookups are from UrlGenerator::generateFromRoute()

UrlGenerator::generateFromRoute() builds the URL in two steps.

First it generates the 'system path' from the route. Then it calls PathProcessor::processOutbound() to apply outbound path processors which does amongst other things handling path aliases.

This is all very much per-URL-object and does not provide an obvious opportunity for path alias multiple loading - and while it's changed a fair bit, it's not dramatically different from the original problem we had in 2008.

However, what if we could register system paths onto the path processor - e.g. populate PathAliasManager::preloadedPathLookups as we generate system paths from routes.

Using revolt/Fibers, we could then suspend before running outbound path processors, and allow other code to execute, which potentially will generate more paths, and add them to preloadedPathLookups too. Then when we resume in UrlGenerator::generateFromRoute(), we'd end up calling PathAliasManager::getAliasByPath and that can then multiple load any path aliases that aren't static cached and have been added to preLoadedPathLookups().

Because of the way that e.g. 📌 Add PHP Fibers support to BigPipe RTBC was implemented, there is a non-zero chance that we'll be able to multiple load all path aliases that are rendered at the top level of block render arrays and similar.

No idea if this will actually work, but if it does:

1. It will work on completely cold caches, saving potentially dozens of database queries.

2. It won't require maintaining a separate per-page cache item.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

📌 Task
Status

Active

Version

11.0 🔥

Component

routing system

Created by

🇬🇧United Kingdom catch

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @catch
  • 🇬🇧United Kingdom catch
  • 🇬🇧United Kingdom catch

    Instead of doing this in the url generator we can probably do it in AliasManager::getAliasByPath() instead.

    Add the system path to a class property ready for preloading, Fiber::suspend(), when we get resumed, look for any paths to multiple/preload that haven't already been, multiple load, continue.

    The main question is how many times can we ::suspend() before we resume, but we only need to pick up two path aliases each time to reduce the number of queries by 50% so the odds might be decent. The existing solution has built in staleness where subsequent page requests will individually lookup paths that weren't found when the preload cache was set.

  • 🇬🇧United Kingdom catch

    Put up a proof of concept here but it doesn't do anything useful on standard yet - almost nothing is rendered in a placeholder and that's the only inside-fiber rendering we do.

    So this will need 📌 Create placeholders for more things Active - at least a rebase of the MR there and some combined testing, in order to see whether it can be viable.

    I did add some views blocks (recent comments, recent content) to the footer along with adding some alias for things like comment/reply, and saw this working - so in theory it has potential just not in practice yet.

  • Merge request !10724Draft: Initial proof of concept. → (Open) created by catch
  • 🇬🇧United Kingdom catch
  • 🇬🇧United Kingdom catch

    Another attempt at a better issue title.

  • 🇬🇧United Kingdom catch

    OK this actually works.

    Manual testing with https://git.drupalcode.org/issue/drupal-3496369/-/tree/3496369-with-rend...

    With a completely cold cache on an Umami recipe page, there are 38 calls to Drupal\path_alias\AliasRepository::lookupBySystemPath + the cache get and set for the preload cache.

    With the MR, there are 31 calls to Drupal\path_alias\AliasRepository::preloadPathAlias and no cache get/set.

    With a warm render cache, there will be zero path alias gets at all because the resultant URLs are in the render cache anyway.

    So the worst case scenario is improved, and the best case scenario is the same.

    When the path alias cache is warm but the render cache is not (multiple roles viewing the same path for example), then there will probably be more database queries - but what's in the cache will depend on which role visits the page first, so it's not like preloading gets the queries down to one, there's always some stragglers.

  • 🇬🇧United Kingdom catch

    Then I added 📌 Create placeholders for more things Active locally and got down to 14 calls to preloadPathAlias() - this is nearly a 2/3rds reduction in path lookup queries on a cold cache. Because the cache is per-URL it will be a similar reduction across all the first visits to a URL on a site after a cache clear or each time the cache expires if the render cache for that page isn't warm."

  • Pipeline finished with Failed
    3 days ago
    Total: 146s
    #385377
  • Pipeline finished with Failed
    3 days ago
    Total: 118s
    #385380
  • Pipeline finished with Failed
    3 days ago
    Total: 106s
    #385390
  • 🇬🇧United Kingdom catch
  • Pipeline finished with Failed
    3 days ago
    Total: 572s
    #385404
  • Pipeline finished with Failed
    3 days ago
    Total: 104s
    #385418
  • Pipeline finished with Failed
    3 days ago
    Total: 454s
    #385422
  • 🇬🇧United Kingdom catch
  • 🇬🇧United Kingdom catch

    This issue + 📌 Create placeholders for more things Active seems to be enough without 📌 Render children in fibers Active , which is good because changing the renderer like that is a bit alarming.

    However when testing I ran into the exception that 📌 Deprecate RendererInterface::render()'s sole $is_root_call parameter Needs work is trying to remove, so we will need that issue too to make this all work together.

    The actual MR here still has a couple of test failures in path alias tests, but it'll be reviewable by itself without the other two issues - just won't be a performance improvement (or a very marginal one) until those also land.

    Not exactly sure what test coverage looks like for this - we can probably set up some Fibers in a kernel test that call getAliasByPath() and check that the aliases are still returned correctly.

    📌 Add assertions to OpenTelemetryNodePagePerformanceTest::testNodePageCoolCache() Active would hopefully cover performance testing once it's in, alongside the other two issues.

  • 🇬🇧United Kingdom catch

    catch changed the visibility of the branch 3496369-with-renderer-support to hidden.

Production build 0.71.5 2024