Allow different uri protocols for more effecient url generation

Created on 1 May 2020, about 4 years ago
Updated 18 March 2024, 3 months ago

Problem/Motivation

I have a site that's having trouble generating sitemaps. This may all be "known" but being as its breaking the module for me I thought I'd lay it out and document everything I found and maybe we can come up with a solution.

After spending some time in blackfire it looks like almost all the time is spent in this line eating 100s of megabytes of memory.

$link_url = Url::fromUri('internal:' . $link['loc'], $link_options + $url_options)->toString();

That looks like a simple enough piece of code. Just make the url into a string! Easy! Turns out internal urls are expensive.

in Url::fromInternalUri we find this pretty reasonable but more worrying line:

    $url = \Drupal::pathValidator()
      ->getUrlIfValidWithoutAccessCheck($uri_parts['path']) ?: static::fromUri('base:' . $uri_parts['path'], $options);

This method looks pretty normal too but then we see this weird bit of code:

    $request = Request::create('/' . $path);
    $attributes = $this->getPathAttributes($path, $request, $access_check);

Why do we need a request object? Well... getPathAttributes() is a beast.

Here's the meat of the method:

    $path = $this->pathProcessor->processInbound('/' . $path, $request);

    try {
      $request_context = new RequestContext();
      $request_context->fromRequest($request);
      $router->setContext($request_context);
      $result = $router->match($path);
    }

So we basically do a full routing of the request. And it gets worse because as we match the request for entities we trigger a full load of that entity. 10s of thousands of queries pulling in all the fields on your crazy node type just to build a path. Ouch!

Proposed resolution

Looking through some history, this seems to be kinda a quick naive port of the d7 code then patched to support translations. I do wonder if we can't just generate urls based on routes instead where possible. I can't see a reason 'loc' couldn't just have the internal URIs. We'd need to provide BC and upgrade paths but internal:/custom/url, route:/entity.node.canonical;node=123 would save a _lot_ of time generating URLs because route URI's are significantly cheaper to generate.

Remaining tasks

Decide on a path forward

User interface changes

n/a though it could possibly open some cool features to the UI in the future with the ability to more deeply integrate with url types.

API changes

Possibly different interaction with the 'loc' field.

Data model changes

New column or change to existing column in the database.

Release notes snippet

πŸ› Bug report
Status

Needs review

Version

2.0

Component

xmlsitemap.module

Created by

πŸ‡ΊπŸ‡ΈUnited States neclimdul Houston, TX

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¬πŸ‡§United Kingdom james.williams

    Amazingly, we had a cron run last 19 minutes yesterday, 16 minutes of which was spent running xmlsitemap_cron()! We did have rather a lot of web traffic at the time, so I'm not 100% sure whether this was a symptom or cause, but it's still worth addressing either way. Our trace reports from New Relic show that the vast majority of that time was indeed calling Url::fromUri() so I can well imagine that completing the suggested improvement here could be a huge step forward.

    What's needed to advance this? The issue summary says "Decide on a path forward", but has that been done now?

  • πŸ‡ΊπŸ‡ΈUnited States neclimdul Houston, TX

    Don't know. We've been running this to great effect for close to 4 years now to and haven't seen a problem during that period and xmlsitemap never shows up on our cron metrics. Would be happy to help move this forward.

Production build 0.69.0 2024