Unexpected language prefixes on sitemap index

Created on 25 June 2023, over 1 year ago
Updated 27 October 2023, over 1 year ago

Problem/Motivation

Background and context:

Problem statement:

  • The index sitemap.xml available at http://localhost/sitemap.xml contains links to the paginated sitemaps, as illustrated below.
  • Observe that the loc elements contain the language prefix 'fi', as in http://localhost/fi/sitemap.xml?page=1
  • When I follow a link that contains the language prefix, I get an HTTP 404 (page not found) response.
  • If I manually remove the language prefix from the URL, i.e. I access http://localhost/sitemap.xml?page=1, the paginated sitemap works as expected.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/sitemap_generator/default/sitemap.xsl"?>
<!--Generated by the Simple XML Sitemap Drupal module: https://drupal.org/project/simple_sitemap.-->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=1</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=2</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=3</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=4</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=5</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
</sitemapindex>
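
The symptom above can be checked mechanically. The following is a minimal sketch, not part of the module: it assumes the index XML is already available as a string, and the function name `findPrefixedLocs` and the expected path `/sitemap.xml` are hypothetical choices for illustration. It returns every loc value whose URL path differs from the expected one, which is what a language prefix such as `/fi` causes.

```php
<?php
declare(strict_types=1);

/**
 * Returns the <loc> URLs in a sitemap index whose path is not $expectedPath.
 *
 * Hypothetical helper for illustration; not part of Simple XML Sitemap.
 */
function findPrefixedLocs(string $indexXml, string $expectedPath = '/sitemap.xml'): array {
  $doc = new DOMDocument();
  if (!$doc->loadXML($indexXml)) {
    throw new InvalidArgumentException('The sitemap index is not valid XML.');
  }

  $unexpected = [];
  $namespace = 'http://www.sitemaps.org/schemas/sitemap/0.9';
  foreach ($doc->getElementsByTagNameNS($namespace, 'loc') as $loc) {
    $url = trim($loc->textContent);
    // Compare only the path; the ?page=N query string is ignored.
    if (parse_url($url, PHP_URL_PATH) !== $expectedPath) {
      $unexpected[] = $url;
    }
  }
  return $unexpected;
}

// Example: one prefixed chunk link and one correct chunk link.
$index = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  . '<sitemap><loc>http://localhost/fi/sitemap.xml?page=1</loc></sitemap>'
  . '<sitemap><loc>http://localhost/sitemap.xml?page=2</loc></sitemap>'
  . '</sitemapindex>';
print_r(findPrefixedLocs($index));
```

An empty result means no chunk link in the index carries an unexpected prefix.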

Steps to reproduce

See context above.

Proposed resolution

When sitemap.xml contains links to paginated sub-pages, ensure that the loc elements do not contain language prefixes.

Remaining tasks

Investigate where the language prefixes are coming from.
Fix it.

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Fixed

Version

4.0

Component

Code

Created by

🇫🇮Finland masipila


Comments & Activities

  • Issue created by @masipila
  • 🇫🇮Finland masipila

    Note to self: the index generation seems to happen in src/Plugin/simple_sitemap/SitemapGenerator/SitemapGeneratorBase.php

      public function getIndexContent(): string {
        [...]
        // Add sitemap chunk locations to document.
        for ($delta = 1; $delta <= $this->sitemap->fromUnpublished()->getChunkCount(); $delta++) {
          $this->writer->startElement('sitemap');
          // THE URL TO THE CHUNK IS CREATED HERE
          $this->writer->writeElement('loc', $this->sitemap->toUrl('canonical', ['delta' => $delta])->toString());
          // @todo Should this be current time instead?
          $this->writer->writeElement('lastmod', date('c', $this->sitemap->fromUnpublished()->getCreated()));
          $this->writer->endElement();
        }
      }
    
  • 🇫🇮Finland masipila

    Okay, so getIndexContent() from the snippet in my previous comment calls SimpleSitemap::toUrl(), which looks like this (in src/Entity/SimpleSitemap.php):

      public function toUrl($rel = 'canonical', array $options = []) {
        if ($rel !== 'canonical') {
          return parent::toUrl($rel, $options);
        }
    
        $parameters = isset($options['delta']) ? ['page' => $options['delta']] : [];
        unset($options['delta']);
    
        if (empty($options['base_url'])) {
          /** @var \Drupal\simple_sitemap\Settings $settings */
          $settings = \Drupal::service('simple_sitemap.settings');
          $options['base_url'] = $settings->get('base_url') ?: $GLOBALS['base_url'];
        }
    
        $options['language'] = $this->languageManager()->getLanguage(LanguageInterface::LANGCODE_NOT_APPLICABLE);
    
        return $this->isDefault()
          ? Url::fromRoute(
            'simple_sitemap.sitemap_default',
            $parameters,
            $options)
          : Url::fromRoute(
            'simple_sitemap.sitemap_variant',
            $parameters + ['variant' => $this->id()],
            $options);
      }
    

    Looking at the routing file, the route normalizer is already disabled (as suggested here: https://drupal.stackexchange.com/questions/246572/disabling-language-pre...):

    simple_sitemap.sitemap_default:
      path: '/sitemap.xml'
      defaults:
        _controller: '\Drupal\simple_sitemap\Controller\SimpleSitemapController::getSitemap'
        _disable_route_normalizer: 'TRUE'
      requirements:
        # Sitemaps are accessible for everyone.
        _access: 'TRUE'
    

    What would be the Right Way to ensure that the URL does not have a language prefix?

    (For the time being, I wrote a small patch for myself, included via my composer.json, with hard-coded "remove the /fi/ prefix" logic, but that's obviously not the correct way to handle this...)

    Cheers,
    Markus

  • 🇷🇺Russia kala4ek 🇷🇺 Novosibirsk

    Unfortunately, this is not entirely a Simple XML Sitemap problem: it was broken at the core level by the core issue 🐛 Missing url prefix on language neutral content (status: Fixed).

  • 🇪🇸Spain jjcarrion Spain

    Hi,

    I'm facing the same problem after updating core to 10.1.2.

    It seems that the root cause will take some time to be fixed (see https://www.drupal.org/project/drupal/issues/2883450#comment-15218088, 🐛 Missing url prefix on language neutral content, status: Fixed), so I have applied a hack for now. I agree with @masipila that this is not the way to go, but until we find a better solution I'm uploading the hacky patch in case anyone finds it useful. I'm not even using dependency injection but, as I said, this is not the right solution.

    Thanks!

  • 🇫🇮Finland jheinon_finland

    Greetings,

    I was studying the issue and tried the patch on our Drupal project's sitemap, but it did not remove the language prefix as desired. The project I'm working on runs core 10.1.2 and module version 4.1.x. While studying the related core issue, I found a similar issue in the OpenID Connect / OAuth client module: https://www.drupal.org/project/openid_connect/issues/3383036 (✨ Redirect URI has the language prefix in it (in D10), status: Needs review).

    In that issue's patch, the language option for the URL is dropped and replaced with `path_processing` set to false. The patch I'm attaching to this comment proposes the same solution, and comes with an interdiff against the prior patch.
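
To illustrate why that option removes the prefix, here is a toy model in plain PHP. It is not Drupal code and not the attached patch: `addLanguagePrefix` and `buildUrl` are hypothetical stand-ins, under the assumption that the language prefix is added by an outbound path processing step that is skipped when `path_processing` is false.

```php
<?php
declare(strict_types=1);

/**
 * Toy stand-in for an outbound path processor that adds a language prefix.
 * Hypothetical helper; not part of Drupal.
 */
function addLanguagePrefix(string $path, string $prefix): string {
  return $prefix === '' ? $path : '/' . $prefix . $path;
}

/**
 * Toy URL builder: runs the prefix step unless path_processing is FALSE.
 * Hypothetical helper; the 'fi' default only mirrors the example in this issue.
 */
function buildUrl(string $baseUrl, string $path, array $query = [], array $options = []): string {
  $options += ['path_processing' => TRUE, 'prefix' => 'fi'];
  if ($options['path_processing']) {
    $path = addLanguagePrefix($path, $options['prefix']);
  }
  $url = rtrim($baseUrl, '/') . $path;
  return $query ? $url . '?' . http_build_query($query) : $url;
}

// With path processing, the chunk link gets the prefix.
echo buildUrl('http://localhost', '/sitemap.xml', ['page' => 1]), PHP_EOL;
// Without path processing, the path is left untouched.
echo buildUrl('http://localhost', '/sitemap.xml', ['page' => 1], ['path_processing' => FALSE]), PHP_EOL;
```

In this model the first call yields the prefixed chunk link from the issue summary and the second yields the working one. Whether the real patch behaves identically should be checked against a site with language prefixes enabled.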

  • Status changed to Needs review over 1 year ago
  • Core: 9.5.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    32 pass
  • 🇫🇮Finland heikkiy Oulu

    Encountered the same issue in OpenID Connect and Simple XML Sitemap. The patch from #7 seems to fix the issue. I'll move this to Needs review, but it is probably also ready for RTBC.

  • 🇮🇳India shreya shetty

    Shreya Shetty made their first commit to this issue's fork.

  • 🇩🇪Germany sascha_meissner Planet earth

    +1, I'm having the same (drastic) issue; the patch from #7 fixes it for me.

  • Status changed to Fixed over 1 year ago
  • 🇩🇪Germany gbyte Berlin

    Thank you, this is a fix I can live with. Tests are gone at the moment, as I need to set up GitLab CI. Can you guys meanwhile test the dev version for me and tell me if anything broke?

  • Automatically closed - issue fixed for 2 weeks with no activity.
