Transliteration on path fields

Created on 24 October 2023, over 1 year ago

Problem/Motivation

Automatic translation includes the path field of nodes but transliteration is not run on the result. Thus, aliases containing spaces and special characters are created.

For example, on a German site about social benefits for families, the info page about "Elterngeld" got the custom alias "/elterngeld" (custom here means it was not created automatically by pathauto). When this is translated to English, the URL becomes "/parental allowance". The translation is correct, but the alias contains a space that should be a hyphen: "/parental-allowance".
For "Muttergeld" it's even a bit worse: the alias "/muttergeld" gets translated to "/mother's money", which contains not only a space but also an apostrophe.

The site in question is using the DeepL translation service, but I guess that's not really relevant.

Steps to reproduce

Set up a site with German as the original language and English as the secondary language, give a node a custom alias as in the examples above, and translate it; the behaviour can then be seen. It's important to _not_ generate URL aliases automatically: in that case the pathauto restriction kicks in and the URL alias is excluded from translation. Pathauto then sets the alias after saving, according to the defined pattern.

Proposed resolution

After translation is done on a path field, transliteration should be run and spaces should be replaced by hyphens.
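
To make the expected result concrete, here is a minimal, language-neutral sketch in Python of the kind of cleanup meant here. It is not Drupal, pathauto or TMGMT code; the function name `clean_alias` and the exact rules (drop apostrophes, fold accents to ASCII, hyphenate everything else, lowercase) are assumptions chosen only to match the examples above.

```python
import re
import unicodedata


def clean_alias(alias: str) -> str:
    """Hypothetical helper: turn "/mother's money" into "/mothers-money"."""
    cleaned_segments = []
    # Handle each path segment on its own so the slashes survive.
    for segment in alias.split("/"):
        # Decompose accented letters (e.g. "é" becomes "e" plus a combining
        # accent), then drop everything that is not plain ASCII.
        folded = (
            unicodedata.normalize("NFKD", segment)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Drop apostrophes instead of turning them into separators.
        folded = folded.replace("'", "")
        # Collapse every remaining run of non-alphanumerics into one hyphen.
        slug = re.sub(r"[^A-Za-z0-9]+", "-", folded).strip("-").lower()
        cleaned_segments.append(slug)
    return "/".join(cleaned_segments)


print(clean_alias("/parental allowance"))
print(clean_alias("/mother's money"))
```

Note that this naive folding simply drops characters that have no ASCII decomposition (for example "ß"), so a real implementation would need a proper transliteration table per language.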

Disabling URL alias translation is not an option, as translated aliases should be allowed.
Excluding the URL alias field from text extraction should also not be an option (is that even possible with "path" being a base field? How would I apply thirdPartySettings?), as aliases are actually supposed to be translated.

First question of course is, am I missing something? Maybe there is already an option or setting?

💬 Support request
Status

Active

Version

1.15

Component

Source: Node

Created by

🇩🇪Germany broon Potsdam


Comments & Activities

  • Issue created by @broon
  • heddn Nicaragua

    I was confused by this issue myself for a bit today, and finally traced it back. Transliteration only happens if pathauto is configured. If pathauto is not configured, transliteration does not apply. If you want transliteration on all path values that come back from your translation provider, you'll need to implement hook_entity_update() and pass the path through transliteration.

  • 🇨🇭Switzerland berdir Switzerland

    Yes, only pathauto does transliteration, deduplication and so on, core doesn't do any of that, nor does TMGMT at the moment.

    We already integrate with pathauto to make the field non-translatable if that's enabled, but if pathauto isn't enabled, then those APIs don't exist and could likely only be re-implemented in a much simpler fashion.

  • heddn Nicaragua

    I think this is an especially contentious area. CJK languages probably don't work with transliteration. A one-size-fits-all solution likely doesn't exist. I'm not sure where to move this ticket. Closing it as "won't fix" will likely result in a new issue within a few months, but there's not much we can do about this either. Some kind of README entry or release note is about the best idea I can come up with.
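
The CJK concern can be illustrated with a small Python sketch. It only demonstrates naive ASCII folding, not Drupal's transliteration component, and the names `ascii_fold` and `fold_or_keep` are hypothetical. The point is that a blanket "fold to ASCII" step can wipe out an alias entirely, so any generic solution would at least need a fallback.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Naive folding: decompose characters, then drop anything non-ASCII."""
    return (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )


def fold_or_keep(text: str) -> str:
    """Assumed fallback: keep the original text if folding leaves nothing."""
    folded = ascii_fold(text)
    return folded if folded else text


# A Japanese alias has no ASCII decomposition, so naive folding empties it.
print(repr(ascii_fold("育児手当")))
print(fold_or_keep("育児手当"))
```

This fallback only covers aliases that fold to nothing at all; mixed-script aliases would still lose their non-Latin part, which is one reason a single rule for all languages is hard to get right.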
