Add < u > to allowed tags when importing translations

Created on 4 January 2025, 4 days ago

Problem/Motivation

Recently, in anticipation of the launch of Drupal CMS, I tried to translate all the modules available in the new standard Drupal download into German using localize.drupal.org. In the Leaflet module, I noticed some strings that were rejected on import even though they contain no syntax errors. What these strings have in common is that they all contain underlined text, marked up with the opening HTML tag '< u >' and its closing counterpart '< /u >'.

Could it be that the function locale_string_is_safe and the corresponding XSS filter list (I don't know which function defines it) do not include the HTML tag for underlining text? The strings in question are, for example, https://localize.drupal.org/translate/languages/de/translate?page=1&proj...

It would be nice if we could also include this HTML tag in the corresponding lists, as long as that does not undermine the security the XSS (cross-site scripting) filter is meant to provide. This would ensure that no more errors are displayed when importing correctly formatted translations.

Steps to reproduce

Install Drupal core
Install the Leaflet contributed module
Add German as a second language in your installation
Update and import translations
Open the recent log messages in the Reports section
Look for the locale import errors
Compare the strings that were not imported; they all contain < u >

Proposed resolution

Add the underline HTML tag to the list of allowed tags for localization import in Drupal.
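
As an illustration of the proposal, here is a minimal Python sketch (not Drupal's PHP code) of an allowlist check, assuming the check works by filtering the string and accepting it only if filtering leaves it unchanged. The names filter_tags and string_is_safe and the shortened ALLOWED_TAGS set are hypothetical and do not reproduce the actual list used by locale_string_is_safe.

```python
import re

# Hypothetical, shortened allowlist for illustration; not Drupal's actual list.
ALLOWED_TAGS = {"a", "b", "br", "code", "em", "i", "p", "strong"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def filter_tags(text, allowed):
    """Drop every tag whose name is not in `allowed`; keep everything else."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed else ""
    return TAG_RE.sub(keep_or_drop, text)


def string_is_safe(text, allowed=ALLOWED_TAGS):
    """A string passes only if filtering leaves it unchanged."""
    return filter_tags(text, allowed) == text


source = "Use <u>underlined</u> text"
print(string_is_safe(source))                        # False: u is not in the list
print(string_is_safe(source, ALLOWED_TAGS | {"u"}))  # True once u is added
```

In this sketch, adding a single entry to the allowlist is enough for the underlined string to pass, which is the whole of the proposed change.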

Remaining tasks

Change the code

User interface changes

none

Introduced terminology

-

API changes

TBD

Data model changes

TBD

Release notes snippet

TBD

🐛 Bug report
Status

Active

Version

11.1 🔥

Component

language system

Created by

🇩🇪Germany joachim namyslo Kulmbach 🇩🇪 🇪🇺

Comments & Activities

  • Issue created by @joachim namyslo
  • 🇩🇪Germany drubb Sindelfingen

    Hi Joachim,

    that's a difficult one! I'd rather suggest filing an issue against Leaflet to replace this markup with something better suited, e.g. < em >.
    IMHO using underlines is problematic markup anyway, as people are going to confuse it with links.
    In the meantime, I would drop this in the translations.

    Regards,
    Boris

  • 🇩🇪Germany joachim namyslo Kulmbach 🇩🇪 🇪🇺

    Of course, I could do that.

    However, this does not solve the problem described here in any way.

    This means that we have to leave texts untranslated because someone decided to use an HTML tag within their texts that they didn't know couldn't be imported once there was a translation for the string in question.
    I can of course delete the translations again. But that means a contrib module that could actually be 100 percent translated stays at only 95 percent. And even if the Leaflet maintainers update the module, the translation server keeps strings that we shouldn't translate but can't easily get rid of without patching and re-publishing module versions that are already published. Otherwise, the original strings remain on the server until someone comes along, wonders why these harmless strings are still untranslated, and submits suggestions for them again.

    That's not a solution.

    I've been waiting since Drupal 8 for us to reach a point in the Drupal development cycle where only original strings that can be translated without hesitation end up on the translation server. This is one such case that could be used as an opportunity to revise the system.

    The poor people who translate Drupal in their free time, so that others can understand Drupal better and the learning curve of the system is somewhat flatter, can't help it that we have no mechanism in the build pipeline that checks, before a new module release, whether all of a module's original strings can be translated without hesitation.

    However, that would be a topic for a much more complex issue.

    You know, I would be happy to follow your suggestion to delete the translations again, but I'm afraid that we would then need another 8 years before this problem even comes to light again.

    We need to scan the translations for non-importable character strings and incorporate appropriate redundancies to ensure that there are no more incorrectly written original character strings.
    For this reason, I would like to keep this issue open so that we can solve it once and for all.

  • 🇬🇧United Kingdom catch

    I think this probably needs a phpcs or phpstan rule which validates the content of every translatable string. Worth opening an issue against https://github.com/mglaman/phpstan-drupal for that.

    But also I think locale_string_is_safe should include u in the allowed tags, because it's still a valid HTML tag, see https://developer.mozilla.org/en-US/docs/Web/HTML/Element/u
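
The validation rule suggested above could look roughly like the following. This is a Python sketch of the idea only, not a phpcs or phpstan rule: the function find_unsupported_tags, the shortened ALLOWED_TAGS set, and the simplified regular expression for t() calls (single string literal, no escaped quotes) are all assumptions made for illustration.

```python
import re

# Hypothetical allowlist; a real rule would reuse the list the importer enforces.
ALLOWED_TAGS = {"a", "b", "br", "code", "em", "i", "p", "strong"}

# Simplified: matches t('...') or t("...") with no escaped quotes inside.
T_CALL_RE = re.compile(r"""\bt\(\s*(['"])(.*?)\1""", re.DOTALL)
TAG_NAME_RE = re.compile(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def find_unsupported_tags(php_source, allowed=ALLOWED_TAGS):
    """Return (string, unsupported tag names) for each offending t() string."""
    findings = []
    for match in T_CALL_RE.finditer(php_source):
        text = match.group(2)
        used = {name.lower() for name in TAG_NAME_RE.findall(text)}
        bad = sorted(used - allowed)
        if bad:
            findings.append((text, bad))
    return findings


sample = """
$a = t('Plain <em>emphasis</em> is fine');
$b = t('Click the <u>underlined</u> label');
"""
for text, tags in find_unsupported_tags(sample):
    print(f"{tags}: {text}")
```

Run against a module's source before release, a check of this kind would flag the underlined string while it can still be changed, rather than after translators have already worked on it.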

  • 🇳🇿New Zealand quietone

    Changes are made on 11.x (our main development branch) first, and are then backported as needed according to the Core change policies.

  • 🇦🇹Austria Grienauer Vienna

    We discussed this issue a bit at our dugmuc meetup.
    Let me write up here what we discussed:

    • The main problem is that currently the whole text part surrounded by the u is removed.
    • We think the u should NOT be allowed, even though it is an HTML element.
      https://developer.mozilla.org/en-US/docs/Web/HTML/Element/u

      Along with other pure styling elements, the original HTML Underline (u) element was deprecated in HTML 4; however, (u) was restored in HTML 5 with a new, semantic, meaning: to mark text as having some form of non-textual annotation applied.

    • A question we had: which HTML elements are allowed and which are not?
    • Is it possible to remove just the u and not the text inside? That would probably solve most of the problems that currently occur.

    Todo:

    • Discuss whether it is possible to remove only the unsupported HTML tags and not the text inside
    • Tell the Leaflet maintainers that the u tag is not used in a semantically correct way
    • Add a section to the global translation documentation listing which HTML tags may be used
    • Add some tests so that the content of a t() string is also checked and maintainers get immediate feedback if HTML tags are not supported

    Hope this feedback helps and gives another perspective.
    I think this task has a lot of side quests one needs to tackle ;P
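
The question of removing only the unsupported tags while keeping the text inside can be sketched as follows. This is an illustrative Python example built on the standard library's html.parser, not Drupal code; the class TagStripper, the function strip_unsupported_tags, and the shortened ALLOWED_TAGS set are hypothetical names chosen for this sketch.

```python
from html.parser import HTMLParser

# Hypothetical allowlist for illustration only.
ALLOWED_TAGS = {"a", "b", "br", "code", "em", "i", "p", "strong"}


class TagStripper(HTMLParser):
    """Re-emit the input, dropping tags outside the allowlist but keeping text."""

    def __init__(self, allowed):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.allowed = allowed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag in self.allowed:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_unsupported_tags(text, allowed=ALLOWED_TAGS):
    parser = TagStripper(allowed)
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(strip_unsupported_tags("Click the <u>underlined</u> <em>label</em> &amp; wait"))
# prints: Click the underlined <em>label</em> &amp; wait
```

With this approach the translation would still import, only without the underline, so the trade-off is a loss of formatting rather than a rejected string.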
