if t() string has no translation or fallback language, text should have lang attribute

Created on 22 May 2011, over 13 years ago
Updated 7 April 2023, over 1 year ago

Problem/Motivation

If a text string has no translation, Drupal displays either the untranslated source string or the string from a fallback language. In both cases the text is in a different language from the rest of the page, so WCAG 2 (success criterion 3.1.2, Language of Parts) requires a lang attribute on the HTML element that wraps the text.

The problem with implementing this is that when t() or format_plural() is invoked, we don't know whether the result will end up in HTML. We cannot simply wrap the text in HTML, because the output may be destined for an email or an XML format such as RSS. Without that information, wrapping would risk breaking every email and RSS feed the site generates.

Several solutions were discussed:
1. Handle every untranslated string as an error.
2. Make t() smarter: let it accept html=true/false and/or return an array (translated yes/no, plain/HTML, language).
3. Add invisible Unicode characters to the untranslated string and replace them with the language code during HTML rendering.

Ad 1: No longer a solution, since core now also has language fallback for t().
Ad 2: Proposed by Mike in #9; Gábor answered in #10 that this is hard or impossible for module developers to tell.
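
Option 2 can be pictured with a minimal sketch. It is written in Python only for brevity; the translation store, the TranslationResult class, and the translate() and to_html() functions are hypothetical names invented for illustration and are not Drupal APIs. The idea is that the translation call reports which language the returned text is actually in, so a later HTML-aware step can decide whether a lang attribute is needed.

```python
from dataclasses import dataclass

# Hypothetical in-memory translation store: {langcode: {source: translation}}.
TRANSLATIONS = {
    "nl": {"Save": "Opslaan"},
    "de": {"Save": "Speichern", "Cancel": "Abbrechen"},
}


@dataclass
class TranslationResult:
    text: str
    langcode: str     # language the returned text is actually in
    translated: bool  # True only if found in the requested language


def translate(source, langcode, fallbacks=(), source_langcode="en"):
    """Look up source in langcode, then in each fallback, else return it as-is."""
    for candidate in (langcode, *fallbacks):
        text = TRANSLATIONS.get(candidate, {}).get(source)
        if text is not None:
            return TranslationResult(text, candidate, candidate == langcode)
    return TranslationResult(source, source_langcode, False)


def to_html(result, page_langcode):
    """Wrap the text in a lang-marked span only when it differs from the page.

    Assumes result.text is already safe to place in HTML.
    """
    if result.langcode == page_langcode:
        return result.text
    return f'<span lang="{result.langcode}">{result.text}</span>'
```

A plain-text consumer such as an email builder would read only result.text and ignore the rest. The difficulty raised in #10 still applies: the caller has to know which of the two paths it is on.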

Proposed resolution

Unicode has language tag characters, but they are deprecated (though not forbidden). No browser or text-to-speech software handles these characters. However, we could use them as a helper and replace them with real language attributes during rendering.
The English language tag in Unicode characters would be: U+E0001, U+E0065, U+E006E
And the end tag would be: U+E007F. During HTML rendering we could replace them with lang="en".
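
The marking and replacement steps above can be sketched as follows. This is a minimal Python illustration, not the patch from this issue: mark_language() and render_html() are hypothetical names, and the sketch assumes the marked text is already safe to emit as HTML. Tag characters mirror ASCII at an offset of 0xE0000, which is why "en" becomes U+E0065, U+E006E.

```python
import html
import re

TAG_BASE = 0xE0000            # tag characters mirror ASCII at this offset
LANGUAGE_TAG = "\U000E0001"   # opens a language tag
CANCEL_TAG = "\U000E007F"     # used here as the end marker


def mark_language(text, langcode):
    """Surround text with invisible tag characters naming its language."""
    tag = "".join(chr(TAG_BASE + ord(c)) for c in langcode)
    return f"{LANGUAGE_TAG}{tag}{text}{CANCEL_TAG}"


# U+E0001, one or more tag characters (the language code), then the
# marked text up to the first U+E007F.
_MARKED = re.compile(
    "\U000E0001([\U000E0020-\U000E007E]+)(.*?)\U000E007F", re.DOTALL
)


def render_html(marked):
    """Replace each marked run with a span carrying a real lang attribute."""
    def repl(match):
        langcode = "".join(chr(ord(c) - TAG_BASE) for c in match.group(1))
        return f'<span lang="{html.escape(langcode)}">{match.group(2)}</span>'
    return _MARKED.sub(repl, marked)
```

For example, render_html("Hola " + mark_language("Save", "en")) yields 'Hola <span lang="en">Save</span>', while a string without markers passes through unchanged.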

Remaining tasks

  • Unicode language tag characters are 4 bytes long in UTF-8, and Drupal can't handle 4-byte Unicode yet
  • test whether we can replace the Unicode characters with HTML using JavaScript
  • test whether the language tags break email, RSS, etc.
  • check whether we should filter the Unicode characters out in certain functions
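
Two of these tasks can be illustrated with a short Python sketch. The function names are hypothetical, and stripping the whole Tags block (U+E0000 to U+E007F) is only one possible filtering policy for non-HTML output such as email or RSS.

```python
import re

# Every code point in the Unicode Tags block, U+E0000 through U+E007F.
_TAG_CHARS = re.compile("[\U000E0000-\U000E007F]")


def strip_language_tags(text):
    """Remove all tag characters, e.g. before building an email or RSS item."""
    return _TAG_CHARS.sub("", text)


def utf8_length(char):
    """Number of bytes a single character occupies in UTF-8."""
    return len(char.encode("utf-8"))
```

Because the tag characters lie above U+FFFF, utf8_length("\U000E0001") is 4, which is the storage concern in the first task.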

User interface changes

None

API changes

We will get extra invisible Unicode characters in many places, from email to RSS. Some text functions might need adjustments.

---
See also the discussion in the accessibility group http://groups.drupal.org/node/145894

🐛 Bug report
Status

Closed: won't fix

Version

10.1 ✨

Component
Locale

Last updated 12 days ago

Created by

🇳🇱 Netherlands Hanno

Live updates comments and jobs are added and updated live.
  • Accessibility

    It affects the ability of people with disabilities or special needs (such as blindness or color-blindness) to use Drupal.

  • D8MI

    (Drupal 8 Multilingual Initiative) is the tag used by the multilingual initiative to mark core issues (and some contributed module issues). For versions other than Drupal 8, use the i18n (Internationalization) tag on issues which involve or affect multilingual / multinational support. That is preferred over Translation.

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

  • Needs reroll

    The patch will have to be re-rolled with new suggestions/changes described in the comments in the issue.

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇺🇸 United States dww

    This came up as a random triage target for the #bugsmash initiative. Seems like this is still a bug, and it's still a tricky problem. At the bare minimum, we need to re-roll this for modern core (probably target 10.1.x for now, worry about backports later). I haven't exhaustively read the whole history in here to know if there's more needed, although I also see it's tagged "Needs tests", and the existing test coverage in here is both very sparse (it only changes 2 lines) and changes existing coverage rather than adding new coverage.

  • 🇪🇸 Spain penyaskito Seville 💃, Spain 🇪🇸, UTC+2 🇪🇺

    Read all comments. I was concerned too about what Gábor described in the first 12 comments, as t() lacks the context of whether (and where) something is rendered in HTML or in other formats. But maybe now, with delayed translation via TranslatableMarkup, it is worth a revisit. I still think this would be a DX issue and a big change only doable in a major release.

    For the currently attached patch, it looks like it covers only what is described in #2336491: if t() string has fallback language in another text direction, bidi should be added, so it should probably be moved there.

  • 🇪🇸 Spain penyaskito Seville 💃, Spain 🇪🇸, UTC+2 🇪🇺

    Also a concern: #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) was closed, but I'm not sure about the rest of the engines we support (core and contrib).

  • Status changed to Closed: won't fix over 1 year ago
  • 🇪🇸 Spain penyaskito Seville 💃, Spain 🇪🇸, UTC+2 🇪🇺

    After discussion in Slack with @dww and @Gábor Hojtsy, it looks like there's agreement that this is a legitimate and important request, but it cannot realistically be fixed without a major re-architecture.
