- 🇯🇵Japan tyler36 Osaka
I can confirm this issue is present in Drupal 8, 9, and 10.
I came across it today when migrating user data (machineName plugin), but I struggle with it whenever I enter Japanese in a label field, which then automatically generates a machine name.
I believe it comes down to this:
TransliterationInterface::transliterate($string, $langcode = 'en', $unknown_character = '?', $max_length = NULL)
and more specifically Drupal\Component\Transliteration::replace($code, $langcode, $unknown_character)
Drupal assumes the Chinese reading of kanji even when specifying Japanese. I have confirmed this on both an English-default and a Japanese-default site.
$this->transliteration->transliterate('雑草', 'en', '_') => "zacao"
$this->transliteration->transliterate('雑草', 'ja', '_') => "zacao"
Likewise, katakana, which is a script only used for Japanese writing, also has problems, e.g. 'ポケットブックインライン'.
Drupal 10 converts this to "hoke~tsutofu~tsukuinrain"; however, it should be "pokettobukkuinrain".
1. It incorrectly handles "ッ" which should double the next letter (tto, kku)
2. It also incorrectly handles diacritic marks: in katakana, ポ (po) is transliterated as if it were ホ (ho).
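
To make the two points above concrete, here is a minimal standalone sketch of the expected behaviour. It is not Drupal's implementation and is not a proposed patch: the function name `romanize_katakana_sketch()` and its lookup table are hypothetical, the table covers only the kana in this example, and the input is assumed to use precomposed characters (ポ as a single code point).

```php
<?php
// Hypothetical helper, NOT part of Drupal: a tiny romanizer covering only
// the kana needed for the example in this comment.
function romanize_katakana_sketch(string $text): string {
  // Partial table (assumption: illustration only). Voiced (ボ) and
  // semi-voiced (ポ) kana get their own entries instead of collapsing to ホ.
  $map = [
    'ホ' => 'ho', 'ボ' => 'bo', 'ポ' => 'po',
    'ケ' => 'ke', 'ト' => 'to', 'ブ' => 'bu', 'ク' => 'ku',
    'イ' => 'i', 'ン' => 'n', 'ラ' => 'ra',
  ];
  $out = '';
  $double_next = FALSE;
  // Split into Unicode characters rather than bytes.
  foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
    if ($char === 'ッ') {
      // Small tsu: remember to double the first letter of the next syllable.
      $double_next = TRUE;
      continue;
    }
    // Characters missing from the partial table become '?'.
    $romaji = $map[$char] ?? '?';
    if ($double_next && $romaji !== '?') {
      $romaji = $romaji[0] . $romaji;
    }
    $double_next = FALSE;
    $out .= $romaji;
  }
  return $out;
}

echo romanize_katakana_sketch('ポケットブックインライン'), PHP_EOL;
// Prints: pokettobukkuinrain
```

The sketch treats "ッ" as a modifier of the following syllable rather than as a character with its own reading, which is the behaviour point 1 asks for.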
The diacritic problem exists in both katakana and hiragana; there are 25 characters with diacritic marks in each, so 50 in total.
Katakana: ホ (ho), ボ (bo), ポ (po) => ho
Hiragana: ほ (ho), ぼ (bo), ぽ (po) => ho
- 🇨🇦Canada joseph.olstad
@tyler36, is your database MySQL or MariaDB? Which database type? Also, which collation are you using for your Drupal database? utf8mb4_general_ci is the one you should most likely be using. If you install Drupal with utf8mb4_general_ci, or convert your database to use utf8mb4_general_ci, this might resolve your issues. Try a fresh install of Drupal on a database using utf8mb4_general_ci.
What collation type are you currently using?
- 🇯🇵Japan tyler36 Osaka
@joseph.olstad Thanks for the reply and link to parent issue.
Database is MariaDB:10.4 using "utf8mb4_general_ci" collation.