- 🇯🇵Japan tyler36 Osaka
I'm experiencing this issue on MariaDB 10.4 with utf8mb4_general_ci.
It shows up when Drupal generates machine names, which do not resemble the Japanese-language labels.
I can reproduce it with drush repl, so it is probably not limited to the database in scope.
Results of some recent testing are here: 🐛 Japanese Characters replaced by Chinese english words in URL aliases (Postponed: needs info).
- 🇨🇭Switzerland phma Basel, CH
This won't help if you only want ASCII letters in your URLs. But if you prefer cleaner URLs and want to keep transliteration working for non-Japanese URLs, you can use something like this (it's important to turn off transliteration in settings when using this):
/**
 * Deal with Japanese and other non-Latin characters in Pathauto aliases.
 *
 * Unicode ranges taken from here and converted to PHP:
 * @see https://gist.github.com/ryanmcgrath/982242
 *
 * Implements hook_pathauto_alias_alter().
 */
function pathauto_cjk_pathauto_alias_alter(&$alias, array &$context) {
  // If the alias contains CJK characters, clean up punctuation but do not
  // transliterate because Japanese gets transliterated as Chinese.
  // @see https://www.drupal.org/project/drupal/issues/2984977
  if (preg_match('/[\x{3000}-\x{303F}]|[\x{3040}-\x{309F}]|[\x{30A0}-\x{30FF}]|[\x{FF00}-\x{FFEF}]|[\x{4E00}-\x{9FAF}]|[\x{2605}-\x{2606}]|[\x{2190}-\x{2195}]|\x{203B}/u', $alias)) {
    // Cleanup fullwidth characters.
    $alias = mb_convert_kana($alias, 'KVrn');
    // Replace punctuation with hyphens.
    $alias = preg_replace('/[\x{3000}-\x{303F}]|[\x{2605}-\x{2606}]|[\x{2190}-\x{2195}]|[\x{203B}\x{30FB}]|[ \t]/u', '-', $alias);
    // Replace remaining special characters with hyphens.
    $alias = preg_replace('/[^\p{Han}\p{Katakana}\p{Hiragana}\p{Latin}\d\/]+/u', '-', $alias);
  }
  else {
    // Transliterate the alias.
    $alias = \Drupal::transliteration()->transliterate($alias, $context['language'] ?? 'en');
  }
  $alias = \Drupal::service('pathauto.alias_cleaner')->cleanAlias($alias, $context['source'], $context['language']);
}
This might need more polishing and testing, so any suggestions and improvements are welcome.
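To make testing easier, the CJK branch of the hook above can be tried outside Drupal. The following is only a rough sketch: the function name example_cjk_clean_alias() is hypothetical and not part of Pathauto or the snippet above, it assumes the mbstring extension is loaded, and it skips the final pathauto.alias_cleaner pass, so results inside Drupal may differ.

```php
<?php

/**
 * Hypothetical standalone copy of the CJK branch of the hook above.
 *
 * Returns the cleaned alias, or NULL when no CJK characters are found
 * (the case where the hook would fall back to transliteration).
 */
function example_cjk_clean_alias(string $alias): ?string {
  // Same detection pattern as in the hook.
  $cjk = '/[\x{3000}-\x{303F}]|[\x{3040}-\x{309F}]|[\x{30A0}-\x{30FF}]|[\x{FF00}-\x{FFEF}]|[\x{4E00}-\x{9FAF}]|[\x{2605}-\x{2606}]|[\x{2190}-\x{2195}]|\x{203B}/u';
  if (!preg_match($cjk, $alias)) {
    return NULL;
  }
  // Halfwidth katakana to fullwidth; fullwidth letters and digits to ASCII.
  $alias = mb_convert_kana($alias, 'KVrn', 'UTF-8');
  // Replace CJK punctuation, spaces and tabs with hyphens.
  $alias = preg_replace('/[\x{3000}-\x{303F}]|[\x{2605}-\x{2606}]|[\x{2190}-\x{2195}]|[\x{203B}\x{30FB}]|[ \t]/u', '-', $alias);
  // Replace runs of any remaining special characters with a single hyphen.
  return preg_replace('/[^\p{Han}\p{Katakana}\p{Hiragana}\p{Latin}\d\/]+/u', '-', $alias);
}

var_dump(example_cjk_clean_alias('/blog/東京 ガイド')); // string: /blog/東京-ガイド
var_dump(example_cjk_clean_alias('ＡＢＣ１２３'));       // string: ABC123
var_dump(example_cjk_clean_alias('hello world'));       // NULL
```

Feeding a list of real node titles through a helper like this should show quickly which characters end up as hyphens before the alias cleaner runs.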