Country list is not correctly sorted when it's localized with accents (e.g. German, Turkish)

Created on 2 February 2022, almost 3 years ago
Updated 7 August 2024, 6 months ago

Problem/Motivation

Country list is not correctly sorted when it's localized with accents. Especially if the first letter has an non ASCII letter they are sort to the end.
e.g. this Turkish country list:

Almanya
Birleşik Krallık(İngiltere) 
Fransa
Portekiz
Türkiye
Yunanistan
İspanya
İsviçre
İtalya

should be:

Almanya
Birleşik Krallık(İngiltere) 
Fransa
İspanya
İsviçre
İtalya
Portekiz
Türkiye
Yunanistan

e.g. this German country list:

Belgien
Frankreich
Schweden
<strong>Österreich</strong>

should be:

Belgien
Frankreich
<strong>Österreich</strong>
Schweden

Steps to reproduce

When I set my default language to german
Go to de/admin/config/regional/settings See that Österreich appears at the bottom

Proposed resolution

natcasesort in CountryManager should be replaced with an uasort with Drupal\Component\Transliteration\PhpTransliteration

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

🐛 Bug report
Status

Postponed: needs info

Version

11.0 🔥

Component
Locale 

Last updated about 1 month ago

Created by

🇩🇪Germany sleitner

Live updates comments and jobs are added and updated live.
  • Usability

    Makes Drupal easier to use. Preferred over UX, D7UX, etc.

  • Needs release note

    The major change should have a special release note written to summarize the importance of the change. See Write a release note for an issue.

  • Needs framework manager review

    It is used to alert the framework manager core committer(s) that an issue significantly impacts (or has the potential to impact) multiple subsystems or represents a significant change or addition in architecture or public APIs, and their signoff is needed (see the governance policy draft for more information). If an issue significantly impacts only one subsystem, use Needs subsystem maintainer review instead, and make sure the issue component is set to the correct subsystem.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Status changed to Needs review about 2 years ago
  • The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • 🇩🇪Germany sleitner

    I'm quite sure that PHPstan simply does not understand the made changes.

    84 Function t invoked with 3 parameters, 1 required.
    The third parameter is needed to specify the requested language.

  • Status changed to Needs work almost 2 years ago
  • The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • Status changed to Needs review almost 2 years ago
  • 🇫🇷France nod_ Lille

    Same issue with the bot as before, excluding this issue from the list

  • Status changed to RTBC almost 2 years ago
  • 🇺🇸United States smustgrave

    From what I can see all the points have been addressed by @sleitner so lets try this again.

    • longwave committed c34b922a on 10.1.x
      Issue #3262017 by sleitner, ravi.shankar, smustgrave, xjm, longwave,...
  • Status changed to Fixed almost 2 years ago
  • 🇬🇧United Kingdom longwave UK

    Committed and pushed to 10.1.x, thanks! Also published the change record.

  • 🇷🇺Russia Chi

    Added follow-up.

  • Status changed to Needs work almost 2 years ago
  • 🇺🇸United States tr Cascadia

    I have to say that this commit is quite disruptive to any modules that use, enhance, or decorate the country_manager service - this is most certainly not a "minor" change. Specifically it changes the API of the service in a way that's not backwards compatible.

    In addition to the break described in 📌 Add tests for hook_countries_alter() Fixed , this patch neglected to change CountryManagerInterface.

    The core country_manager service is provided by the CountryManager class.

    CountryManager implements CountryManagerInterface.

    This patch changes the signature of one of the implemented methods without actually modifying the CountryManagerInterface. This is wrong, so I've reopened this issue.

    Additionally, if CountryManager::getStandardList() is to be a primary method on the service, it should be described in the interface (it's not currently).

    The change record is also not very good, because it doesn't mention the service or the class that is being modified, and refers to the class as a "form", which it is not. I have done a quick edit of that change record so that searches can more easily find this change to the country_manager service, but the change record should be properly reviewed to ensure consistency with the changes here.

    May I ask why the order of the countries in English changed? What is the reason for this?

    And why are we involving transliteration here? The sorting of countries in any specific language does not require transliteration to a different character set - the sorting should be done in that language's character set.

    • longwave committed 5971ebdf on 10.1.x
      Revert "Issue #3262017 by sleitner, ravi.shankar, smustgrave, xjm,...
  • 🇬🇧United Kingdom longwave UK

    Rolled this back for now, to handle the change of values in the hook and the concerns raised in #80.

  • 🇫🇷France andypost

    Thank you! Makes sense to incorporate test and fix from 📌 Add tests for hook_countries_alter() Fixed and close it

  • Status changed to Needs review almost 2 years ago
  • 🇩🇪Germany sleitner

    CountryManagerInterface and the changes of 📌 Add tests for hook_countries_alter() Fixed are included in the patch. I moved the testCountryManagerSort test to the kernel tests.

    May I ask why the order of the countries in English changed?

    The order did not change, only the parameter $translation_options was added to each line.

    And why are we involving transliteration here?

    ok, it solved the problem for most languages, but not all. Sorting it by the language' character set makes it even more disruptive by using ext-intl Collator.

  • Status changed to Needs work almost 2 years ago
  • 🇺🇸United States tr Cascadia

    The order did not change, only the parameter $translation_options was added to each line.

    While the order of the countries in the array declaration inside getStandardList() seems to be the same, that's not important, because the declaration order is prior to the sort.

    What is important is that the order of the countries returned by getList() DID change, because it is sorted differently after the patch. And that is, of course, one of the main things the patch was intended to do! I know this because my test cases broke, and that led me to this issue.

    You can see this change in order by running the following script before and after the patch:
    drush php-eval '$array = \Drupal::service("country_manager")->getList(); print_r(array_map(function ($country) { return (string) $country; }, $array));'

    Specifically, look at [AX] => Åland Islands, which shows up at the bottom of the list (element 257, numbered 0-257) before the patch and shows up as number two in the list (element 1, numbered 0-257) after the patch. There seem to be other changes as well, but as I said scanning through 258 countries to see exactly what changed is a pain.

    When transliteration is done, Å -> A, so Åland Islands gets sorted and put somewhere in the middle of all the countries that begin with "A". But this is not correct. "Å" is a distinct letter, so all the countries that begin with "Å" should be grouped together, and not interleaved with all the countries that begin with "A". I would also expect all these "Å" countries to appear in the list BEFORE all the "A" countries, as that is the accepted way (in English) to alphabetize words that begin with numerals, special characters, etc.

    Now this is not that apparent with "Å" countries in English, because there is only one of them. But the same situation happens with other languages. To use your example, "Ö" in German is a different letter than "O", so they should be listed in separate groups and not interleaved together like they would be with transliteration involved.

    I don't think transliteration is the right approach, because transliterating loses the information that these non-native characters are distinct characters.

    There are many authorities on how to alphabetize - in English for example I would refer to the Chicago Manual of Style. What this patch seems to do is to invent a brand-new "Drupal" style of alphabetizing, which seems wrong. We are already inventing and maintaining our own list of countries that does not 100% agree with any standard, why are we working so hard to invent a new solution to alphabetizing as well?

    To implement this new feature of being able to obtain the country list in a specific language, I would suggest that, instead of adding an argument to getList(), a NEW method should be declared for getting the list. Something like CountryManagerInterface::getLocalizedList($language) where $language is a language ID. This would still involve injecting the language_manager service, though. Off the top of my head I don't know if that is an allowable change for a minor-point release, but I do know if this is done it will make my D10.0 module incompatible with D10.1.

    And to bring up another point, hooks are not how we alter services in Drupal. Symfony has perfectly good mechanisms for altering services, which we document in Drupal and use elsewhere. This hook_countries_alter() hook seems to be an artifact of the early porting days when this code was initially moved from D7 procedural code, and it was never touched after that even after everyone became more familiar with services and how we wanted to implement them in Drupal. There is no need to define a hook in order to alter this service. And indeed, since the altering function is not described in CountryManagerInterface (see #2487351: Not inheriting parent documentation for CountryManager::getList() ) I will argue that this alter hook is not part of the public country_manager service API and should be deprecated here and dropped for D11. In any case, the hook is not part of the current API and if I override the service in the standard manner I will lose the hook functionality - having two standard ways to alter services is prone to error.

  • 🇬🇧United Kingdom longwave UK

    I think I agree that hook_countries_alter() should be deprecated, but let's do that in a separate issue, rather than expand the scope of this one.

  • Status changed to Needs review almost 2 years ago
  • 🇩🇪Germany sleitner

    @TR I don't want to reinvent the wheel. My intention is to sort lists like the country list and e.g. the content types list in the same way as the content/node list on a multiligual website in Drupal. The content list sorting in Drupal depends on the database, which usually uses a general collation e.g. utf8mb4_general_ci in MySQL . This results in a sorted list of titles (country names): Afghanistan, Åland Islands, Albania, Oman, Österreich, Osttimor. The non-ascii characters are not send to the end of the list.

    Specifically, look at [AX] => Åland Islands

    Åland is Swedish/Finnish. The Swedish and Finnish rule is to put "Å" after "Z". "Å" is also use in Danish and Norwegian, but the order of characters after "Z" is not the same as in Swedish and Finnish, and "Aa" is equivalent to "Å".

    To use your example, "Ö" in German is a different letter than "O", so they should be listed in separate groups and not interleaved together like they would be with transliteration involved.

    There are two different rules in German-German how to sort a list, depending on if it is a list of names in phone books or words in a dictionary. If its a phone books "Ö" is treated as "Oe" (DIN 5007-2). If it is a dictionary or other list "Ö" is treated as "O" (DIN 5007-1). (see https://de.wikipedia.org/wiki/Alphabetische_Sortierung#Deutschland)
    In Austrian-German phone books these rules are used: name lists "Ä" would be after "Az" (Österreichische Sortierung), city name lists "Ä" after "Z", other lists: "Ä" is treaded as "A" (DIN 5007-1). (see https://de.wikipedia.org/wiki/Alphabetische_Sortierung#%C3%96sterreich).
    For Swiss-German I did not find special rules so quickly.

    My new proposal in #84 uses the Collator class of the PHP intl extension (https://www.php.net/manual/en/class.collator.php), which uses the ICU library (https://icu.unicode.org/) . If you use Swedish as language, Åland is at the end of the list. With English and German it is between "A" countries. In French it is "Îles Aland".

  • Status changed to Needs work almost 2 years ago
  • 🇩🇪Germany sleitner

    #88 the tests are missing

  • 🇬🇧United Kingdom longwave UK

    @sleitner Thank you for working on this. At present Drupal doesn't require the intl extension and I don't think we can add that requirement here. It might be possible to add a dependency on symfony/polyfill-intl-icu instead, which looks to provide the Collator class for environments that do not have the intl extension - does your code work with that? But this is still quite a big change and will require wider review.

  • 🇩🇪Germany sleitner

    @longwave Thanks, now I know why I do not find a Collator in current Symfony, because it is hidden in the polyfills.
    I will take a look at it tomorrow. Possibly we do not need language manager to make it a bit easier.

  • 🇩🇪Germany sleitner

    @longwave symfony/polyfill-intl-icu does not helps us, because it is limited to the "en" locale. Any suggestions?

  • Status changed to Needs review almost 2 years ago
  • 🇩🇪Germany sleitner

    Please review this patch which uses PHP intl extension for Collator class, because symfony/polyfill-intl-icu is en locale/english only. PHP intl extension would help sorting other lists in Drupal (e.g. ConfigEntity based lists) as well.

    Language Manager is not used anymore, because the language id parameter is mandatory for the new getLocalizedList method. The position of these countries changed in getLocalizedList compared to getList: Åland Islands, Côte d’Ivoire, Northern Mariana Islands, Réunion, São Tomé & Príncipe

    The getList method works in the original way, no country position changed. The hook_countries_alter() is called only in getList, because an additional parameter would be needed for the language id in getLocalizedList .

  • Status changed to Needs work almost 2 years ago
  • Status changed to Needs review almost 2 years ago
  • 🇩🇪Germany sleitner

    Please review this patch which uses symfony/polyfill-intl-icu. If php intl is not installed en locale/english is used only.

    Language Manager is not used anymore, because the language id parameter is mandatory for the new getLocalizedList method. The position of these countries changed in getLocalizedList compared to getList: Åland Islands, Côte d’Ivoire, Northern Mariana Islands, Réunion, São Tomé & Príncipe

    The getList method works in the original way, no country position changed. The hook_countries_alter() is called only in getList, because an additional parameter would be needed for the language id in getLocalizedList .

  • Status changed to Needs work almost 2 years ago
  • 🇬🇧United Kingdom longwave UK
    1. +++ b/composer.json
      @@ -12,7 +12,8 @@
      +        "symfony/polyfill-intl-icu": "^1.27"
      

      This doesn't need to be added here, only in core/composer.json.

    2. +++ b/core/tests/Drupal/FunctionalTests/Core/Locale/CountryManagerTest.php
      @@ -0,0 +1,76 @@
      +    $this->assertEquals('ES', array_search(
      +    // cSpell:disable-next-line
      +      'Espagne',
      +      $countryList));
      

      This is hard to read, this should probably just be one line.

    3. +++ b/core/tests/Drupal/FunctionalTests/Core/Locale/CountryManagerTest.php
      @@ -0,0 +1,76 @@
      +    $this->assertEquals('ES', array_search(
      +      'Spain',
      +      $countryList));
      

      Same

    4. +++ b/core/tests/Drupal/KernelTests/Core/Locale/CountryManagerTest.php
      @@ -0,0 +1,47 @@
      +    $this->assertTrue(array_search('AX', $countryIds)
      +      < array_search('ZW', $countryIds));
      

      Can use assertLessThan here.

    @TR does this latest patch fix all the concerns you had with the previous version?

  • 🇬🇧United Kingdom longwave UK

    This also needs a release note given that we are adding a new dependency and that Intl extends this functionality further.

    I also wonder if we should add the Intl extension to the suggest section of core/composer.json?

  • Status changed to Needs review almost 2 years ago
  • 🇩🇪Germany sleitner

    @longwave change record is updated.

    To prevent MethodArgumentValueNotImplemented exception in Collator, language_id en is used if intl extension is not installed.

  • 🇬🇧United Kingdom longwave UK

    Tagging for framework manager review, as we are adding a new soft dependency on the intl extension and we need an opinion as to whether this is the right direction.

    Also tagging for release manager review; I don't see any issue with the new dependencies but would like a second opinion there too.

  • 🇬🇧United Kingdom catch
    +++ b/core/lib/Drupal/Core/Locale/CountryManager.php
    @@ -325,4 +342,28 @@ public function getList() {
    +   */
    +  public function getLocalizedList($language_id) {
    +    // Populate the country list if it is not already populated.
    +    if (!isset($this->localizedCountries[$language_id])) {
    +      $this->localizedCountries[$language_id] = static::getStandardList($language_id);
    +
    +      // Sorts the country array depending on language rules.
    +      $collator = \Collator::create((!extension_loaded('intl')) ? ('en') : ($language_id));
    +      $collator->asort($this->localizedCountries[$language_id]);
    +    }
    +
    +    return $this->localizedCountries[$language_id];
    +  }
    

    Do we need a hook_requirements() somewhere encouraging people to install the intl extension (maybe in locale module?).

    I was also wondering if we could just skip the sorting completely if the locale is 'en' and not add the symfony polyfill in that case, but it's probably cleaner to have less code paths.

  • 🇬🇧United Kingdom catch

    Untagging for release manager review since both me and longwave think this is OK in principal.

    One more thought - should locale module decorate the country manager and handle the sorting? Probably not worth it though for the complexity that would add.

  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    29,375 pass, 1 fail
  • 🇩🇪Germany sleitner

    Added a warning and an ok message to hook_requirements() in /core/modules/locale/locale.install

  • Status changed to Needs work over 1 year ago
  • Status changed to Needs review over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    29,375 pass, 1 fail
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    29,376 pass
  • 🇬🇧United Kingdom longwave UK
    1. +++ b/core/composer.json
      @@ -112,6 +113,7 @@
      +        "ext-intl": "Needed to extend symfony/polyfill-intl-icu capability with the sorting of non-english languages.",
      

      "English" should be capitalised.

    2. +++ b/core/lib/Drupal/Core/Locale/CountryManager.php
      @@ -18,9 +18,18 @@ class CountryManager implements CountryManagerInterface {
      +  protected $localizedCountries = [];
      

      We can use typehints for new properties.

    3. +++ b/core/lib/Drupal/Core/Locale/CountryManager.php
      @@ -34,270 +43,278 @@ public function __construct(ModuleHandlerInterface $module_handler) {
      +  public static function getStandardList($language_id = NULL) {
      

      What if someone has already extended this class and replaced this method? https://3v4l.org/0WTuR

    4. +++ b/core/lib/Drupal/Core/Locale/CountryManager.php
      @@ -325,4 +342,28 @@ public function getList() {
      +  public function getLocalizedList($language_id) {
      

      We can use a typed argument and a return type here.

  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    29,315 pass, 2 fail
  • 🇩🇪Germany sleitner

    Lots of changed lines, but now the old methods remain.

  • Status changed to Needs work over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    29,376 pass
  • Status changed to Needs review over 1 year ago
  • 🇩🇪Germany sleitner

    @longwave all four #116 issues had been fixed. Now the old methods remain intact.

  • Status changed to RTBC over 1 year ago
  • 🇺🇸United States smustgrave

    Going to go ahead and mark this to keep it moving forward. @catch as a release manager did a review #109 but he's also a framework manager. Will leave the tag if we think a separate framework manager is needed but think we should be good.

  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    Patch Failed to Apply
  • Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    Waiting for branch to pass
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    29,390 pass
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍
    1. Rerolled due to composer changes.
    2. Fixed the version constraint to be inline with other constraints in our composer. Did this as part of resolving the conflicts. Changed ~v1.27.0 to ^1.27 as the tilde constraint on a patch release is not what we want.
    3. Improved the docs to not duplicate the between the class and interface and more detail about what is in the array

    None of these changes affect the logic of the change therefore leaving at RTBC.

  • Status changed to Needs work over 1 year ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I think there are some tensions with the existing getList() method...

      public function getList() {
        // Populate the country list if it is not already populated.
        if (!isset($this->countries)) {
          $this->countries = static::getStandardList();
          $this->moduleHandler->alter('countries', $this->countries);
        }
    
        return $this->countries;
      }
    

    It has an alter hook...

    and the new method...

      /**
       * {@inheritdoc}
       */
      public function getLocalizedList(string $language_id) : array {
        // Populate the country list if it is not already populated.
        if (!isset($this->localizedCountries[$language_id])) {
          $this->localizedCountries[$language_id] = static::getCountryList($language_id);
    
          // Sorts the country array depending on language rules.
          $collator = \Collator::create((!extension_loaded('intl')) ? ('en') : ($language_id));
          $collator->asort($this->localizedCountries[$language_id]);
        }
    
        return $this->localizedCountries[$language_id];
      }
    

    So if you alter the first one you don't get to alter the other. I don;'t think the best solution is to add another alter hook. I think the best solution would be to able to execute code in a different language contexts (there is an issue for this but I'm having trouble to find it) - so we can call getList() without having to add a hook or do anything particularly special.

    +++ b/core/tests/Drupal/FunctionalTests/Core/Locale/CountryManagerTest.php
    @@ -0,0 +1,72 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  protected $profile = 'minimal';
    

    This is necessary so that we get translations from localise.drupal.org - but that makes the test fragile. We should be setting up a few translations in the test rather than doing this.

  • 🇩🇪Germany sleitner

    @alexpott are you searching for this issue 📌 Allow to change the current language Needs work ?

  • 🇩🇪Germany sleitner

    It seems to me that we have to delegate this to the custom modules as long as we do not want to break anything.

    In the country module Sort country names using Collator Fixed the country lists are now sorted correctly. It does not cover the default country settings in the core regional settings.

  • Status changed to Postponed 11 months ago
  • Status changed to Postponed: needs info 6 months ago
  • 🇳🇿New Zealand quietone

    What is this issue posptoned on?

Production build 0.71.5 2024