Autocomplete widget fails to return results for Turkish capital letter "İ"

Created on 25 May 2025, 12 days ago

Problem/Motivation

The Entity Reference field's Autocomplete widget fails to return results when the search query contains the uppercase Turkish "İ" (U+0130).
The same query typed with a lowercase "i" returns the expected results.

I tried to find the root cause by debugging the AJAX requests and saw that all strings are converted to lowercase with the function mb_strtolower.

Despite both search queries being lowercased before comparison, results differ depending on whether the original query contains the uppercase Turkish "İ" (U+0130) or a regular lowercase "i". This is especially strange because internally, the system uses mb_strtolower($input) to normalize the search term — so in theory, both "İstanbul" and "istanbul" should be normalized to "istanbul" before being matched.

Steps to reproduce

  1. Create a node/taxonomy term titled "İstanbul" (with uppercase dotted İ).
  2. Create a node/taxonomy term titled "Kadın İstihdamı" (with uppercase dotted İ).
  3. Create a content type with an Entity Reference field (using the Autocomplete widget) targeting the node/terms in the previous steps.
  4. On the form, start typing:
  • istanbul → ✅ Suggestion appears (correct)
  • İstanbul → ❌ No suggestions returned (incorrect)
  • istihdam → ✅ Suggestion appears (correct)
  • İstihdam → ❌ No suggestions returned (incorrect)

Proposed resolution

Typing İstanbul (with capital Turkish İ) should match a node/term titled İstanbul.

🐛 Bug report
Status

Active

Version

11.2 🔥

Component

plugin system

Created by

🇹🇷Turkey makbay


Comments & Activities

  • Issue created by @makbay
  • 🇳🇿New Zealand quietone

    If this problem was discovered on a version of Drupal that is not 11.x, add that information in the issue summary and leave the version at 11.x. In Drupal core, changes are made on 11.x (our main development branch) first and are then backported as needed according to the Core change policies. This is also mentioned in the version section of the list of issue fields documentation.

  • 🇨🇦Canada Charlie ChX Negyesi 🍁Canada

    Is your database MySQL? If yes then what does SELECT VERSION(); show and what does show full columns from node_field_data where field='title'; show in the Collation column?

  • 🇹🇷Turkey makbay

    @ghost of drupal past

    MariaDB [db]> SELECT VERSION();
    +------------------------------+
    | VERSION()                    |
    +------------------------------+
    | 10.11.11-MariaDB-ubu2204-log |
    +------------------------------+
    1 row in set (0.000 sec)
    
    MariaDB [db]> show full columns from node_field_data where field='title';
    +-------+--------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
    | Field | Type         | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
    +-------+--------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
    | title | varchar(255) | utf8mb4_general_ci | NO   | MUL | NULL    |       | select,insert,update,references |         |
    +-------+--------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
    1 row in set (0.002 sec)
    
  • 🇨🇦Canada Charlie ChX Negyesi 🍁Canada

    That took me a bit, thanks for the challenge, I enjoy them (this is not a joke). I tried with MariaDB and saw no problems https://dbfiddle.uk/S1UWV8sj but when I actually tried locally (I run MySQL 8.0) I saw something interesting:

    Look at the results of https://3v4l.org/0TJmt

    in theory, both "İstanbul" and "istanbul" should be normalized to "istanbul" before being matched.

    Once we see the PHP output disproving this theory, it becomes very easy to search for the problem and it is indeed a known problem. https://stackoverflow.com/a/42887898/308851 says

    Unicode solves this problem as follows: when İ is converted to lowercase, it's actually converted to the standard latin i plus the combining character U+0307 "COMBINING DOT ABOVE".

    I am not sure what Drupal could do here.
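
The behaviour described in the quote above can be checked with a few lines of PHP. This is only a minimal sketch, assuming PHP 7.3 or later with the mbstring extension; the variable names are illustrative and not taken from Drupal.

```php
<?php
// Lowercase U+0130 on its own and inspect the resulting UTF-8 bytes.
$upper = "\u{0130}";                       // İ, Latin Capital Letter I with Dot Above
$lower = mb_strtolower($upper, 'UTF-8');

// 69 is "i", cc87 is U+0307 Combining Dot Above.
echo bin2hex($lower), PHP_EOL;             // 69cc87

// The lowercased query therefore no longer equals the plain ASCII spelling.
$matches = mb_strtolower("\u{0130}stanbul", 'UTF-8') === 'istanbul';
var_dump($matches);                        // bool(false)
```

The extra combining character is why the search term derived from "İstanbul" differs from the one derived from "istanbul", even though both were lowercased.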

  • 🇹🇷Turkey makbay

    Thanks a lot for diving into this in such depth! Your explanation really helped me understand what’s going on under the hood.

    The difference caused by Unicode's handling of capital "İ" and its decomposition into "i" + U+0307 is indeed something I hadn’t realized before. The link to CaseFolding.txt and the idea of using mb_convert_case(..., MB_CASE_LOWER_SIMPLE) makes a lot of sense. I wasn't aware that this flag avoids locale-specific casing rules in a more consistent way, and it sounds like a safer path forward given the intl extension constraint.

    If the results from your test script look promising, I’d be happy to create a merge request for this and help move the discussion forward.

    Thanks again for your insight and support!

  • 🇨🇦Canada Charlie ChX Negyesi 🍁Canada

    I ran a test, first against the linked Unicode case folding text and then, to be sure, over all 65536 BMP Unicode code points. The only character where mb_strtolower($utf8Char) differs from mb_convert_case($utf8Char, MB_CASE_LOWER_SIMPLE) is indeed U+0130 Latin Capital Letter I with Dot Above.
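
A scan of this kind could look roughly like the following. It is a sketch and not the script used in the comment above; it assumes PHP 7.3 or later with mbstring, and the `$differences` array is an illustrative name.

```php
<?php
// Compare full and simple lowercasing for every code point in the BMP.
$differences = [];
for ($cp = 0; $cp <= 0xFFFF; $cp++) {
    // Surrogates (U+D800..U+DFFF) are not valid characters on their own.
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        continue;
    }
    $char = mb_chr($cp, 'UTF-8');
    if ($char === false) {
        continue;
    }
    $full = mb_strtolower($char, 'UTF-8');
    $simple = mb_convert_case($char, MB_CASE_LOWER_SIMPLE, 'UTF-8');
    if ($full !== $simple) {
        $differences[] = sprintf('U+%04X', $cp);
    }
}
print_r($differences);
```

Each character is tested in isolation, so context-dependent casing rules are not exercised by this loop.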

    Recommended next steps:

    1. Add a strtolower method to the Unicode utility class, which will be a partial revert of #2850046: Remove usages of \Drupal\Component\Utility\Unicode() functions.
    2. Use mb_convert_case() with \MB_CASE_LOWER_SIMPLE there.
    3. Leave a comment explaining that this is the same as mb_strtolower except for U+0130 Latin Capital Letter I with Dot Above, which this method lowercases to U+0069 Latin Small Letter I, unlike mb_strtolower, which lowercases it to U+0069 Latin Small Letter I followed by U+0307 Combining Dot Above.
    4. Replace mb_strtolower in EntityAutocompleteController, Drupal\Core\Config\Entity\Query, and any other place you can find that looks relevant with a call to Unicode::strtolower().
    5. Write a change record.
    6. Pending approval, update https://www.drupal.org/node/2850048 as well.
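
Steps 1 to 3 might be sketched as below. The class name `UnicodeSketch` is a hypothetical stand-in so the snippet runs on its own; the thread proposes adding the method to Drupal's existing Unicode utility class, and the final signature and documentation are for the merge request to decide.

```php
<?php
/**
 * Hypothetical stand-in for the proposed Unicode::strtolower() method.
 */
final class UnicodeSketch {

    /**
     * Lowercases a UTF-8 string using simple case mapping.
     *
     * Same as mb_strtolower() except for U+0130 Latin Capital Letter I with
     * Dot Above, which becomes U+0069 here rather than U+0069 followed by
     * U+0307 Combining Dot Above.
     */
    public static function strtolower(string $text): string {
        return mb_convert_case($text, MB_CASE_LOWER_SIMPLE, 'UTF-8');
    }

}

$query = UnicodeSketch::strtolower("\u{0130}stanbul");
var_dump($query === 'istanbul'); // bool(true)
```

With a helper like this, a query typed as "İstanbul" lowercases to the same string as "istanbul", which is what step 4 relies on.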