- Issue created by @makbay
- 🇳🇿New Zealand quietone
If this problem was discovered on a version of Drupal that is not 11.x, add that information in the issue summary and leave the version at 11.x. In Drupal core, changes are made on 11.x (our main development branch) first and are then backported as needed according to the Core change policies. This is also mentioned in the version section of the list of issue fields documentation.
- 🇨🇦Canada Charlie ChX Negyesi 🍁Canada
Is your database MySQL? If yes, what does
SELECT VERSION();
show, and what does
show full columns from node_field_data where field='title';
show in the Collation column?
- 🇹🇷Turkey makbay
@ghost of drupal past
MariaDB [db]> SELECT VERSION();
+------------------------------+
| VERSION()                    |
+------------------------------+
| 10.11.11-MariaDB-ubu2204-log |
+------------------------------+
1 row in set (0.000 sec)

MariaDB [db]> show full columns from node_field_data where field='title';
+-------+--------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type         | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+--------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| title | varchar(255) | utf8mb4_general_ci | NO   | MUL | NULL    |       | select,insert,update,references |         |
+-------+--------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.002 sec)
- 🇨🇦Canada Charlie ChX Negyesi 🍁Canada
That took me a bit, thanks for the challenge. I enjoy them (this is not a joke). I tried with MariaDB and saw no problems (https://dbfiddle.uk/S1UWV8sj), but when I actually tried locally (I run MySQL 8.0) I saw something interesting:
Look at the results of https://3v4l.org/0TJmt
In theory, both "İstanbul" and "istanbul" should be normalized to "istanbul" before being matched.
Once the PHP output disproves this theory, it becomes very easy to search for the problem, and it is indeed a known one. https://stackoverflow.com/a/42887898/308851 says:
Unicode solves this problem as follows: when İ is converted to lowercase, it's actually converted to the standard latin i plus the combining character U+0307 "COMBINING DOT ABOVE".
I am not sure what Drupal could do here.
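For anyone who wants to reproduce this locally, here is a minimal sketch. It assumes PHP 7.3 or later with the mbstring extension (mb_strtolower() uses full case mapping from that version on). The variable names are illustrative and are not taken from the 3v4l link above.

```php
<?php

// "İstanbul", written with an escape so the file encoding does not matter.
$upper = "\u{0130}stanbul";
$lower = mb_strtolower($upper, 'UTF-8');

// Hex dump of the lowercased string. With full case mapping it starts with
// 69 cc 87, which is "i" followed by U+0307 COMBINING DOT ABOVE.
echo bin2hex($lower), PHP_EOL;

// The lowercased title is therefore not byte-identical to plain "istanbul".
var_dump($lower === 'istanbul');

// The extra combining character also changes the character count.
var_dump(mb_strlen($upper, 'UTF-8'), mb_strlen($lower, 'UTF-8'));
```

If the comparison prints false and the second length is one larger than the first, the installation shows the behaviour described in the quoted answer.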
- 🇹🇷Turkey makbay
Thanks a lot for diving into this in such depth! Your explanation really helped me understand what’s going on under the hood.
The difference caused by Unicode's handling of capital "İ" and its decomposition into "i" + U+0307 is indeed something I hadn’t realized before. The link to CaseFolding.txt and the idea of using mb_convert_case(..., MB_CASE_LOWER_SIMPLE) make a lot of sense. I wasn't aware that this flag avoids locale-specific casing rules in a more consistent way, and it sounds like a safer path forward given the intl extension constraint.
If the results from your test script look promising, I’d be happy to create a merge request for this and help move the discussion forward.
Thanks again for your insight and support!
- 🇨🇦Canada Charlie ChX Negyesi 🍁Canada
I first ran a test against the linked Unicode CaseFolding.txt and then, just to be sure, over all 65536 BMP Unicode code points. The only character where
mb_strtolower($utf8Char)
differs from
mb_convert_case($utf8Char, MB_CASE_LOWER_SIMPLE)
is indeed U+0130 Latin Capital Letter I with Dot Above.
Recommended next steps:
- Add a strtolower method to the Unicode utility class, which will be a partial revert of #2850046: Remove usages of \Drupal\Component\Utility\Unicode() functions.
- Use mb_convert_case() with \MB_CASE_LOWER_SIMPLE there.
- Leave a comment explaining that this is the same as mb_strtolower() except for U+0130 Latin Capital Letter I with Dot Above, which this method lowercases to U+0069 Latin Small Letter I, unlike mb_strtolower(), which lowercases it to U+0069 Latin Small Letter I followed by U+0307 Combining Dot Above.
- Replace mb_strtolower() in EntityAutocompleteController, Drupal\Core\Config\Entity\Query, and any other place where it looks relevant with a call to Unicode::strtolower().
- Write a change record.
- Pending approval, update https://www.drupal.org/node/2850048 as well.
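The helper and the BMP comparison described above could be sketched roughly as follows. This is only an illustration, assuming PHP 7.3 or later with mbstring. The function names unicode_strtolower_sketch() and bmp_lowercase_differences() are hypothetical stand-ins, not the actual Unicode::strtolower() implementation or the original test script.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the proposed Unicode::strtolower() method.
 *
 * Same as mb_strtolower() except that it uses simple case mapping, so
 * U+0130 becomes U+0069 rather than U+0069 followed by U+0307.
 */
function unicode_strtolower_sketch(string $text): string {
  return mb_convert_case($text, MB_CASE_LOWER_SIMPLE, 'UTF-8');
}

/**
 * Lists BMP code points where full and simple lowercasing disagree.
 *
 * @return string[]
 *   Code points formatted like "U+0130".
 */
function bmp_lowercase_differences(): array {
  $differences = [];
  for ($codepoint = 0; $codepoint <= 0xFFFF; $codepoint++) {
    // Surrogates are not valid scalar values and cannot be encoded as UTF-8.
    if ($codepoint >= 0xD800 && $codepoint <= 0xDFFF) {
      continue;
    }
    $char = mb_chr($codepoint, 'UTF-8');
    if ($char === FALSE) {
      continue;
    }
    $full = mb_strtolower($char, 'UTF-8');
    $simple = mb_convert_case($char, MB_CASE_LOWER_SIMPLE, 'UTF-8');
    if ($full !== $simple) {
      $differences[] = sprintf('U+%04X', $codepoint);
    }
  }
  return $differences;
}

echo unicode_strtolower_sketch("\u{0130}stanbul"), PHP_EOL;
echo implode(', ', bmp_lowercase_differences()), PHP_EOL;
```

According to the test described above, the second line should list U+0130 alone, though the exact result may depend on the PHP version and the Unicode data it ships with.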