Free tagging vocabularies: Treatment of comma in Chinese and Unicode

Created on 20 September 2009, about 15 years ago
Updated 19 February 2023, over 1 year ago

Problem/Motivation

When typing in Chinese, a comma is entered as ',' rather than ','. The first character is U+FF0C 'Fullwidth Comma'; the second, used in the West, is U+002C. Separating terms in free tagging vocabularies currently works with U+002C but not with U+FF0C. Users typing with a Chinese-language IME have little to no awareness of the difference, and cannot type the ',' character (U+002C) without exiting the IME.
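To make the distinction concrete, here is a small Python sketch (illustrative only, not Drupal code) that prints the code point, Unicode name, and UTF-8 encoding of each character. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 bytes (hex) of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} utf-8={ch.encode('utf-8').hex()}"


# Escapes are used so the two commas cannot be confused or normalized away.
print(describe("\u002c"))  # U+002C COMMA utf-8=2c
print(describe("\uff0c"))  # U+FF0C FULLWIDTH COMMA utf-8=efbc8c
```

The fullwidth comma is a different character that takes three bytes in UTF-8, so code that looks only for the single byte 0x2C will never find it.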

What are the steps required to reproduce the bug?

Try to separate terms in a free tagging vocabulary with U+FF0C.

What behavior were you expecting?

Terms to be separated.

What happened instead?

Terms were not separated.

Also note:

Lists in Chinese are sometimes separated with ',' (U+FF0C) and at other times with '、' (U+3001, 'Ideographic Comma'), depending on the type of list and its context. On a multilingual or Chinese-only site, all three forms of comma should be able to separate items in a list.

See http://blog.northclick.de/archives/25 for one approach, and for how to deal with PHP functions that do not handle UTF-8 characters as multiple bytes per character.
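As a rough illustration of the behavior requested here, the following Python sketch splits a tag string on all three comma forms. It is not a proposed patch: the names `TAG_SEPARATORS` and `split_tags` are hypothetical, and it ignores any quoting or escaping rules the real tag parser may apply.

```python
import re
from typing import List

# Assumed separator set, taken from this report:
# U+002C COMMA, U+FF0C FULLWIDTH COMMA, U+3001 IDEOGRAPHIC COMMA.
TAG_SEPARATORS = re.compile("[\u002c\uff0c\u3001]")


def split_tags(raw: str) -> List[str]:
    """Split on any of the three commas, trim whitespace, drop empty entries."""
    parts = (part.strip() for part in TAG_SEPARATORS.split(raw))
    return [part for part in parts if part]


print(split_tags("测试\uff0c测试1\uff0c测试2"))
print(split_tags("测试, 测试1\u3001测试2"))
```

Python strings are sequences of code points, so the character class works directly. In PHP the equivalent pattern would need to be UTF-8 aware, which is the concern raised above.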

Steps to reproduce

  1. Install Drupal standard profile
  2. Enable Language and Content Translation
  3. Go to admin/config/regional/language, add Chinese, Traditional
  4. Go to admin/config/regional/content-language, Enable translation for Content, Article and Taxonomy Term, Tags
  5. Go to /node/add/article, add an article.
    1. Select Chinese as the language
    2. In the tags field, enter 测试, 测试1
    3. Save node
    4. Notice the two tags are saved
  6. Go to admin/structure/taxonomy/manage/tags/overview. See that 测试 and 测试1 are separate tags.
  7. Go to /node/add/article, add another article.
    1. Select Chinese as the language
    2. In the tags field, enter 测试,测试1,测试2 (separated by U+FF0C)
    3. Save node
    4. Notice the three tags are saved as one
  8. Go to admin/structure/taxonomy/manage/tags/overview. See that 测试2 has not been added as a tag; the entire string 测试,测试1,测试2 has been incorrectly added as a single tag.
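The outcome of steps 5 and 7 can be modeled with a short Python sketch. It assumes the tags field splits only on U+002C. That assumption is inferred from the observed behavior and is not taken from Drupal's source; `split_on_ascii_comma` is a made-up helper.

```python
from typing import List


def split_on_ascii_comma(raw: str) -> List[str]:
    """Model of the observed behavior: only U+002C separates terms."""
    return [term.strip() for term in raw.split("\u002c") if term.strip()]


step_5 = split_on_ascii_comma("测试\u002c 测试1")              # ASCII comma
step_7 = split_on_ascii_comma("测试\uff0c测试1\uff0c测试2")    # fullwidth commas

print(len(step_5))  # 2: two separate tags
print(len(step_7))  # 1: the whole string becomes a single tag
```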

Proposed resolution

TBA

Remaining tasks

Patch
Review
Commit

User interface changes

API changes

Data model changes

Release notes snippet

🐛 Bug report
Status

Active

Version

9.5

Component
Taxonomy 

Last updated 5 days ago

  • Maintained by
  • 🇺🇸United States @xjm
  • 🇬🇧United Kingdom @catch
Created by

🇭🇰Hong Kong AlexBowman

  • Usability

    Makes Drupal easier to use. Preferred over UX, D7UX, etc.
