Per-site preference for non-Latin characters in anchor ids

Created on 19 March 2023, over 1 year ago
Updated 24 November 2023, 7 months ago

Problem/Motivation

Māori vowels with macrons are stripped from generated anchor ids, so the heading `Tā mātou` becomes `t-mtou`. It would be best to let the macrons through. However, convertStringToId() uses the regex class \w to strip everything that is not a plain Latin word character, so keeping the macrons would require ToC API to offer a per-site preference for convertStringToId().

Steps to reproduce

Create content with heading <h2>Tā mātou</h2>. Apply ToC filter. Observe generated anchor id is `t-mtou`.
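
The behaviour can be approximated outside Drupal with a few lines of Python. This is only a sketch: `convert_string_to_id` here is a hypothetical stand-in written for illustration, not the module's PHP convertStringToId(), and it assumes the module lowercases the text, hyphenates whitespace, and then drops everything outside ASCII \w.

```python
import re

# "Tā mātou" written with escapes so the a-macron is the precomposed U+0101.
HEADING = "T\u0101 m\u0101tou"


def convert_string_to_id(text: str) -> str:
    """Hypothetical stand-in for convertStringToId(); assumed behaviour only."""
    text = text.strip().lower()
    # Collapse each run of whitespace into a single hyphen.
    text = re.sub(r"\s+", "-", text)
    # re.ASCII restricts \w to [a-zA-Z0-9_], so the macron vowels are removed.
    return re.sub(r"[^\w\-]", "", text, flags=re.ASCII)


print(convert_string_to_id(HEADING))  # prints: t-mtou
```

The vowels vanish entirely rather than falling back to plain `a`, which is why the id reads `t-mtou` and not `ta-matou`.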

Proposed resolution

Refactor convertStringToId() to support a per-site preference that allows specific non-Latin characters to remain. This would likely entail revising the use of regex \w. A less desirable workaround is to map characters with macrons to their plain Latin equivalents, but this can change the meaning of a word, so it is not ideal. A patch for the workaround is offered below.
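
To make the two options concrete, here is a rough Python sketch of both. Everything in it is an assumption made for illustration: the `allowed_extra` parameter, the `MACRON_VOWELS` constant, and the `strip_macrons` helper are hypothetical names, and none of this is the module's PHP code or the attached patch.

```python
import re
import unicodedata

# Hypothetical per-site preference: lowercase Māori macron vowels (ā ē ī ō ū).
MACRON_VOWELS = "\u0101\u0113\u012b\u014d\u016b"


def convert_string_to_id(text: str, allowed_extra: str = "") -> str:
    """Sketch of an id converter that keeps a site-defined set of characters."""
    # NFC folds "a" + combining macron into the single precomposed character.
    text = unicodedata.normalize("NFC", text.strip().lower())
    text = re.sub(r"\s+", "-", text)
    # Keep ASCII word characters, hyphens, and whatever the site allows.
    keep = r"[^\w\-" + re.escape(allowed_extra) + r"]"
    return re.sub(keep, "", text, flags=re.ASCII)


def strip_macrons(text: str) -> str:
    """Workaround sketch: map macron vowels to plain Latin vowels."""
    decomposed = unicodedata.normalize("NFD", text)
    # U+0304 is the combining macron; dropping it leaves the base letter.
    return "".join(ch for ch in decomposed if ch != "\u0304")


heading = "T\u0101 m\u0101tou"  # "Tā mātou"
print(convert_string_to_id(heading, MACRON_VOWELS))   # prints: tā-mātou
print(convert_string_to_id(strip_macrons(heading)))   # prints: ta-matou
```

With an empty `allowed_extra` the function behaves as it does today, so in this sketch existing sites would see no change unless they opt in.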

Remaining tasks

Implement the per-site preference described above.

User interface changes

Add a settings form for the per-site preference.

API changes

Unsure.

Data model changes

The per-site preference would need to be saved in config.

✨ Feature request
Status

Fixed

Version

1.0

Component

Code

Created by

🇳🇿 New Zealand jonathan_hunt

