HtmlFilter mishandling  

Created on 21 July 2025

Problem/Motivation

HtmlFilter doesn't properly handle &nbsp;, which results in preg_replace() further down the line returning NULL because of invalid UTF-8.

Start with text that contains an &nbsp; (CKEditor likes to insert those).

processFieldValue() calls handleAttributes(), which loads the document with Drupal's Html::load(). That method uses the HTML5 parser to build a DOM from a string, and in doing so translates the &nbsp; into the UTF-8 byte sequence 0xC2 0xA0 (U+00A0 NO-BREAK SPACE).
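As a rough illustration of this encoding step, here is a short Python sketch (Python rather than the module's PHP). Python's html.unescape() stands in for the entity decoding that Html::load() performs; it is not the same code path. The helper name nbsp_to_utf8() is hypothetical. The point is that the entity becomes a single code point, U+00A0, which UTF-8 stores as two bytes.

```python
import html


def nbsp_to_utf8(text: str) -> bytes:
    """Decode HTML entities, then encode the result as UTF-8 bytes.

    html.unescape() is only a stand-in for the entity decoding an HTML
    parser performs while building a DOM (e.g. Drupal's Html::load()).
    """
    return html.unescape(text).encode("utf-8")


decoded = html.unescape("foo&nbsp;bar")
print(len(decoded))                  # 7 code points: the NBSP is one character
print(nbsp_to_utf8("foo&nbsp;bar"))  # b'foo\xc2\xa0bar': but two bytes in UTF-8
```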

parseHtml() is then called, which passes the UTF-8 string to normalizeText(). The preg_replace() call there isn't given the /u modifier, so it treats the string as raw bytes rather than UTF-8 and mangles the 0xC2 0xA0 sequence into what shows up as the Unicode REPLACEMENT CHARACTER. From that point on, any preg_replace() call that does use the /u modifier returns NULL, with preg_last_error() reporting PREG_BAD_UTF8_ERROR. The tokenizer's simplifyText(), for example, makes such a call.
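The sketch below models one way byte-level processing can break the sequence, again in Python. It is not the module's actual normalizeText() pattern. Whether PCRE without /u treats byte 0xA0 as whitespace depends on the character tables in effect, so the hypothetical BYTE_WS pattern lists 0xA0 explicitly to model that case. A strict UTF-8 decode stands in for a later preg_replace() with /u. Its failure plays the role of PREG_BAD_UTF8_ERROR.

```python
import re

# HYPOTHETICAL pattern: a byte-oriented whitespace class that also counts
# 0xA0 as a space, as a Latin-1 style character table might. Python's
# bytes-mode \s does not include 0xA0, so it is listed explicitly.
BYTE_WS = re.compile(rb"[\s\xa0]+")


def collapse_bytes(raw: bytes) -> bytes:
    """Collapse whitespace runs byte by byte, with no notion of UTF-8."""
    return BYTE_WS.sub(b" ", raw)


def is_valid_utf8(raw: bytes) -> bool:
    """Strict check, standing in for a later preg_replace() with /u."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


data = "foo\u00a0bar".encode("utf-8")   # b'foo\xc2\xa0bar'
mangled = collapse_bytes(data)
print(mangled)                          # b'foo\xc2 bar': 0xC2 is now orphaned
print(is_valid_utf8(data), is_valid_utf8(mangled))  # True False
# A lenient decoder renders the orphaned byte as U+FFFD REPLACEMENT CHARACTER.
print(mangled.decode("utf-8", errors="replace"))
```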

Steps to reproduce

Proposed resolution

In normalizeText(), add the /u modifier to the preg_replace() pattern.
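Here is a minimal Python sketch of the idea behind the fix. It assumes that matching per code point is the relevant property. Python's str patterns already match code points rather than bytes, which is roughly what the /u modifier enables for preg_replace(). The function names are hypothetical. Matching happens per code point, so U+00A0 is handled as one whole character here, and the output stays valid UTF-8.

```python
import re

# Unicode-aware pattern: Python str patterns match code points, not bytes,
# which is roughly the behavior /u enables for PHP's preg_replace().
UNICODE_WS = re.compile(r"\s+")


def collapse_text(text: str) -> str:
    """Collapse whitespace runs (Unicode whitespace included) to one space."""
    return UNICODE_WS.sub(" ", text)


def collapse_utf8(raw: bytes) -> bytes:
    """Decode, collapse per code point, re-encode: multibyte sequences stay whole."""
    return collapse_text(raw.decode("utf-8")).encode("utf-8")


data = "foo\u00a0 bar".encode("utf-8")  # b'foo\xc2\xa0 bar'
print(collapse_utf8(data))              # b'foo bar'
```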

Remaining tasks

πŸ› Bug report
Status

Active

Version

1.38

Component

Plugins

Created by

jwag956 (Monterey, CA, United States)
