HtmlFilter mishandling &nbsp

Created on 21 July 2025, 2 months ago

Problem/Motivation

HtmlFilter doesn't properly handle &nbsp, which causes a preg_replace call further down the line to return NULL due to invalid UTF-8. (Note that I had to leave the ';' off these examples so you can see it!)

Start with text that has an &nbsp in it; CKEditor likes to insert those.

processFieldValue() calls handleAttributes(), which loads the document using Drupal's Html::load(). That uses the HTML5 parser to build a DOM from a string, and in doing so translates the &nbsp into the UTF-8 bytes 0xC2 0xA0.
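
To see the bytes involved, here is a minimal sketch. It uses PHP's built-in html_entity_decode() as a stand-in for Html::load(), which is an assumption made to keep the example free of Drupal dependencies; the variable name is illustrative only.

```php
<?php
// Stand-in for the entity decoding that happens while the HTML is loaded.
// This is NOT the Drupal Html::load() call itself.
$decoded = html_entity_decode('a&nbsp;b', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// U+00A0 (no-break space) is encoded in UTF-8 as the two bytes 0xC2 0xA0.
echo bin2hex($decoded), "\n"; // 61c2a062
```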

parseHtml() is then called, which calls normalizeText() with the UTF-8 string. The preg_replace call there isn't given the '/u' modifier, so it doesn't understand UTF-8 and converts the sequence into what shows up as the Unicode REPLACEMENT CHARACTER (whew!). From this point on, any preg_replace call that does use the '/u' modifier returns NULL with an error code of PREG_BAD_UTF8_ERROR. The tokenizer's simplifyText(), for example, does this.
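
The failure mode can be sketched outside Drupal. The pattern below is hypothetical: the actual pattern in normalizeText() is not quoted in this report, so the sketch matches the byte 0xA0 explicitly to make the effect reproducible regardless of locale.

```php
<?php
// "a", a UTF-8 no-break space (bytes 0xC2 0xA0), then "b".
$text = "a\xC2\xA0b";

// Hypothetical byte-oriented pattern (no /u): PCRE works byte by byte here,
// so only the trailing 0xA0 byte is replaced and 0xC2 is left dangling.
$broken = preg_replace('/[\s\xA0]+/', ' ', $text);
echo bin2hex($broken), "\n"; // 61c22062, which is no longer valid UTF-8

// Any later call that uses /u rejects the invalid subject.
$result = preg_replace('/\s+/u', ' ', $broken);
var_dump($result === null);                           // bool(true)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);  // bool(true)
```

Since the caller typically uses the return value as a string, the NULL silently discards the field's text unless preg_last_error() is checked.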

Steps to reproduce

Proposed resolution

In normalizeText(), add the '/u' modifier to the preg_replace pattern.
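
A rough sketch of the idea, not the actual patch: normalize_whitespace() is a hypothetical stand-in for the relevant part of normalizeText(), and the pattern is assumed for illustration.

```php
<?php
// Hypothetical stand-in for the whitespace handling in normalizeText();
// the real pattern is not quoted in this report.
function normalize_whitespace(string $value): string {
  // With /u, PCRE treats both pattern and subject as UTF-8, so U+00A0 is
  // matched as one character rather than as two separate bytes.
  $result = preg_replace('/[\s\x{A0}]+/u', ' ', $value);
  // preg_replace() returns NULL on failure; fall back to the input then.
  return $result ?? $value;
}

$clean = normalize_whitespace("a\xC2\xA0b");
echo bin2hex($clean), "\n";               // 612062, i.e. "a b"
var_dump(preg_match('//u', $clean) === 1); // bool(true): still valid UTF-8
```

Falling back to the unmodified input on NULL is one possible design choice; logging the preg_last_error() code instead may suit the module better.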

Remaining tasks

🐛 Bug report
Status

Active

Version

1.38

Component

Plugins

Created by

🇺🇸United States jwag956 Monterey, ca
