Issue with HTML `&nbsp;` not being correctly filtered out from URLs

Issue created by @yovince
Comment about 1 year ago →
🇦🇺Australia yovince Melbourne
Status changed to Needs work about 1 year ago (9:51am 8 May 2024)
Comment about 1 year ago →
cilefen
From which Drupal version did you upgrade? Can you please identify the commit in Drupal core that changed behavior? We need that to better understand the situation. The release notes are helpful for that. If you are adept with Git, a git bisect operation will find the commit quickly.
Environment: php8.3_mysql8
last update about 1 year ago
Patch Failed to Apply
Comment about 1 year ago →
🇦🇺Australia yovince Melbourne
Hey @cilefen,
thanks for your response.

We upgraded from Drupal 10.1.7 to 10.2.5. The change in behavior comes from this issue: https://www.drupal.org/project/drupal/issues/2441811 (🐛 Upgrade filter system to HTML5, Fixed). The commit associated with that issue is `201ae2e35438b7d8f7c831ba8ac33bfc035bbb0a`, and the merge request is https://git.drupalcode.org/project/drupal/-/merge_requests/4274/diffs
Comment about 1 year ago →
cilefen
Please update your examples above. Wrap them in the <code> tag because they all look the same.
Comment about 1 year ago →
🇦🇺Australia yovince Melbourne
Thanks, it should look better now.
Comment about 1 year ago →
cilefen
Your patch does not apply. You should test on and patch the development branch.
Comment about 1 year ago →
🇦🇺Australia yovince Melbourne
Ah, it can be applied to the 10.2.x branch, which is the development branch, right? I think the issue is that when I uploaded the patch file on this page, the Test with options didn't include 10.2.x. So, the test program tried to apply it to 10.1.x, which caused it to fail?
Comment about 1 year ago →
cilefen
You should make a pull request instead.
Comment about 1 year ago →
🇺🇸United States mfb San Francisco
So the URL filter, which can be found in the _filter_url() function in filter.module, was developed in the context of previous versions of Drupal, where HTML was serialized by calling saveXML(), thus outputting literal UTF-8 characters, since XML doesn't support most HTML entities. After HTML 5 support was added, HTML serialization results in HTML entities rather than literal UTF-8 characters.

I think the URL filter could be overhauled to support this use case? Parsing HTML with regex is generally frowned upon, but that's what this filter does. If HTML was parsed properly then text nodes could be modified, regardless of HTML entity encoding? Tweaking the serialization rules might be possible too, to partially restore previous behavior, but that seems really hacky.
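To make the serialization difference concrete, here is a minimal sketch. It assumes a deliberately simplified URL pattern: `$naivePattern` and `$entityAwarePattern` are hypothetical and are not the patterns used by `_filter_url()`. It only illustrates how a regex that stops at a literal non-breaking space can run past the `&nbsp;` entity, and one possible way a pattern could be made aware of the entity.

```php
<?php
// The same text in two serializations: a URL followed by a non-breaking space.
// Literal U+00A0, as described above for the older saveXML()-based output.
$literal = "Visit https://example.com\u{00A0}today";
// Entity form, as described above for the HTML5 serialization.
$entity = 'Visit https://example.com&nbsp;today';

// Hypothetical simplified pattern: a URL ends at whitespace, U+00A0, or a tag.
$naivePattern = '~https?://[^\s\x{00A0}<>]+~u';

preg_match($naivePattern, $literal, $fromLiteral);
preg_match($naivePattern, $entity, $fromEntity);
// $fromLiteral[0] is 'https://example.com'
// $fromEntity[0] is 'https://example.com&nbsp;today' (entity swallowed)

// Hypothetical variant that also stops in front of the "&nbsp;" entity.
$entityAwarePattern = '~https?://(?:(?!&nbsp;)[^\s\x{00A0}<>])+~u';

preg_match($entityAwarePattern, $literal, $fixedLiteral);
preg_match($entityAwarePattern, $entity, $fixedEntity);
// Both $fixedLiteral[0] and $fixedEntity[0] are 'https://example.com'
```

The second pattern only patches one entity. Operating on parsed text nodes, as suggested above, would avoid having to enumerate entity spellings at all.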
Comment about 1 year ago →
🇦🇺Australia yovince Melbourne
hey @mfb, thank you for your reply! The issue is not just with URLs not being correctly filtered, but also with certain situations not being appropriately filtered. Please take a look at Image 3 and Image 4.
I have changed the title of this ticket.
Comment about 1 year ago →
🇦🇺Australia yovince Melbourne
Hey @cilefen,

"You should make a pull request instead."

Thanks, I have created a PR.
Comment about 1 year ago →
🇺🇸United States xjm
To address this issue, we should first create a merge request against 11.x. The easiest thing to do is probably to close the current MR and create a new one against 11.x. Thanks!
Merge request !8443 HTML `&nbsp;` not being correctly filtered out → (Open) created by yovince
Comment about 1 year ago →
🇦🇺Australia yovince Melbourne
hi @xjm,
Thank you for your reply. I have created a merge request against the 11.x branch
Pipeline finished with Failed
about 1 year ago
Total: 680s
#201394
Comment about 1 year ago →
🇮🇳India onkararun
Hi @yovince, we could solve this issue in this way:

// Decode entities so that &nbsp; becomes a literal U+00A0 character.
$html = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

// Remove empty paragraphs, including those that contain only
// non-breaking spaces, whitespace, or <br> tags.
$text = preg_replace('/<p>(?:\x{00A0}|\s|<br\s*\/?>)*<\/p>/u', '', $html);

return $text;

Please check this; I hope it works.
Comment about 1 year ago →
🇺🇸United States mfb San Francisco
As can be seen in the test failures, the approach in the merge request doesn't work; you cannot simply decode HTML entities when serializing HTML - the result would be both invalid and unsafe.
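A minimal sketch of why blanket decoding is a problem, using a made-up input string (`$serialized` is an assumption for illustration and is not taken from the failing tests): escaped markup becomes live markup once every entity is decoded. The targeted replacement at the end is shown only for contrast; it is not taken from either merge request and would still need care, for example around raw-text elements such as `<script>` and `<style>`.

```php
<?php
// Hypothetical serialized output: the text contains escaped markup
// plus a non-breaking space entity.
$serialized = '<p>Use &lt;script&gt;alert(1)&lt;/script&gt; carefully&nbsp;here</p>';

// Decoding every entity turns the escaped text into a real <script> element.
$decoded = html_entity_decode($serialized, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$hasLiveScript = strpos($decoded, '<script>') !== false;   // true

// For contrast only: replacing just the &nbsp; entity leaves the
// escaped markup escaped.
$onlyNbsp = str_replace('&nbsp;', "\u{00A0}", $serialized);
$stillEscaped = strpos($onlyNbsp, '&lt;script&gt;') !== false;   // true
```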
Comment 12 months ago →
🇦🇺Australia yovince Melbourne
yovince → changed the visibility of the branch 3445910-issue-with-html-non-breaking-space to hidden.
Merge request !9098 HTML `&nbsp;` not being correctly filtered out → (Open) created by yovince
Pipeline finished with Failed
12 months ago
Total: 153s
#245526
Pipeline finished with Failed
12 months ago
Total: 565s
#245535
Pipeline finished with Success
12 months ago
Total: 542s
#245679
Comment 12 months ago →
🇦🇺Australia yovince Melbourne
Thanks @mfb for your comments.
I opened an MR that seems to fix my issue. I also attached a patch so that a stable copy remains available if the merge request is closed or updated.
Comment 10 months ago →
🇺🇸United States sea2709 Texas
I encountered this issue as well. I noticed that in some cases, instead of inserting a real space, the editor inserts a `&nbsp;`.

The patch in #27 (🐛 Issue with HTML `&nbsp;` not being correctly filtered out from URLs, Active) works on my project. I'm a little concerned about whether it addresses the root cause, though. I think the issue comes from the editor more than from the filtering process.
Status changed to Needs review 6 months ago (5:36pm 23 January 2025)
Comment 6 months ago →
🇺🇸United States trackleft2 Tucson, AZ 🇺🇸
Looks like this needs to be back in Needs Review.

@sea2709, there may be an issue in the ckeditor5 issue queue about this. https://github.com/ckeditor/ckeditor5/issues?q=sort%3Aupdated-desc+is%3A...
Comment 6 months ago →
🇺🇸United States smustgrave
Can the issue summary be updated to use the standard template, specifically the "Proposed solution" section? This will also need some test coverage, please.