- Issue created by @yovince
- Status changed to Needs work
8 months ago 9:51am 8 May 2024
From which Drupal version did you upgrade? Can you please identify the commit in Drupal core that changed the behavior? We need that to better understand the situation. The release notes are helpful for that. If you are adept with Git, a `git bisect` operation will find the commit quickly.
- last update 8 months ago: Patch Failed to Apply
- 🇦🇺Australia yovince Melbourne
hey @cilefen
thanks for your response. We upgraded from Drupal 10.1.7 to 10.2.5. The issue can be found at: https://www.drupal.org/project/drupal/issues/2441811 (Upgrade filter system to HTML5, status: Fixed). The commit associated with this issue is `201ae2e35438b7d8f7c831ba8ac33bfc035bbb0a`, and the merge request is https://git.drupalcode.org/project/drupal/-/merge_requests/4274/diffs
Please update your examples above. Wrap them in the `<code>` tag because they all look the same.
Your patch does not apply. You should test on and patch the development branch.
- 🇦🇺Australia yovince Melbourne
Ah, it can be applied to the `10.2.x` branch, which is the development branch, right? I think the issue is that when I uploaded the patch file on this page, the "Test with" options didn't include `10.2.x`. So the test program tried to apply it to `10.1.x`, which caused it to fail?
- 🇺🇸United States mfb San Francisco
So the URL filter, which can be found in the _filter_url() function in filter.module, was developed in the context of previous versions of Drupal, where HTML was serialized by calling saveXML(), thus outputting literal UTF-8 characters, since XML doesn't support most HTML entities. After HTML 5 support was added, HTML serialization results in HTML entities rather than literal UTF-8 characters.
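To make the difference concrete, here is a minimal PHP sketch. The `$pattern` below is a simplified stand-in, not the actual expression used in `_filter_url()`, and the sample strings and the `example.com` URL are made up for illustration. It compares how a regex-based matcher treats a literal U+00A0 character versus the `&nbsp;` entity.

```php
<?php

// Simplified, hypothetical URL pattern; NOT the pattern used by _filter_url().
// It accepts characters that commonly occur in URLs, including "&" and ";".
$pattern = '~https?://[\p{L}\p{N}\-._/?=&;%#]+~u';

// The same text in two forms: with a literal U+00A0 character and with
// the named entity that HTML serialization now produces.
$literal = "Visit https://example.com/page\u{00A0}today";
$entity  = 'Visit https://example.com/page&nbsp;today';

preg_match($pattern, $literal, $m1);
preg_match($pattern, $entity, $m2);

echo $m1[0], PHP_EOL; // https://example.com/page
echo $m2[0], PHP_EOL; // https://example.com/page&nbsp;today
```

With the literal character the match stops at the end of the URL; with the entity, `&`, the letters, and `;` are all plausible URL characters, so the match runs on into the following text.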
I think the URL filter could be overhauled to support this use case? Parsing HTML with regex is generally frowned upon, but that's what this filter does. If HTML was parsed properly then text nodes could be modified, regardless of HTML entity encoding? Tweaking the serialization rules might be possible too, to partially restore previous behavior, but that seems really hacky.
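One way to picture the "parse properly, then modify text nodes" idea is the sketch below. It uses PHP's built-in `DOMDocument` and `DOMXPath` rather than Drupal's own HTML utilities, and the URL pattern and sample markup are hypothetical. The point it illustrates: text node values come back entity-decoded, so the matcher sees the same characters however the markup was serialized.

```php
<?php

// Hypothetical, simplified URL pattern; not the one used by the core filter.
$pattern = '~https?://[\p{L}\p{N}\-._/?=&;%#]+~u';

// Sample markup (made up) where the URL is followed by an &nbsp; entity.
$html = '<p>Visit https://example.com/page&nbsp;today</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$urls = [];
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//text()') as $node) {
  // nodeValue is entity-decoded UTF-8, so &nbsp; is seen as U+00A0 here
  // and the match stops before it.
  if (preg_match_all($pattern, $node->nodeValue, $matches)) {
    $urls = array_merge($urls, $matches[0]);
  }
}

print_r($urls);
```

A real filter would also have to skip text inside existing `<a>` elements and write the modified nodes back into the document, which this sketch leaves out.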
- 🇦🇺Australia yovince Melbourne
hey @mfb, thank you for your reply! The issue is not just with URLs not being correctly filtered, but also with certain situations not being appropriately filtered. Please take a look at Image 3 and Image 4.
I have changed the title of this ticket.
- 🇦🇺Australia yovince Melbourne
hey @cilefen
"You should make a pull request instead."
thanks, I have created a PR.
- 🇺🇸United States xjm
To address this issue, we should first create a merge request against 11.x. The easiest thing to do is probably to close the current MR and create a new one against 11.x. Thanks!
- 🇦🇺Australia yovince Melbourne
hi @xjm,
Thank you for your reply. I have created a merge request against the 11.x branch.
- 🇮🇳India onkararun
Hi @yovince, we can solve this issue this way:
$html = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
// Remove empty paragraphs, including those that contain only non-breaking spaces.
$text = preg_replace('/<p>(\x{00A0}|\s|<br>)*<\/p>/u', '', $html);
return $text;
Please check this; I hope it works.
- 🇺🇸United States mfb San Francisco
As can be seen in the test failures, the approach in the merge request doesn't work; you cannot simply decode HTML entities when serializing HTML - the result would be both invalid and unsafe.
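A small illustration of why blanket decoding is unsafe, using only PHP's `html_entity_decode()`. The input string is a made-up example: text that a user typed and that was correctly escaped for display becomes live markup once every entity is decoded.

```php
<?php

// Made-up example: user-entered text that was safely escaped for display.
$escaped = '<p>Type &lt;script&gt;alert(1)&lt;/script&gt; to test</p>';

// Decoding every entity turns the escaped text into real markup.
$decoded = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
// <p>Type <script>alert(1)</script> to test</p>
```

The decoded string now contains an actual `<script>` element, which is why entities that stand for markup characters have to stay encoded in serialized output.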
- 🇦🇺Australia yovince Melbourne
thanks @mfb for your comments.
I committed an MR that seems to have fixed my issue. I also added a patch to prevent problems if the pull request is closed or updated.
- 🇺🇸United States sea2709 Texas
I encounter this issue as well. I notice that in some cases, instead of putting a real space, the editor puts a `&nbsp;`.
The patch in #27 (Issue with HTML `&nbsp;` not being correctly filtered out from URLs, status: Needs work) works on my project. I'm a little concerned about whether this is the root cause of the issue, though. I think the issue comes from the editor more than from the filtering process.