Paragraph tags are not stripped if the p tag has attributes like dir=ltr or title

Issue created by @mlncn

mlncn → committed 243e2f32 on 1.x

Issue #3410145 by mlncn: Paragraph tags are not stripped if the p tag...

Status changed to Needs review (2:14am, 21 December 2023)
Comment by mlncn (Minneapolis, MN, USA)
Status changed to Needs work (11:58am, 14 February 2024)
Comment by duaelfr (Montpellier, France):
Hi! Thank you for this module!

Sadly, this change breaks UTF-8 support.
For example: <p>légende</p> is converted to l&Atilde;&copy;gende, which is then displayed as lÃ©gende.

Looking for a fix, I've come up with 3 options:

1. Use utf8_decode() on the string passed to the loadHTML() method
how:
$dom = new DOMDocument;
$dom->loadHTML(utf8_decode($text));
pros: one-line fix
cons: might mess with some specific characters (not sure); possible issue if the source string is not UTF-8 (is that possible in Drupal?)

2. Use the loadXML() method instead of loadHTML()
how:
$dom = new DOMDocument;
$dom->loadXML($text);
pros: one-line fix
cons: could break if the given HTML is not perfect (e.g. an unclosed tag); this could be mitigated by running this filter after core's filter_htmlcorrector filter, but that would be in the site builder's hands

3. Wrap the string in a minimal HTML structure before passing it to the loadHTML() method
how:
$dom = new DOMDocument;
$charset = mb_detect_encoding($text);
$html = "<!DOCTYPE html><html><head><meta charset='$charset'></head><body>$text</body></html>";
$dom->loadHTML($html);
pros: works around the cons of the other options
cons: looks a bit hackish
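
To make the difference between the bare call and option 3 concrete, here is a minimal sketch. The helper first_paragraph_inner_html() is hypothetical and is not the module's code. The charset is hard-coded to UTF-8 instead of being detected with mb_detect_encoding(), on the assumption that the filtered text is already UTF-8. Note also that utf8_decode() from option 1 has been deprecated since PHP 8.2.

```php
<?php
// Hypothetical helper, not the module's actual code: parses $html with
// DOMDocument::loadHTML() and returns the inner HTML of the first <p>.
function first_paragraph_inner_html(string $html): string {
  $dom = new DOMDocument();
  // Collect libxml warnings internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(true);
  $dom->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $p = $dom->getElementsByTagName('p')->item(0);
  if ($p === null) {
    return '';
  }
  $inner = '';
  foreach ($p->childNodes as $child) {
    // Serializing node by node yields the element's inner HTML.
    $inner .= $dom->saveHTML($child);
  }
  return $inner;
}

$text = '<p dir="ltr">légende</p>';

// Bare fragment: the parser gets no charset hint, so the UTF-8 bytes of
// "é" are read as two single-byte characters.
$naive = first_paragraph_inner_html($text);

// Option 3: wrap the fragment in a minimal document that declares UTF-8.
$wrapped = first_paragraph_inner_html(
  '<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body>'
  . $text . '</body></html>'
);
```

With a typical PHP build, $wrapped should come back as the original "légende" while $naive should not, which matches the breakage described above.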
Merge request !1 (Merged), created by duaelfr: Issue #3410145: wrap the string in a minimal HTML structure to prevent encoding issues.
Status changed to Needs review (12:03pm, 14 February 2024)
Comment by duaelfr (Montpellier, France):
I just opened the !1 MR with option 3 from my previous comment.
Comment by duaelfr (Montpellier, France):
Fixed a stupid mistake in the MR (I wasn't passing the forged HTML string to the loadHTML() method...)
System message:

mlncn committed 221c2b32 on 1.x, authored by DuaelFr:
Issue #3410145 by mlncn: Paragraph tags are not stripped if the p tag...
Status changed to RTBC (12:41am, 29 February 2024)
Comment by mlncn (Minneapolis, MN, USA):
Should we document that if there is a mix of text inside paragraph tags and text outside paragraph tags, the content that is not in paragraph tags will be lost? (That's how it's going to work here, correct?)

Also DuaelFr, would you be willing to be a co-maintainer for this module?
Comment by ksenzee (Washington state, United States):
I'd say it needs documenting, because it's a change that makes this module impossible for me to use. I was hoping to use it as a workaround for a TMGMT issue where my text is sometimes saved with <p> wraparounds and sometimes not, and this means it won't work in that situation. Obviously that's not the fault of this module, and the right fix in my situation is to get TMGMT to quit taking formatted fields with textfield widgets and presenting them as textareas with CKEditor, but it would have saved me some time to know that a mix of <p> and no <p> is unsupported.
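
For anyone documenting this, the mixed-content case can be illustrated with a small sketch. It assumes the filter keeps only what sits inside p elements, as the two comments above describe. The function keep_paragraph_contents() is hypothetical and is not taken from the module.

```php
<?php
// Hypothetical illustration, not the module's actual code: keep only the
// inner HTML of each <p> element and ignore everything else in the body.
function keep_paragraph_contents(string $text): string {
  $dom = new DOMDocument();
  // Collect libxml warnings internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(true);
  $dom->loadHTML(
    '<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body>'
    . $text . '</body></html>'
  );
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $parts = [];
  foreach ($dom->getElementsByTagName('p') as $p) {
    $inner = '';
    foreach ($p->childNodes as $child) {
      $inner .= $dom->saveHTML($child);
    }
    $parts[] = $inner;
  }
  // Text that is a direct child of <body> never enters $parts.
  return implode("\n", $parts);
}

$mixed = 'Intro outside any paragraph. <p title="caption">Inside a paragraph.</p>';
$result = keep_paragraph_contents($mixed);
```

Under these assumptions, $result is expected to hold only the text from inside the paragraph, and the leading sentence is dropped without any warning. That is the case worth calling out in the documentation.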
Status changed to Fixed (1:19pm, 14 March 2024)
Comment by duaelfr (Montpellier, France):
This has been committed so the issue should be marked as "Fixed".
I faced a new issue, so I opened a follow-up, "Warnings when using this on HTML5 markup" (Active), to continue improving the module. We might want to write tests at some point.

@mlncn I don't have much time to spend on maintainership but I can jump in if you need.

Documentation improvements might be discussed in another issue: "Improve documentation" (Active)
System message:
Automatically closed - issue fixed for 2 weeks with no activity.

