- Issue created by @mlncn
- Status changed to Needs review
11 months ago 2:14am 21 December 2023 - Status changed to Needs work
9 months ago 11:58am 14 February 2024 - 🇫🇷 France duaelfr Montpellier, France
Hi! Thank you for this module!
Sadly, this change breaks UTF-8 support.
For example, <p>légende</p> is converted to légende, which is then displayed as lÃ©gende.
Looking for a fix, I've come up with 3 options:
- use utf8_decode() on the string passed to the loadHTML() method
  how: $dom = new DOMDocument; $dom->loadHTML(utf8_decode($text));
  pros: one-line fix
  cons: might mess with some specific characters (not sure); possible issue if the source string is not UTF-8 (is that possible in Drupal?)
- use the loadXML() method instead of the loadHTML() one
  how: $dom = new DOMDocument; $dom->loadXML($text);
  pros: one-line fix
  cons: could break if the given HTML is not perfect (e.g. an unclosed tag); could be mitigated by running this filter after the filter_htmlcorrector filter from core, but that would be in the site builder's hands
- encapsulate the string in a minimal HTML structure before passing it to the loadHTML() method
  how: $dom = new DOMDocument; $charset = mb_detect_encoding($text); $html = "<!DOCTYPE html><html><head><meta charset='$charset'></head><body>$text</body></html>"; $dom->loadHTML($html);
  pros: works around the cons of the other options
  cons: looks a bit hackish
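As an illustration of option 3, here is a minimal standalone sketch. It is not the code from the merge request: the function name `wrapped_body_text()` is hypothetical, and it assumes the input is already UTF-8 (hardcoding the charset rather than calling mb_detect_encoding()) and that the PHP dom extension is available. It only shows that declaring the charset in a wrapper document lets loadHTML() read accented characters correctly.

```php
<?php
/**
 * Hypothetical helper (not part of the module): parses an HTML fragment
 * inside a minimal wrapper document that declares its charset, then
 * returns the text content of the <body> element.
 *
 * Assumption: $text is UTF-8 encoded.
 */
function wrapped_body_text(string $text): string {
  // Collect libxml warnings internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(TRUE);

  // The <meta charset> tag tells libxml how to decode the bytes that follow.
  $html = '<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body>'
    . $text
    . '</body></html>';

  $dom = new DOMDocument();
  $dom->loadHTML($html);

  // Restore the previous libxml error handling state.
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $body = $dom->getElementsByTagName('body')->item(0);
  return $body === NULL ? '' : $body->textContent;
}

echo wrapped_body_text('<p>légende</p>'), PHP_EOL;
```

The wrapper is discarded afterwards since only the contents of `<body>` are read back, so the "hackish" part stays local to the parsing step.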
- Merge request !1 Issue #3410145: wrap the string in a minimal HTML structure to prevent encoding issues. (Merged) created by duaelfr
- Status changed to Needs review
9 months ago 12:03pm 14 February 2024 - 🇫🇷 France duaelfr Montpellier, France
I just opened the !1 MR with option 3 from my previous comment.
- 🇫🇷 France duaelfr Montpellier, France
Fixed a stupid mistake in the MR (I wasn't passing the forged HTML string to the loadHTML() method...)
- mlncn committed 221c2b32 on 1.x authored by DuaelFr
  Issue #3410145 by mlncn: Paragraph tags are not stripped if the p tag...
- Status changed to RTBC
9 months ago 12:41am 29 February 2024 - 🇺🇸 United States mlncn Minneapolis, MN, USA
Should we document that if there is a mix of text inside paragraph tags and text outside paragraph tags, the content that is not in paragraph tags will be lost? (That's how it's going to work here, correct?)
Also DuaelFr, would you be willing to be a co-maintainer for this module?
- 🇺🇸 United States ksenzee Washington state
I'd say it needs documenting, because it's a change that makes this module impossible for me to use. I was hoping to use it as a workaround for a TMGMT issue where my text is sometimes saved with <p> wraparounds and sometimes not, and this means it won't work in that situation. Obviously that's not the fault of this module, and the right fix in my situation is to get TMGMT to quit taking formatted fields with textfield widgets and presenting them as textareas with CKEditor, but it would have saved me some time to know that a mix of <p> and no <p> is unsupported.
- Status changed to Fixed
8 months ago 1:19pm 14 March 2024 - 🇫🇷 France duaelfr Montpellier, France
This has been committed so the issue should be marked as "Fixed".
I faced a new issue so I opened a follow-up, "Warnings when using this on HTML5 markup" (Active), to continue improving the module. We might want to write tests at some point.
@mlncn I don't have much time to spend on maintainership but I can jump in if you need.
Documentation improvements might be discussed in another issue: "Improve documentation" (Active)
Automatically closed - issue fixed for 2 weeks with no activity.