Paragraph tags are not stripped if the p tag has attributes like dir=ltr or title

Created on 21 December 2023, 11 months ago
Updated 28 March 2024, 8 months ago

Problem/Motivation

For example:

<p dir="ltr" data-popupalt-original-title="null" title="null">Urban <abbr title="dumbness">Design</abbr> &amp; Planning</p>

Still prints the paragraph out.

We have made certain the strippfilter is running last in the text format.

The bug manifested as "Image caption titles don't show up at consistent size" in Windows 11 Home on Chrome.

Filling in caption information in 2-image paragraph

Result: Caption title shows at a larger size

Expectation: Caption title shows up consistently

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States mlncn Minneapolis, MN, USA

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @mlncn
    • mlncn β†’ committed 243e2f32 on 1.x
      Issue #3410145 by mlncn: Paragraph tags are not stripped if the p tag...
  • Status changed to Needs review 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mlncn Minneapolis, MN, USA
  • Status changed to Needs work 9 months ago
  • πŸ‡«πŸ‡·France duaelfr Montpellier, France

    Hi! Thank you for this module!

    Sadly, this change breaks utf8 support.
    For example: <p>lΓ©gende</p> is converted to l&Atilde;&copy;gende which is then displayed as légende.

    Looking for a fix, I've come to 3 options:

    1. use utf8_decode() on the string passed to the loadHTML() method
      how:
        $dom = new DOMDocument;
        $dom->loadHTML(utf8_decode($text));
      

      pros: one line fix
      cons: might mess with some specific characters (not sure), possible issue if the source string is not using utf8 (is it possible in Drupal?)

    2. use the loadXML() method instead of the loadHTML() one
      how:
        $dom = new DOMDocument;
        $dom->loadXML($text);
      

      pros: one line fix
      cons: could break if the given HTML is not perfect (ie: unclosed tag), could be mitigated by running this filter after the filter_htmlcorrector filter from core but that would be in the site builder hands

    3. encapsulate the string into a minimal HTML structure before passing it to the loadHTML() method
      how:
        $dom = new DOMDocument;
        $charset = mb_detect_encoding($text);
        $html = "<!DOCTYPE html><html><head><meta charset='$charset'></head><body>$text</body></html>";
        $dom->loadHTML($html);
      

      pros: workaround the cons of other options
      cons: looks a bit hackish

  • Status changed to Needs review 9 months ago
  • πŸ‡«πŸ‡·France duaelfr Montpellier, France

    I just opened the !1 MR with option 3 from my previous comment.

  • πŸ‡«πŸ‡·France duaelfr Montpellier, France

    Fixed stupid mistake in the MR (I wasn't using the forged html string in the loadHTML method...)

  • Pipeline finished with Skipped
    9 months ago
    #106454
  • Status changed to RTBC 9 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mlncn Minneapolis, MN, USA

    Should we document that if there is a mix of text in paragraph tags and not in paragraphs tags, the content that is not in paragraph tags will be lost? (That's how it's going to work here, correct?)

    Also DuaelFr, would you be willing to be a co-maintainer for this module?

  • πŸ‡ΊπŸ‡ΈUnited States ksenzee Washington state

    I'd say it needs documenting, because it's a change that makes this module impossible for me to use. I was hoping to use it as a workaround for a TMGMT issue where my text sometimes is saved with

    wraparounds and sometimes not, and this means it won't work in that situation. Obviously that's not the fault of this module, and the right fix in my situation is to get TMGMT to quit taking formatted fields with textfield widgets and presenting them as textareas with CKEditor, but it would have saved me some time to know that a mix of

    and no

    is unsupported.

  • Status changed to Fixed 8 months ago
  • πŸ‡«πŸ‡·France duaelfr Montpellier, France

    This has been committed so the issue should be marked as "Fixed".
    I faced a new issue so I opened a follow-up πŸ› Warnings when using this on HTML5 markup Active to continue improving the module. We might want to write tests at some point.

    @mlncn I don't have much time to spend on maintainership but I can jump in if you need.

    Documentation improvements might be discussed in another issue: πŸ“Œ Improve documentation Active

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024