HTML entities lost during tooltip rendering

Created on 16 May 2023, almost 2 years ago

Problem/Motivation

When the tooltip contains HTML entities like e.g. •, the render function turns that into •.

Proposed resolution

Use html_entity_decode in \Drupal\glossify\GlossifyBase::sanitizeTip to replace such entities with the UTF-8 character instead.

πŸ› Bug report
Status

Active

Version

2.0

Component

Code

Created by

πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @jurgenhaas
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update almost 2 years ago
    15 pass
  • @jurgenhaas opened merge request.
  • Status changed to Needs review almost 2 years ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update almost 2 years ago
    15 pass
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    Extended the scope a bit, as the <br /> element is also lost and to preserve that, a two-step search-and-replace is necessary, unfortunately.

  • Status changed to Needs work 2 months ago
  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    This should please be re-tried with 3.x and a test should be added to ensure it works as expected. Thanks :)

  • πŸ‡΅πŸ‡ΉPortugal joao.ramos.costa

    joao.ramos.costa β†’ made their first commit to this issue’s fork.

  • πŸ‡΅πŸ‡ΉPortugal joao.ramos.costa

    Hi @anybody,

    I've ported MR13 to 3.x and included tests.

  • Pipeline finished with Canceled
    2 months ago
    Total: 242s
    #413657
  • Pipeline finished with Success
    2 months ago
    Total: 145s
    #413662
  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    @joao.ramos.costa thanks a lot! I left a comment. @jurgenhaas what do you think, especially regarding the security implications?

    I don't have a good idea how to solve this cleanly yet. Are we sure we can't fix the root cause instead?

    Thanks for the tests especially, great! :)

  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    @jurgenhaas what do you think, especially regarding the security implications?

    I don't have a good idea how to solve this cleanly yet. Are we sure we can't fix the root cause instead?

    I wish I were able to resolve this otherwise. This is a 2-year-old issue, and I remember that we spent days and weeks for debugging this. It's been a real nightmare and very strange things were happening. I don't think we will change strategy at our end, as our client is just happy that it's working on their millions of nodes that they have. So, even if this MR eventually takes a different approach, we will probably stick with the proven patch.

    Note: our "nightmare wasn't just this issue, there have been plenty of issues in glossify back then, where a lot of them occurred in nodes with outrageous long body fields and often copy and pasted content from sources like word.

  • πŸ‡΅πŸ‡ΉPortugal joao.ramos.costa

    Yes, not easy indeed. I only think of HTMLPurifier as a safe alternative with performance cost, ofc.

  • Pipeline finished with Failed
    2 months ago
    Total: 234s
    #414035
  • πŸ‡΅πŸ‡ΉPortugal joao.ramos.costa

    Hi @anybody and @jurgenhaas,

    just commited a suggestion including HTMLPurifier (https://github.com/ezyang/htmlpurifier) which is a decent library to provide proper sanitization, included also a composer.json to require it. The tests must be updated. Not sure how.

  • Pipeline finished with Success
    2 months ago
    Total: 142s
    #414126
  • Pipeline finished with Success
    2 months ago
    Total: 187s
    #414400
  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    Thanks @joao.ramos.costa that looks nice!! I'm unsure if we should really add the HTML Purifier dependency required here.
    Maybe we should better make it opt-in via a setting or even make the integration a submodule to allow people opting-in?

    What's do the others here think?

  • πŸ‡΅πŸ‡ΉPortugal joao.ramos.costa

    I tend to agree with you,
    in fact one maybe don't want to use tooltips and don't need HtmlPurifier at all, but on the other hand currently including a &nbsp; triggers a warning. We can even simply add a hook there and anyone may sanitize the text with any other method. I'm not sure re other module, but the setting seems a decent approach. Only would be available if someone have this dependency enabled.

    But let's see others opinion, then.

    tkx

  • First commit to issue fork.
  • Pipeline finished with Success
    2 months ago
    Total: 144s
    #421436
  • πŸ‡ΊπŸ‡ΈUnited States bkosborne New Jersey, USA

    All of these problems are related to incorrect usage of appendXML. We cannot use this for reinserting matches or rendered tips, because we're dealing with HTML and not XML. There's lots of chars that might be invalid XML like   which will cause parser warnings.

    Since PHP doesn't have appendHTML, we instead can create a temporary HTML DOM document based on the rendered tip, then import the nodes it creates into our original DOM document. Now the rendered tip is treated as HTML correctly and special characters like   are carried forward (Note that they are double encoded because of how this module sets up the rendering, but IMO this is a separate issue to deal with.)

    I created a new MR !41 with this approach. I also carried over two tests that were written from the other MR, but I modified their expected output to match reality of how this works. The original MR is doing things like replacing
    with line breaks, but that doesn't seem to belong here. If there's a desire to convert
    to line breaks, we should create a separate issue for that.

  • πŸ‡ΊπŸ‡ΈUnited States bkosborne New Jersey, USA
  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    Thanks @bkosborne for your valuable input! That looks great and tests succeed without the need for additional dependencies.

    I added some comments, partially a bit unrelated but useful in general I hope.
    If the others here can confirm MR!41 works for them and solves the issue, we can merge this soon.

  • πŸ‡΅πŸ‡ΉPortugal joao.ramos.costa

    joao.ramos.costa β†’ changed the visibility of the branch 3360633-html-entities-lost-3.x to hidden.

  • πŸ‡΅πŸ‡ΉPortugal joao.ramos.costa

    Hi @bkosborne, I've tried your MR, worked on my side, thanks a lot for your work.

    Yes, I agree that convert the <br> may be done in other issue,
    but tried to keep as much as previous patch was doing, since it was added from #4 πŸ› HTML entities lost during tooltip rendering Active and I wanted to keep a commitment. But yes, you are right. Let's do it in another issue, if necessary.

  • πŸ‡¨πŸ‡­Switzerland boromino

    Now the rendered tip is treated as HTML correctly and special characters like &nbsp; are carried forward (Note that they are double encoded because of how this module sets up the rendering, but IMO this is a separate issue to deal with.)

    The original MR is doing things like replacing <br> with line breaks, but that doesn't seem to belong here for this issue. If there's a desire to convert to line breaks, we should create a separate issue for that.

    As far as I understand https://www.drupal.org/project/glossify/issues/3360633#comment-15983410 πŸ› HTML entities lost during tooltip rendering Active , the 2 initial issues are not addressed at all in MR !41. On my test with a term description stored as Lorem Ipsum has<br /> been the industry's standard &amp; poor dummy text ever since in the database, Lorem Ipsum has been the industry's standard &amp; poor's dummy text ever since (without line break) is displayed in the tooltip.

  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    Thanks @boromino, could you maybe add a test like that to show it fails?
    That would be super helpful to get to the right point!

  • πŸ‡ΊπŸ‡ΈUnited States bkosborne New Jersey, USA

    Hmm, yes I misinterpreted the original issue summary. My MR probably belongs in a separate issue. I'll work on that.

  • πŸ‡ΊπŸ‡ΈUnited States bkosborne New Jersey, USA

    bkosborne β†’ changed the visibility of the branch 3360633-avoid-appendXML to hidden.

  • πŸ‡ΊπŸ‡ΈUnited States bkosborne New Jersey, USA

    Ok I closed my MR from here and moved it to πŸ› Fix parser errors that can occur when rendered tooltip contains certain characters like   Active .

    So the focus of this issue should be to fix the encoding of HTML special characters to prevent &nbsp; from appearing as

    &amp;nbsp;<code> and the fact that tooltips are stripping HTML.
    
    Note the default theme implementation puts tooltips in the title <em>attribute</em> of an <code>abbr

    element. We cannot put HTML in there, so it makes sense to strip HTML. But we shouldn't be encoding the attributes. The reason that's happening is because Twig is escaping the output. We may be able to avoid that...

  • πŸ‡ΊπŸ‡ΈUnited States bkosborne New Jersey, USA

    So we're currently running tip through strip_tags and returning that, which then gets inserted into the HTML attribute for the <abbr title="{{ tip }}">term</abbr>. Twig is then doing auto-escaping and that's what causes the & to be converted to &amp;. You can be avoided by having tip be wrapped in Markup::create($tip); before passing to twig, which will prevent twig from auto-escaping it. This seems like it would be safe enough, since we're already running things thru strip_tags to remove HTML, but strip_tags is not sufficient when the value is inserted into an HTML attribute, because it doesn't strip or encode quotes which are used as HTML attribute delimiters. For that, we're back to needing htmlspecialchars which is what Twig is doing for us already!

    I'm not really sure the best path forward here.

Production build 0.71.5 2024