- Issue created by @jurgenhaas
- last update
almost 2 years ago 15 pass - @jurgenhaas opened merge request.
- Status changed to Needs review
almost 2 years ago 9:47am 16 May 2023 - last update
almost 2 years ago 15 pass - π©πͺGermany jurgenhaas Gottmadingen
Extended the scope a bit, as the
<br />
element is also lost and to preserve that, a two-step search-and-replace is necessary, unfortunately. - Status changed to Needs work
2 months ago 8:42am 3 February 2025 - π©πͺGermany Anybody Porta Westfalica
This should please be re-tried with 3.x and a test should be added to ensure it works as expected. Thanks :)
- π΅πΉPortugal joao.ramos.costa
joao.ramos.costa β made their first commit to this issueβs fork.
- Merge request !333360633: Preserve html entities and br tag during tooltip rendering. β (Open) created by Unnamed author
- π΅πΉPortugal joao.ramos.costa
Hi @anybody,
I've ported MR13 to 3.x and included tests.
- π©πͺGermany Anybody Porta Westfalica
@joao.ramos.costa thanks a lot! I left a comment. @jurgenhaas what do you think, especially regarding the security implications?
I don't have a good idea how to solve this cleanly yet. Are we sure we can't fix the root cause instead?
Thanks for the tests especially, great! :)
- π©πͺGermany jurgenhaas Gottmadingen
@jurgenhaas what do you think, especially regarding the security implications?
I don't have a good idea how to solve this cleanly yet. Are we sure we can't fix the root cause instead?
I wish I were able to resolve this otherwise. This is a 2-year-old issue, and I remember that we spent days and weeks for debugging this. It's been a real nightmare and very strange things were happening. I don't think we will change strategy at our end, as our client is just happy that it's working on their millions of nodes that they have. So, even if this MR eventually takes a different approach, we will probably stick with the proven patch.
Note: our "nightmare wasn't just this issue, there have been plenty of issues in glossify back then, where a lot of them occurred in nodes with outrageous long body fields and often copy and pasted content from sources like word.
- π΅πΉPortugal joao.ramos.costa
Yes, not easy indeed. I only think of HTMLPurifier as a safe alternative with performance cost, ofc.
- π΅πΉPortugal joao.ramos.costa
Hi @anybody and @jurgenhaas,
just commited a suggestion including HTMLPurifier (https://github.com/ezyang/htmlpurifier) which is a decent library to provide proper sanitization, included also a composer.json to require it. The tests must be updated. Not sure how.
- π©πͺGermany Anybody Porta Westfalica
Thanks @joao.ramos.costa that looks nice!! I'm unsure if we should really add the HTML Purifier dependency required here.
Maybe we should better make it opt-in via a setting or even make the integration a submodule to allow people opting-in?What's do the others here think?
- π΅πΉPortugal joao.ramos.costa
I tend to agree with you,
in fact one maybe don't want to use tooltips and don't need HtmlPurifier at all, but on the other hand currently including a
triggers a warning. We can even simply add a hook there and anyone may sanitize the text with any other method. I'm not sure re other module, but the setting seems a decent approach. Only would be available if someone have this dependency enabled.But let's see others opinion, then.
tkx
- First commit to issue fork.
- Merge request !41Avoid using appendXML when reinserting matches or rendered tips, since we may... β (Open) created by bkosborne
- πΊπΈUnited States bkosborne New Jersey, USA
All of these problems are related to incorrect usage of appendXML. We cannot use this for reinserting matches or rendered tips, because we're dealing with HTML and not XML. There's lots of chars that might be invalid XML like which will cause parser warnings.
Since PHP doesn't have
appendHTML
, we instead can create a temporary HTML DOM document based on the rendered tip, then import the nodes it creates into our original DOM document. Now the rendered tip is treated as HTML correctly and special characters like are carried forward (Note that they are double encoded because of how this module sets up the rendering, but IMO this is a separate issue to deal with.)I created a new MR !41 with this approach. I also carried over two tests that were written from the other MR, but I modified their expected output to match reality of how this works. The original MR is doing things like replacing
with line breaks, but that doesn't seem to belong here. If there's a desire to convert
to line breaks, we should create a separate issue for that. - π©πͺGermany Anybody Porta Westfalica
Thanks @bkosborne for your valuable input! That looks great and tests succeed without the need for additional dependencies.
I added some comments, partially a bit unrelated but useful in general I hope.
If the others here can confirm MR!41 works for them and solves the issue, we can merge this soon. - π΅πΉPortugal joao.ramos.costa
joao.ramos.costa β changed the visibility of the branch 3360633-html-entities-lost-3.x to hidden.
- π΅πΉPortugal joao.ramos.costa
Hi @bkosborne, I've tried your MR, worked on my side, thanks a lot for your work.
Yes, I agree that convert the
<br>
may be done in other issue,
but tried to keep as much as previous patch was doing, since it was added from #4 π HTML entities lost during tooltip rendering Active and I wanted to keep a commitment. But yes, you are right. Let's do it in another issue, if necessary. - π¨πSwitzerland boromino
Now the rendered tip is treated as HTML correctly and special characters like
are carried forward (Note that they are double encoded because of how this module sets up the rendering, but IMO this is a separate issue to deal with.)The original MR is doing things like replacing
<br>
with line breaks, but that doesn't seem to belong here for this issue. If there's a desire to convert to line breaks, we should create a separate issue for that.As far as I understand https://www.drupal.org/project/glossify/issues/3360633#comment-15983410 π HTML entities lost during tooltip rendering Active , the 2 initial issues are not addressed at all in MR !41. On my test with a term description stored as
Lorem Ipsum has<br /> been the industry's standard & poor dummy text ever since
in the database,Lorem Ipsum has been the industry's standard & poor's dummy text ever since
(without line break) is displayed in the tooltip. - π©πͺGermany Anybody Porta Westfalica
Thanks @boromino, could you maybe add a test like that to show it fails?
That would be super helpful to get to the right point! - πΊπΈUnited States bkosborne New Jersey, USA
Hmm, yes I misinterpreted the original issue summary. My MR probably belongs in a separate issue. I'll work on that.
- πΊπΈUnited States bkosborne New Jersey, USA
bkosborne β changed the visibility of the branch 3360633-avoid-appendXML to hidden.
- πΊπΈUnited States bkosborne New Jersey, USA
Ok I closed my MR from here and moved it to π Fix parser errors that can occur when rendered tooltip contains certain characters like Active .
So the focus of this issue should be to fix the encoding of HTML special characters to prevent
from appearing as&nbsp;<code> and the fact that tooltips are stripping HTML. Note the default theme implementation puts tooltips in the title <em>attribute</em> of an <code>abbr
element. We cannot put HTML in there, so it makes sense to strip HTML. But we shouldn't be encoding the attributes. The reason that's happening is because Twig is escaping the output. We may be able to avoid that...
- πΊπΈUnited States bkosborne New Jersey, USA
So we're currently running tip through
strip_tags
and returning that, which then gets inserted into the HTML attribute for the<abbr title="{{ tip }}">term</abbr>
. Twig is then doing auto-escaping and that's what causes the & to be converted to&
. You can be avoided by having tip be wrapped inMarkup::create($tip);
before passing to twig, which will prevent twig from auto-escaping it. This seems like it would be safe enough, since we're already running things thru strip_tags to remove HTML, but strip_tags is not sufficient when the value is inserted into an HTML attribute, because it doesn't strip or encode quotes which are used as HTML attribute delimiters. For that, we're back to needinghtmlspecialchars
which is what Twig is doing for us already!I'm not really sure the best path forward here.