- Issue created by @joelpittet
- Status changed to Needs review
12 months ago 9:08pm 5 January 2024
- 🇨🇦 Canada joelpittet Vancouver
Further investigation suggests this is related to Html::normalize(): https://www.drupal.org/node/2441811
- 🇨🇦 Canada joelpittet Vancouver
https://stackoverflow.com/a/9760247/80281
Use DOMDocument::loadHTMLFile() instead of load(). That's what it has been made for. HTML is not XML.
XML does not know the named entity &nbsp;. However, if you use loadHTML, the XML parser will get the HTML named entities loaded, so the error goes away.
See as well: XML parser error: entity not defined.
I think this makes sense. masterminds/html5 has been in there, I believe, since the Drupal 8.0 release, because I remember solving problems with it in the tests prior to Twig being included in core.
- Merge request !11 Get and HTML DOMElement of the fragment and append it (Open) created by joelpittet
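To make the quoted explanation concrete, here is a minimal sketch of the difference between the XML and HTML parsers in PHP's DOMDocument. The fragment and variable names are made up for illustration; this is not the module's code.

```php
<?php
// Illustrative fragment (an assumption, not taken from the module).
$fragment = '<h2>Heading&nbsp;</h2>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// XML parser: &nbsp; is not one of XML's five predefined entities,
// so libxml reports an "Entity 'nbsp' not defined" error.
$xml = new DOMDocument();
$xml_ok = $xml->loadXML($fragment);
$xml_errors = libxml_get_errors();
libxml_clear_errors();

// HTML parser: the named HTML entities are known, so the same input parses.
$html = new DOMDocument();
$html_ok = $html->loadHTML($fragment);
libxml_clear_errors();
$heading = $html->getElementsByTagName('h2')->item(0);
```

After this runs, `$xml_errors` holds the complaint about the undefined entity, while `$heading->textContent` contains the heading text followed by a real non-breaking space character.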
- 🇨🇦 Canada joelpittet Vancouver
I'm not a fan of the DOMDocument API, so my apologies for being verbose. If only there were an appendHTML(), this would have been a one-liner.
- 🇨🇦 Canada joelpittet Vancouver
I do believe it's ready for review, though: this seems to work, and it will work in both D10.2 and earlier.
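For readers wondering what the wished-for appendHTML() could look like, here is a rough sketch. The function name `append_html()` is hypothetical: it is neither part of PHP's DOM API nor the code from the merge request. It also assumes ASCII input with entities; real code would need to handle character encoding.

```php
<?php
/**
 * Hypothetical helper: parses an HTML string and appends the resulting
 * nodes to $parent. Assumes ASCII input (entities are fine).
 */
function append_html(DOMElement $parent, string $html): void {
  $previous = libxml_use_internal_errors(true);
  $tmp = new DOMDocument();
  // The wrapper <div> gives a known container to read the nodes back from.
  $tmp->loadHTML('<div>' . $html . '</div>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $wrapper = $tmp->getElementsByTagName('div')->item(0);
  // Import every child of the wrapper, not only the first one.
  foreach ($wrapper->childNodes as $child) {
    $parent->appendChild($parent->ownerDocument->importNode($child, true));
  }
}

// Usage: append a heading that ends with &nbsp; plus a paragraph.
$doc = new DOMDocument();
$doc->loadXML('<root/>');
append_html($doc->documentElement, '<h2>Title&nbsp;</h2><p>Body</p>');
```

Because the string goes through the HTML parser, the &nbsp; arrives in the target document as a plain character and no entity lookup is needed on the XML side.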
- 🇲🇽 Mexico Alan Delval
Tested, but it didn't work. It fails silently: headers in the node text are missing.
My workaround, to avoid filling the logs, is decoding &nbsp; to " ".
- 🇵🇱 Poland szy
The same as #9 - my headers (h2 tags) in text are lost.
Szy.
- 🇳🇿 New Zealand ericgsmith
I am curious about the reported failures. I tested and have not been able to reproduce the headings getting lost. Is anybody able to share the sample HTML and config they used when the headers got lost?
It looks good to me, resolves the errors reported and I can still see the headings in the text.
- 🇵🇱 Poland szy
@ericgsmith, are you leaving at least one nbsp at the end of any h2? :)
This is a condition, iirc.
Szy.
- 🇬🇧 United Kingdom Hephaestus
Can confirm that we encountered this issue (in our case, an &nbsp; at the start of a header) and the commit in the MR (1c95fbbb) resolved the issue. We didn't experience the problem Matthew reported in the MR comment.
- Status changed to Needs work
11 months ago 7:49pm 9 February 2024
- 🇨🇦 Canada kiwad
I had the same result as Szy.
After applying the patch, the h2 elements were vanishing from the text.
- 🇨🇦 Canada Liam Morland Ontario, CA 🇨🇦
Liam Morland made their first commit to this issue's fork.
- Status changed to Needs review
11 months ago 8:22pm 9 February 2024
- 🇨🇦 Canada Liam Morland Ontario, CA 🇨🇦
I have put a fix on the merge request that uses html_entity_decode(). This fixes the error messages and does not remove the h2 elements.
- 🇨🇦 Canada Liam Morland Ontario, CA 🇨🇦
This is the change mentioned in #16, as a patch.
- Status changed to Needs work
11 months ago 8:55pm 9 February 2024
- 🇨🇦 Canada Liam Morland Ontario, CA 🇨🇦
In further testing, I found that this approach does not work, because it outputs characters instead of entities for some things that need to be entities in XML.
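A small sketch of the problem described above, using a made-up input string. The numeric-reference variant at the end is only one possible alternative, shown as an assumption; it is not the fix that was eventually merged.

```php
<?php
// Illustrative input: a non-breaking space plus escaped markup characters.
$input = '<h2>Q&amp;A&nbsp;&lt;draft&gt;</h2>';

// Decoding everything also turns &amp;, &lt; and &gt; into raw characters,
// so the result contains a bare "&" and an unclosed "<draft>" tag.
$decoded = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$decoded_ok = $doc->loadXML($decoded);
libxml_clear_errors();

// Possible alternative (assumption): replace only &nbsp; with its numeric
// character reference, which XML understands without any entity table.
$numeric = str_replace('&nbsp;', '&#160;', $input);
$doc2 = new DOMDocument();
$numeric_ok = $doc2->loadXML($numeric);
libxml_clear_errors();
```

Here the fully decoded string is no longer well-formed XML, while the string with only &nbsp; rewritten still parses and keeps the escaped characters as text.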
- 🇳🇿 New Zealand ericgsmith
@ericgsmith, are you leaving at least one nbsp at the end of any h2? :)
Yes - hence saying that the patch resolved this issue.
I am curious: for people reporting that the headers are missing, do you have a custom toc-header.html.twig file that adds additional elements to the header? There is a flaw in the original patch: it expects only one root-level element in the header. I have been able to reproduce HTML being removed with a custom template that adds additional elements in the header element.
- Merge request !12 #3412644 Whitespace HTML entities break DOM parsing in Drupal 10.2 (Merged) created by ericgsmith
- Status changed to Needs review
11 months ago 12:49am 12 February 2024
- 🇳🇿 New Zealand ericgsmith
OK, I have opened MR12 for the reasons described above (a change in approach should, IMO, be done in a new MR, so I didn't want to revert the approach introduced in MR11).
This goes back to the original approach from @joelpittet and includes a minor change to account for instances where the toc header may return more than 1 child element.
I believe the issue some described with missing text was caused either by theme debug being enabled or by having a template that introduced additional elements prior to the heading.
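The single-root flaw can be illustrated with a short sketch. The rendered string below is invented, with an HTML comment standing in for the kind of markup Twig's theme debug adds in front of a template's output. Whether this matches the exact failure others saw is an assumption.

```php
<?php
// Illustrative template output: a comment precedes the heading.
$rendered = '<!-- debug --><h2>Title&nbsp;</h2>';

libxml_use_internal_errors(true);
$tmp = new DOMDocument();
$tmp->loadHTML('<div>' . $rendered . '</div>');
libxml_clear_errors();
$wrapper = $tmp->getElementsByTagName('div')->item(0);

// Taking only the first child picks up the comment and misses the heading.
$first_only = $wrapper->firstChild->nodeName;

// Collecting every child keeps the heading as well.
$names = [];
foreach ($wrapper->childNodes as $child) {
  $names[] = $child->nodeName;
}
```

With this input, `$first_only` is the comment node, so code that appends only that node would drop the h2, while iterating over all children keeps it.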
For us, patch #21 seems to solve the problem too.
It would be nice to have an update of the module.
- 🇺🇸 United States aaronpinero
I can also confirm that patch #21 seems to solve the problem in Drupal 10.2.5.
- Status changed to RTBC
8 months ago 11:01am 15 April 2024
- 🇩🇪 Germany jurgenhaas Gottmadingen
Patch from #21 fixes the issue for us with Drupal 10.2.5 as well.
- 🇩🇪 Germany marcoka
I can confirm #21 works here too with Drupal 10.2.4
- First commit to issue fork.
- jrockowitz committed 2bf377e9 on 8.x-1.x authored by ericgsmith
Issue #3412644: Whitespace HTML entities break DOM parsing in Drupal 10....
- Status changed to Fixed
7 months ago 3:59pm 23 May 2024 Automatically closed - issue fixed for 2 weeks with no activity.