</code><h3 id="summary-problem-motivation">Problem/Motivation</h3>
The <code>\Drupal\Component\Utility\Html::serialize</code> method uses \DOMDocument::saveXML instead of \DOMDocument::saveHTML to turn a \DOMDocument object back into an HTML string. I know HTML and XML are closely related, but in some cases this can cause issues.
For example, when the following piece of HTML is passed to \Drupal\Component\Utility\Html::normalize (which calls the serialize method):
<p>The Dutch word for example is <span lang="nl">voorbeeld</span></p>
The output is:
<p>The Dutch word for example is <span lang="nl" xml:lang="nl">voorbeeld</span></p>
For some reason, a new xml:lang attribute was added.
On its own, that is not a big problem. However, if this output is passed to Html::normalize a second time (for example, by two text filters that both use Html::normalize), we get the following output:
<p>The Dutch word for example is <span lang="nl" xml:lang="nl" xml:lang="nl">voorbeeld</span></p>
We now have xml:lang twice, which is invalid HTML (an element may not carry the same attribute twice). This looks like a bug in PHP or in libxml, but if we use saveHTML instead of saveXML, the problem is fixed: no xml:lang attributes are added.
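The round trip can be reproduced outside Drupal with a short PHP script. This is a minimal sketch, not Drupal's code: load_fragment() and serialize_body() are hypothetical helpers that approximate Html::load() (which, as far as I can tell, wraps the fragment in an XHTML 1.0 Strict document) and the body-serializing loop of Html::serialize(). One plausible explanation for the duplicate, not verified against every libxml2 version, is that libxml2's XHTML serializer adds xml:lang next to any lang attribute, while the HTML parser reads an existing xml:lang back as a plain attribute literally named "xml:lang", so the serializer does not recognise it on the next pass and adds another one.

```php
<?php
// Minimal sketch, not Drupal's code. load_fragment() and serialize_body()
// are hypothetical helpers approximating Html::load() and Html::serialize().

// Assumption: like Html::load(), wrap the fragment in an XHTML 1.0 Strict
// document; the XHTML DTD appears to be what makes saveXML() use libxml2's
// XHTML serializer.
function load_fragment(string $html): DOMDocument
{
    $wrapper = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
        . '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        . '</head><body>' . $html . '</body></html>';

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($wrapper, LIBXML_NOBLANKS);
    libxml_clear_errors();
    return $dom;
}

// Serialize every child of <body> with saveXML() (current behaviour)
// or saveHTML() (proposed behaviour).
function serialize_body(DOMDocument $dom, bool $useXml): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $node) {
        $html .= $useXml ? $dom->saveXML($node) : $dom->saveHTML($node);
    }
    return $html;
}

$input = '<p>The Dutch word for example is <span lang="nl">voorbeeld</span></p>';

// Simulate two text filters that each load and serialize the same markup.
$xmlOnce   = serialize_body(load_fragment($input), true);
$xmlTwice  = serialize_body(load_fragment($xmlOnce), true);
$htmlOnce  = serialize_body(load_fragment($input), false);
$htmlTwice = serialize_body(load_fragment($htmlOnce), false);

echo "saveXML,  1 pass:   $xmlOnce\n";
echo "saveXML,  2 passes: $xmlTwice\n";
echo "saveHTML, 1 pass:   $htmlOnce\n";
echo "saveHTML, 2 passes: $htmlTwice\n";
```

On an affected PHP/libxml2 combination, the saveXML lines should show xml:lang appearing once and then twice, as in the outputs quoted above, while the saveHTML lines should keep only lang and stay identical across passes. The exact output may differ between libxml2 versions.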
The big question is: why use saveXML if a dedicated saveHTML function is available?
Steps to reproduce
The issue can easily be reproduced with the Umami demo profile that ships with Drupal core:
- Install Drupal with the Umami profile
- Create a new basic page
- Fill the Body field with the following HTML:
<p>The Dutch word for example is <span lang="nl">voorbeeld</span>.</p>
- Make sure the Basic HTML format is selected
- Save the page
If you now look at the page source, you see the following output:
<p>The Dutch word for example is <span lang="nl" xml:lang="nl" xml:lang="nl">voorbeeld</span>.</p>
This is because the Basic HTML format enables multiple filters that use the Html::serialize method:
- Align images
- Caption images
- Restrict images to this site
- Track images uploaded via a Text Editor
- Embed media
Proposed resolution
I think it is better to use \DOMDocument::saveHTML instead of \DOMDocument::saveXML in \Drupal\Component\Utility\Html::serialize.
I am not sure how big the impact of this change would be.
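A minimal sketch of the proposed change, useful for gauging its impact. serialize_with_html() is a hypothetical name, not Drupal API; it keeps the shape of the loop in Html::serialize() but swaps saveXML for saveHTML, and it leaves out Drupal's escaping of script and style contents. The XHTML 1.0 Strict wrapper is an assumption modelled on Html::load(). One difference worth checking is how void elements such as br are written.

```php
<?php
// Hypothetical sketch: Html::serialize()-style loop with saveHTML() swapped in
// for saveXML(). Drupal's <script>/<style> CDATA escaping is omitted here.
function serialize_with_html(DOMDocument $document): string
{
    $body = $document->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $node) {
        $html .= $document->saveHTML($node); // currently: saveXML($node)
    }
    return $html;
}

// Load a fragment inside an XHTML 1.0 Strict wrapper (assumption: roughly
// what Html::load() does), so saveXML() follows the current code path.
$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    . '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    . '<p>First line<br>Second line</p>'
    . '</body></html>',
    LIBXML_NOBLANKS
);
libxml_clear_errors();

$paragraph = $document->getElementsByTagName('p')->item(0);
$viaXml  = $document->saveXML($paragraph);  // current behaviour
$viaHtml = serialize_with_html($document);  // proposed behaviour

echo "saveXML:  $viaXml\n";
echo "saveHTML: $viaHtml\n";
```

If this holds, void elements would presumably be written without the trailing slash (br instead of a self-closing br). That is valid HTML, but it changes the exact markup that filters return, so existing tests that compare filter output byte for byte may need updating.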
Remaining tasks
Release notes snippet
Use \DOMDocument::saveHTML instead of \DOMDocument::saveXML in Html::serialize