- Issue created by @prudloff
- Status changed to Needs review
about 2 months ago 1:39pm 29 April 2024 - 🇫🇷 France prudloff Lille
GitLab fails to create the issue fork for some reason so here is a patch.
- 🇫🇷 France prudloff Lille
Turns out Html::load() removes everything outside the body, so this is not what we need here.
The root problem seems to be that DOMDocument::loadHTML() does not detect the encoding correctly. Forcing it like this works but does not feel very clean:
$success = @$dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);
Using the HTML5 library seems to work correctly (it is what Html::load() uses internally).
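For anyone who wants to reproduce the encoding problem outside Drupal, here is a minimal sketch. The sample markup and the helper name first_paragraph_text() are made up for illustration and are not part of the patch. It assumes the input is UTF-8 and has no charset declaration, which is the case where DOMDocument::loadHTML() has to guess.

```php
<?php
// Sample UTF-8 document without a <meta charset>, so libxml2 has to guess the encoding.
$html = '<!DOCTYPE html><html><head><title>Test</title></head><body><p>Économie</p></body></html>';

/**
 * Hypothetical helper: parses $html and returns the text of the first <p>.
 */
function first_paragraph_text(string $html, bool $force_utf8): string {
  $dom = new DOMDocument();
  // The processing instruction is the workaround quoted above: it tells
  // libxml2 to treat the input as UTF-8.
  $prefix = $force_utf8 ? '<?xml encoding="utf-8" ?>' : '';
  // Suppress libxml warnings, as in the snippet above.
  @$dom->loadHTML($prefix . $html);
  return $dom->getElementsByTagName('p')->item(0)->textContent;
}

$plain = first_paragraph_text($html, FALSE);
$forced = first_paragraph_text($html, TRUE);

// Compare both results against the original text.
var_dump($plain === 'Économie');
var_dump($forced === 'Économie');
```

Without the prefix the accented character typically comes back garbled, although the exact result may depend on the libxml2 version. Note that the prefix also leaves a processing-instruction node in the parsed document, which is part of why the workaround does not feel clean.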