Html utility: serialize() removes the doctype

Created on 31 March 2025

Problem/Motivation

When using Html::load() (core/lib/Drupal/Component/Utility/Html.php) followed by Html::serialize(), a "<!DOCTYPE html>" declaration present in the original string is removed.

Steps to reproduce

use Drupal\Component\Utility\Html;

$html_markup = '<!DOCTYPE html>
      <html lang="fr" dir="ltr">
        <head>
          <meta charset="utf-8" />
        </head>
        <body>
        </body>
      </html>';
// Parse the full document, then serialize it back to a string.
$dom = Html::load($html_markup);
$result = Html::serialize($dom);

Result (the doctype is gone):

     <html lang="fr" dir="ltr"><head>
          <meta charset="utf-8">
        </head>
        <body>
        </body>
      </html>

Proposed resolution

I have not found any relevant option, nor investigated how https://github.com/Masterminds/html5-php works in enough detail to propose a fix.

Remaining tasks

Find the root cause of the bug.

🐛 Bug report
Status

Active

Version

11.0 🔥

Component

base system

Created by

Grimreaper (France)


Comments & Activities

  • Issue created by @Grimreaper
  • Is Html::load intended to work on header markup?

  • mstrelan (Australia)

    I think this is by design:

       * This function loads the body part of a partial HTML document and returns a
       * full \DOMDocument object that represents this document.
    

    and

       * @param string $html
       *   The partial HTML snippet to load. Invalid markup will be corrected on
       *   import.
    

    Both suggest it's only intended to deal with partial snippets, not full documents.

  • Grimreaper (France)

    Hi,

    Thank you both for your replies.

    In that case, I will adapt my code to manipulate only the body.
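One way to manipulate only the body is to split the full document so that just the inner markup of <body> goes through Html::load() and Html::serialize(), then reattach the untouched doctype, <html> and <head> parts. This is a minimal sketch, not a Drupal API: process_body_only() is a hypothetical helper, and it assumes the document contains a single literal "<body ...>" open tag and "</body>" close tag, because it uses plain string search rather than an HTML parser.

```php
<?php

/**
 * Hypothetical helper (not part of Drupal core): applies $transform to the
 * inner markup of <body> only, leaving the doctype, <html> and <head> intact.
 *
 * Assumes one literal "<body ...>" open tag and one "</body>" close tag.
 * This is plain string slicing, not HTML parsing.
 */
function process_body_only(string $document, callable $transform): string {
  // Find the start of the opening <body> tag (case-insensitive).
  $open = stripos($document, '<body');
  // Find the end of that opening tag: the first ">" after "<body".
  $open_end = $open === FALSE ? FALSE : strpos($document, '>', $open);
  // Find the last closing </body> tag (case-insensitive).
  $close = strripos($document, '</body>');

  if ($open_end === FALSE || $close === FALSE || $close < $open_end) {
    // No usable <body> element: treat the whole input as a snippet.
    return $transform($document);
  }

  // Everything up to and including the opening <body ...> tag.
  $prefix = substr($document, 0, $open_end + 1);
  // Only the markup between <body ...> and </body>.
  $body = substr($document, $open_end + 1, $close - $open_end - 1);
  // The closing </body> tag and everything after it.
  $suffix = substr($document, $close);

  return $prefix . $transform($body) . $suffix;
}
```

In a Drupal context, the transform could be fn (string $body): string => Html::serialize(Html::load($body)), with use Drupal\Component\Utility\Html; at the top of the file. String slicing is fragile (a "<body" inside a comment or an inline script would confuse it), so a DOM-based split may be preferable for arbitrary input.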
