Upgrade tests to HTML5

Created on 26 February 2015, almost 10 years ago
Updated 5 November 2023, about 1 year ago

Problem/Motivation

This issue is part of 🌱 [Meta] PHP DOM (libxml2) misinterprets HTML5 Active .

The Drupal\system\Tests\Theme\FunctionsTest needs to be upgraded to HTML5, along with other parts of the code using the php DOM extension to do parsing of HTML.

Blocked on #2667340: Usage of field_prefix and container-inline creates invalid markup. which was fixed in #3072752: Invalid markup in ImageItem (causes test failures with symfony/dom-crawler:4.3+) .

Beta phase evaluation

<!--Uncomment the relevant rows for the issue. -->

Commit credits
Please give commit credits to all who have worked on the parent issue ( 🌱 [Meta] PHP DOM (libxml2) misinterprets HTML5 Active ).

Steps to reproduce

N.A., is about the use of Masterminds for HTML parsing on tests instead of php DOM extension parsing for HTML.

Proposed resolution

Use Masterminds HTML parsing.

Remaining tasks

Decide if: (a) apply the patch at #94 as it is, and open a follow-up for the help topics test fix; or (b) apply the patch from #100 with the work-around, and open a follow-up for fixing the one edge case in the help topics test that is failing; or (c) continue fixing that help topics test case here.

User interface changes

N.A.

API changes

N.A.

Data model changes

N.A.

Release notes snippet

N.A.

🐛 Bug report
Status

Fixed

Version

11.0 🔥

Component
PHPUnit 

Last updated about 19 hours ago

Created by

🇳🇱Netherlands daffie

Live updates comments and jobs are added and updated live.
  • html5

    Implements and supports the use of HTML5.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇮🇳India vsujeetkumar Delhi

    Re-roll patch created as per #86. rdf changes has been removed from the patch as it is not part of core now.

  • last update over 1 year ago
    30,385 pass, 1 fail
  • Status changed to Needs review over 1 year ago
  • last update over 1 year ago
    30,392 pass
  • 🇮🇳India vsujeetkumar Delhi

    Fixed the test case, Please have a look.

  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States smustgrave

    Was previously tagged for issue summary update which believe still needs to happen.

    Also not 100% about the test fix in #88. Is that what HTML5 requires?

  • 🇮🇳India vsujeetkumar Delhi

    Also not 100% about the test fix in #88. Is that what HTML5 requires?

    @smustgrave, Will take the reference from "related & committed" issue in #86, Please have a look on file "core/modules/filter/tests/src/Kernel/FilterCaptionTwigDebugTest.php" in the related issue and advise.

  • 🇬🇧United Kingdom longwave UK
    +++ b/core/modules/system/tests/src/Kernel/Theme/FunctionsTest.php
    @@ -480,8 +481,7 @@ public function testDrupalPreRenderLinks() {
    -    $dom = new \DOMDocument();
    -    $dom->loadHTML($html);
    +    $dom = (new HTML5())->loadHTML($html);
    

    Can we use Html::load() here instead of using the HTML5 parser directly? Same for all the other places that we call new HTML5().

  • 🇮🇳India vsujeetkumar Delhi

    Addressed #92, Used Html::load() wherever we use HTML5(). Patch created, Please have a look.

  • 29:28
    27:18
    Running
  • 🇮🇳India vsujeetkumar Delhi

    Fixed test cases, Please have a look.

  • last update over 1 year ago
    30,415 pass
  • Status changed to Needs review over 1 year ago
  • 🇬🇧United Kingdom longwave UK

    This looks good to me so far. The remaining uses of new DOMDocument() in tests are:

    core/modules/help/tests/src/Functional/HelpTopicsSyntaxTest.php:    $doc = new \DOMDocument();
    core/tests/Drupal/KernelTests/AssertContentTrait.php:        $dom = new \DOMDocument();
    core/tests/Drupal/Tests/Component/Utility/HtmlTest.php:    $document = new \DOMDocument();
    

    The latter two are explicit cases where we still need DOMDocument - loading XML instead of HTML and testing a specific serialization bug.

    Not sure what to do with HelpTopicsSyntaxTest; it uses libxml to ensure the HTML is valid and unsure how we can port this to the HTML5 library.

  • last update over 1 year ago
    Custom Commands Failed
  • 🇵🇪Peru marvil07

    Not sure what to do with HelpTopicsSyntaxTest; it uses libxml to ensure the HTML is valid and unsure how we can port this to the HTML5 library.

    @longwave, there may be a port for HelpTopicsSyntaxTest .
    Here a version of the validation there using Masterminds\HTML5 functionality.
    Let us see what CI thinks.

  • last update over 1 year ago
    30,416 pass, 1 fail
  • 🇮🇳India vsujeetkumar Delhi

    Fixed the CCF issues.

  • last update over 1 year ago
    Patch Failed to Apply
  • last update over 1 year ago
    Patch Failed to Apply
  • 🇵🇪Peru marvil07

    @vsujeetkumar, thanks for the variable typo fix, I was hoping to not need to run it locally :'-)

    The test fail in #97 was related to the approach change.
    Basically the error strings are different on Masterminds\HTML5 parser, than in php DOM extension libxml2.
    Gladly there seems to be only one actual relevant error tested from the library output, the rest are errors created on the test itself related to structure.

    After changing a bit how this works, I arrived to a point were it is mainly OK, but there is a problem in one case, on bad_html3 help topic error template.
    Likely the parsing is different and produces a bit different error for that case, or there is an actual error to fix.

    I am attaching a couple of patches, one expected to fail without the workaround, and other skipping the check on that specific case.
    At this point we may either use the patch at #94 and open a follow-up specifically for help topics, that my use the code on the patches in this comment; or continue here.

  • 45:57
    45:24
    Running
  • 45:57
    45:23
    Running
  • 🇮🇳India vsujeetkumar Delhi

    #99 patch fails to apply, created updated path without fails. Also on below comment I am also thinking the same for now we can use #94 and do followups on help topics, Here we need expert advise.

    At this point we may either use the patch at #94 and open a follow-up specifically for help topics, that my use the code on the patches in this comment; or continue here.

  • 🇵🇪Peru marvil07

    @vsujeetkumar: thanks for re-exporting the patch as p1!
    It seems I exported p0 instead :'-)
    I see the test results reflect the hypothesis \o/

  • Status changed to Needs work about 1 year ago
  • 🇺🇸United States smustgrave

    Was previously tagged for issue summary update which still believe is needed.

    Since this is fixing tests I'm wondering if it should just continue here. Is there a large number of help_topic tests that need to be fixed?

  • Status changed to Needs review about 1 year ago
  • 🇵🇪Peru marvil07

    @smustgrave,

    Was previously tagged for issue summary update which still believe is needed.

    I have updated the description.

    Since this is fixing tests I'm wondering if it should just continue here. Is there a large number of help_topic tests that need to be fixed?

    Maybe :-)
    Per my comment above at #99 🐛 Upgrade tests to HTML5 Needs review , I would suggest to apply the change on #94, which cover most of the cases, and open at least one follow-up for the help_topic test.

    To answer the question, it is only one edge case in one of the help topics tests failing, but since that test is walking over all help topics data in core, it may be worth a follow-up.

    Also, I noticed I did not even mentioned the other two cases @longwave mentioned on #95, but I arrived to the same conclusion.

    The latter two are explicit cases where we still need DOMDocument - loading XML instead of HTML and testing a specific serialization bug.

    On core/tests/Drupal/KernelTests/AssertContentTrait.php case, that new DOMDocument object is only used for parsing xml content, and around it, it already uses the Html::load() helper that will use HTML5 version from Masterminds.
    Here an extract:

    // This is an XML document.
    if (stripos(ltrim($content), '<?xml') === 0) {
      $dom = new \DOMDocument();
      $dom->loadXML($content);
    }
    else { 
      $dom = Html::load($content);
    }
    

    On core/tests/Drupal/Tests/Component/Utility/HtmlTest.php case, it also seems OK.
    The use there is just a new empty object passed on the next line to Html::serialize(), of which the signature of the method requires the DOMDocument class.
    In other words, it is not really used for parsing, so it is OK to leave it as it.

  • Status changed to RTBC about 1 year ago
  • 🇺🇸United States smustgrave

    @marvil07 thanks for the explanation. Opened up a follow up for help_topics.

  • last update about 1 year ago
    30,480 pass, 1 fail
  • last update about 1 year ago
    CI error
  • Status changed to Needs work about 1 year ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    There's a broken test. And...

    +++ b/core/tests/Drupal/KernelTests/AssertContentTrait.php
    @@ -122,14 +122,21 @@ protected function setDrupalSettings($settings) {
    -      // DOM can load HTML soup. But, HTML soup can throw warnings, suppress
    -      // them.
    -      $html_dom = new \DOMDocument();
    -      @$html_dom->loadHTML('<?xml encoding="UTF-8">' . $this->getRawContent(), LIBXML_NOBLANKS);
    -      if ($html_dom) {
    +      $content = $this->getRawContent();
    +
    +      // This is an XML document.
    +      if (stripos(ltrim($content), '<?xml') === 0) {
    +        $dom = new \DOMDocument();
    +        $dom->loadXML($content);
    +      }
    +      else {
    +        $dom = Html::load($content);
    +      }
    

    Is this checking for <?xml<code> actually necessary? Before we were adding <code><?xml encoding="UTF-8"> to the content so how would have the content that starts with

    <?xml<code> actaully work?
    
    I've tried to simulate what would have occurred before if you passed XML to parse():
    <code>
    $text = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
    $text = '<?xml encoding="UTF-8">' . $text;
    $html_dom = new \DOMDocument();
    $html_dom->loadHTML('<?xml encoding="UTF-8">' . $text, LIBXML_NOBLANKS);
    $elements = simplexml_import_dom($html_dom);
    

    Which results in the following SimpleXml element

    > $elements = simplexml_import_dom($html_dom);
    = SimpleXMLElement {#11412
        +"body": SimpleXMLElement {#11529
          +"note": SimpleXMLElement {#11525
            +"to": "Tove",
            +"from": "Jani",
            +"heading": "Reminder",
          },
        },
      }
    

    Which is nonsense compared to the input. We've added a body - because we're treating it as HTML and removed the spurious body from the HTML.

    I think parse has always been expecting HTML and nothing else.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I guess #94 is supposed to be the patch to commit. Can that be re-uploaded so it is the last patch on the issue otherwise it is very very confusing. The comment about AssertContentTrait::parse() from #106 still stand. I couldn't see any supporting evidence on the issue that we need to support XML in that method and if we do then we need to have some test coverage of it in AssertContentTraitTest.

  • Merge request !5226#2441373 Upgrade tests to HTML5 → (Open) created by marvil07
  • Status changed to Needs review about 1 year ago
  • 🇵🇪Peru marvil07

    I think parse has always been expecting HTML and nothing else.

    @alexpott, that may be the case, I guess the changes were being cautious, trying to not change the logic more than needed.

    I see that the actual parse() method was added at 0858e9350b8d99a8476b78a845e1531fb124acef in 2018, so it has been there for a while.
    That method is part of AssertContentTrait, which is mainly used in KernelTestBase, an naturally on DrupalKernel by extension.

    I do not see direct uses of parse().
    The following is an attempt to search for that, which only leads a false positive, i.e. not using $this.

    $ grep -F -- '->parse(' $(git grep \( -e KernelTestBase --or -e DrupalKernel \) --and -e extends --name-only)
    core/tests/Drupal/KernelTests/Config/DefaultConfigTest.php:    $info = \Drupal::service('info_parser')->parse($file_name);
    

    Even if no direct uses, parse() is used indirectly through the highly used xpath() method on the same trait.
    Said that, I could not find anything, at least with the following search, of use with non-HTML.

    $ grep -F -- 'this->xpath(' $(git grep \( -e KernelTestBase --or -e DrupalKernel \) --and -e extends --name-only)
    core/modules/comment/tests/src/Kernel/Views/CommentUserNameTest.php:    $comment_author = $this->xpath('//div[contains(@class, :class)]/span[normalize-space(text())=""]', [
    core/modules/field/tests/src/Kernel/FieldDisplayTest.php:    $elements = $this->xpath($css_selector_converter->toXPath('.visually-hidden'));
    core/modules/history/tests/src/Kernel/Views/HistoryTimestampTest.php:    $result = $this->xpath('//span[@class=:class]', [':class' => 'marker']);
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//a[@href=:path]/img[@src=:url and @width=:width and @height=:height]', [':path' => base_path() . $path, ':url' => $url, ':width' => $image->getWidth(), ':height' => $image->getHeight()]);
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//a[@href=:path]/img[@src=:url and @width=:width and @height=:height and @alt=""]', [':path' => base_path() . $path, ':url' => $url, ':width' => $image->getWidth(), ':height' => $image->getHeight()]);
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//a[@href=:fragment]/img[@src=:url and @width=:width and @height=:height and @alt=""]', [
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//img[@src=:url and @alt=""]', [':url' => $url]);
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//img[@src=:url]', [':url' => $url]);
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//img[contains(@class, class) and contains(@alt, :alt)]', [":class" => "image-with-regular-alt", ":alt" => "Regular alt"]);
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//img[contains(@class, class) and contains(@alt, :alt)]', [":class" => "image-with-attribute-alt", ":alt" => "Attribute alt"]);
    core/modules/image/tests/src/Kernel/ImageThemeFunctionTest.php:    $elements = $this->xpath('//img[contains(@class, class) and contains(@alt, :alt)]', [":class" => "image-with-attribute-alt", ":alt" => "Attribute alt"]);
    core/modules/system/tests/src/Kernel/Form/ElementsFieldsetTest.php:    $elements = $this->xpath('//fieldset[@id="' . $field_id . '" and @aria-describedby="' . $description_id . '"]//div[@id="edit-meta-default"]/following-sibling::div[@id="' . $description_id . '"]');
    core/modules/system/tests/src/Kernel/Form/ElementsFieldsetTest.php:    $elements = $this->xpath('//fieldset[@id="' . $field_id . '" and @aria-describedby="' . $description_id . '"]//div[@id="edit-meta-before"]/preceding-sibling::div[@id="' . $description_id . '"]');
    core/modules/system/tests/src/Kernel/Form/ElementsFieldsetTest.php:    $elements = $this->xpath('//fieldset[@id="' . $field_id . '" and @aria-describedby="' . $description_id . '"]//div[@id="edit-meta-after"]/following-sibling::div[@id="' . $description_id . '"]');
    core/modules/system/tests/src/Kernel/Form/ElementsFieldsetTest.php:    $elements = $this->xpath('//fieldset[@id="' . $field_id . '" and @aria-describedby="' . $description_id . '"]//div[@id="edit-meta-invisible"]/following-sibling::div[contains(@class, "visually-hidden")]');
    core/modules/system/tests/src/Kernel/Form/FormElementLabelTest.php:    $elements = $this->xpath($css_selector_converter->toXPath('.kitten'));
    core/modules/system/tests/src/Kernel/Form/FormElementLabelTest.php:    $elements = $this->xpath($css_selector_converter->toXPath('label.meow'));
    core/modules/system/tests/src/Kernel/Form/FormElementMaxlengthTest.php:    $elements = $this->xpath($css_selector_converter->toXPath('input[name=title][maxlength=255]'));
    core/modules/system/tests/src/Kernel/Form/FormElementMaxlengthTest.php:    $elements = $this->xpath($css_selector_converter->toXPath('textarea[name=description][maxlength=255]'));
    core/modules/views/tests/src/Kernel/Handler/AreaEntityTest.php:    $result = $this->xpath($header_xpath);
    core/modules/views/tests/src/Kernel/Handler/AreaEntityTest.php:    $result = $this->xpath($footer_xpath);
    core/modules/views/tests/src/Kernel/Handler/AreaEntityTest.php:    $result = $this->xpath($header_xpath);
    core/modules/views/tests/src/Kernel/Handler/AreaEntityTest.php:    $result = $this->xpath($footer_xpath);
    core/modules/views/tests/src/Kernel/Handler/AreaEntityTest.php:    $result = $this->xpath('//div[@class = "' . $view_class . '"]/header[1]');
    core/modules/views/tests/src/Kernel/Handler/AreaEntityTest.php:    $result = $this->xpath('//div[@class = "' . $view_class . '"]/footer[1]');
    core/modules/views/tests/src/Kernel/Handler/FieldFieldTest.php:    $field_values = $this->xpath('//div[contains(@class, "views-field-id")]/span[contains(@class, :class)]/div', [':class' => 'field-content']);
    core/modules/views/tests/src/Kernel/Plugin/DisplayPageTest.php:    $result = $this->xpath('//div[@class=:class]/a', [':class' => 'more-link']);
    core/modules/views/tests/src/Kernel/Plugin/DisplayPageTest.php:    $result = $this->xpath('//div[@class=:class]/a', [':class' => 'more-link']);
    core/modules/views/tests/src/Kernel/Plugin/DisplayPageTest.php:    $result = $this->xpath('//div[@class=:class]/a', [':class' => 'more-link']);
    core/modules/views/tests/src/Kernel/Plugin/DisplayPageTest.php:    $result = $this->xpath('//div[@class=:class]/a', [':class' => 'more-link']);
    core/modules/views/tests/src/Kernel/Plugin/DisplayPageTest.php:        $this->assertCount(5, $this->xpath("{$xpath}[not(text()) and not(node())]"), "Empty rows in theme '$theme', type '$type'.");
    core/modules/views/tests/src/Kernel/Plugin/ExposedFormRenderTest.php:    $result = $this->xpath('//form//div[contains(@id, "edit-type--2--description") and normalize-space(text())="Exposed description"]');
    core/modules/views/tests/src/Kernel/Plugin/StyleGridResponsiveTest.php:    $result = $this->xpath('//div[contains(@class, "views-view-responsive-grid") and contains(@class, :alignment)]', [':alignment' => 'views-view-responsive-grid--' . $expected['alignment']]);
    core/modules/views/tests/src/Kernel/Plugin/StyleGridResponsiveTest.php:    $result = $this->xpath('//div[contains(@class, "views-view-responsive-grid") and contains(@style, :columns)]', [':columns' => '--views-responsive-grid--column-count:' . $expected['columns']]);
    core/modules/views/tests/src/Kernel/Plugin/StyleGridResponsiveTest.php:    $result = $this->xpath('//div[contains(@class, "views-view-responsive-grid") and contains(@style, :min-width)]', [':min-width' => '--views-responsive-grid--cell-min-width:' . $expected['cell_min_width'] . 'px']);
    core/modules/views/tests/src/Kernel/Plugin/StyleGridResponsiveTest.php:    $result = $this->xpath('//div[contains(@class, "views-view-responsive-grid") and contains(@style, :gutter)]', [':gutter' => '--views-responsive-grid--layout-gap:' . $expected['grid_gutter'] . 'px']);
    core/modules/views/tests/src/Kernel/Plugin/StyleGridResponsiveTest.php:    $result = $this->xpath('//div[contains(@class, "views-view-responsive-grid")]/div[contains(@class, "views-view-responsive-grid__item")]/div[contains(@class, "views-view-responsive-grid__item-inner")]');
    core/modules/views/tests/src/Kernel/Plugin/StyleGridTest.php:      $result = $this->xpath('//div[contains(@class, "views-view-grid") and contains(@class, :alignment) and contains(@class, :columns)]', [':alignment' => $alignment, ':columns' => 'cols-' . $columns]);
    core/modules/views/tests/src/Kernel/Plugin/StyleGridTest.php:    $result = $this->xpath('//div[contains(@class, "views-col") and contains(@class, :columns) and starts-with(@style, :width)]', [':columns' => 'col-' . $columns, ':width' => 'width: ' . $width]);
    core/modules/views/tests/src/Kernel/Plugin/StyleGridTest.php:    $result = $this->xpath('//div[contains(@class, "views-col") and contains(@class, :columns)]', [':columns' => 'col-' . ($columns + 1)]);
    core/modules/views/tests/src/Kernel/Plugin/StyleGridTest.php:    $result = $this->xpath('//div[contains(@class, "views-col") and contains(@class, "name-John")]');
    core/modules/views/tests/src/Kernel/Plugin/StyleGridTest.php:    $result = $this->xpath('//div[contains(@class, "views-row") and contains(@class, "age-25")]');
    core/modules/views/tests/src/Kernel/ViewElementTest.php:    $xpath = $this->xpath('//div[@class="views-element-container"]');
    core/modules/views/tests/src/Kernel/ViewElementTest.php:    $xpath = $this->xpath('//div[@class="views-row"]');
    core/modules/views/tests/src/Kernel/ViewElementTest.php:    $xpath = $this->xpath('//div[@class="views-row"]');
    core/modules/views/tests/src/Kernel/ViewElementTest.php:    $xpath = $this->xpath('//div[@class="views-element-container"]');
    core/modules/views/tests/src/Kernel/ViewElementTest.php:    $xpath = $this->xpath('//div[@class="views-row"]');
    core/modules/views/tests/src/Kernel/ViewElementTest.php:    $xpath = $this->xpath('//div[@class="views-row"]');
    core/modules/views/tests/src/Kernel/ViewElementTest.php:    $this->assertCount(1, $this->xpath('//form[@class="views-exposed-form"]'));
    core/tests/Drupal/KernelTests/Core/Form/ExternalFormUrlTest.php:    $elements = $this->xpath('//form/@action');
    core/tests/Drupal/KernelTests/Core/Form/ExternalFormUrlTest.php:    $elements = $this->xpath('//form/@action');
    core/tests/Drupal/KernelTests/Core/Form/FormActionXssTest.php:    $elements = $this->xpath('//form');
    core/tests/Drupal/KernelTests/Core/Render/Element/TableTest.php:    $tableheader = $this->xpath("//script[contains(@src, 'tableheader.js')]");
    core/tests/Drupal/KernelTests/Core/Render/Element/TableTest.php:    $tableheader = $this->xpath("//script[contains(@src, 'tableheader.js')]");
    

    In other words, it likely makes sense to remove the support to try to parse XML, since all found uses are about HTML.
    I see test results pass after the change, so you were probably right!

    I guess #94 is supposed to be the patch to commit. Can that be re-uploaded so it is the last patch on the issue otherwise it is very very confusing.

    Yes, I guess we have not been hiding files enough in this issue to make that clear.
    I have just opened a new merge request with two commits, the first apply the patch in 94, and the second removes support for XML on the parse() method.
    The interdiff would be the new commit, so let me point to that directly instead of uploading one here .

  • Status changed to RTBC about 1 year ago
  • 🇺🇸United States smustgrave

    Only other test I see that has loadXML is HelpTopicsSyntaxTest which was broken off to a follow up. So remarking this one.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Committed and pushed 13883cc3808 to 11.x and ca47a29ef9f to 10.2.x. Thanks!

    Backported to 10.2.x as this is making only test changes.

    • alexpott committed 13883cc3 on 11.x
      Issue #2441373 by twistor, vsujeetkumar, marvil07, eporama, daffie,...
  • Status changed to Fixed about 1 year ago
    • alexpott committed ca47a29e on 10.2.x
      Issue #2441373 by twistor, vsujeetkumar, marvil07, eporama, daffie,...
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024