ToC Chunker may remove unrecognised HTML tags

Created on 13 November 2023, about 1 year ago
Updated 26 November 2023, 12 months ago

Problem/Motivation

It appears that using the "ToC Chunker" may interfere with content output, specifically if the content does not pass DomDocument's HTML validation.

Steps to reproduce

  1. Install and enable Sector ToC along with its dependencies
  2. Configure a content type to use "ToC Chunker" on the Body field at /admin/structure/types/manage/<type>/display
  3. Create a piece of content which uses an unrecognised HTML tag, such as <drupal-media>, <drupal-entity-embed>, or (unconfirmed) HTML5 tags such as <command>, <details> or <figure>
  4. View the content; when chunked, unrecognised tags will be stripped from the HTML source before the filters are applied

Proposed resolution

  1. Verify the reported problem, ideally by introducing test coverage
  2. Agree on a solution
  3. Implement the solution

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Closed: duplicate

Version

1.0

Component

Code

Created by

πŸ‡³πŸ‡ΏNew Zealand xurizaemon Ōtepoti, Aotearoa 🏝

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @xurizaemon
  • πŸ‡³πŸ‡ΏNew Zealand xurizaemon Ōtepoti, Aotearoa 🏝
  • πŸ‡³πŸ‡ΏNew Zealand xurizaemon Ōtepoti, Aotearoa 🏝

    Reproduction notes:

    Configured the "Full HTML" format with CKEditor5, "Manually editable HTML tags" permitting additional tags ` `. It was not necessary to add ` ` as these tags were already provided by a CKEditor plugin.

    Input HTML

    The Body of the node was set to:

    Apples

    Granny Smith

    Use sudo gimme apple --type=granny-smith to retrieve a Granny Smith from /dev/urandom.

    Fuji

    Not a picture of an apple at all.

     

    Gala

    Secrets about Gala applesThese apples are cuboid if grown in space.

    Oranges

    Mandarins

    Tangelos

    Melons

    Watermelons

    Rockmelons

     

    Output when Body display is set to "ToC Chunker"

    Excerpted, the first paragraph outputs as:

    Use sudo gimme apple --type=granny-smith to retrieve a Granny Smith from /dev/urandom.

    The media in the second section is not displayed at all.

    Output when Body display is set to "Chunker"

    Use sudo gimme apple --type=granny-smith to retrieve a Granny Smith from /dev/urandom.

    This also appears wrong - the closing tag of the

    <command><code> container has moved to the end of the line?!
    
    <h3>Output when Body display is set to "Default"</h3>
    
    <pre><p>Use&nbsp;<command>sudo gimme apple --type=granny-smith to retrieve a Granny Smith from /dev/urandom.</command></p></pre>
    
    The closing tag of the <code><command><code> container has moved to the end of the line again. For this reason I am not (yet) moving the issue from Sector ToC to Chunker; the <em>removal</em> of unrecognised tags is the issue being reported here.
    
    (I expect that moving the closing &lt;/command&gt; tag is either my own mistake, some other local configuration, or perhaps a CKEditor behaviour. Most likely the first!)
    
    <h3>Analysis</h3>
    
    I see that some of the HTML modification is coming from Chunker, but also that the behaviour is worse in Sector ToC.
    
    Chunker below is processing the source HTML - so non-HTML markup such as <code><drupal-*>

    may be visible in the source, as may be HTML5-valid tags such as <command>. It may be that Chunker should run later in the process - especially if the filters might impact the heading structure of the document.

    When we're in Chunker's `\Drupal\chunker\Plugin\Field\FieldFormatter\ChunkerFormatter::viewElements()`, inside the `foreach()` on $items we have a single item, where the value is:

    Apples

    Granny Smith

    Use sudo gimme apple --type=granny-smith to retrieve a Granny Smith from /dev/urandom.

    Fuji

    Not a picture of an apple at all.

     

    Gala

    Secrets about Gala applesThese apples are cuboid if grown in space.

    Oranges

    Mandarins

    Tangelos

    Melons

    Watermelons

    Rockmelons

     

    The `#text` value of `$elements` after this foreach is set to:

    Apples

    Granny Smith

    Use sudo gimme apple --type=granny-smith to retrieve a Granny Smith from /dev/urandom.

    Fuji

    Not a picture of an apple at all.

     

    Gala

    Secrets about Gala applesThese apples are cuboid if grown in space.

    Oranges

    Mandarins

    Tangelos

    Melons

    Watermelons

    Rockmelons

     

    In debugger output we see warnings such as:

    PHP Warning: DOMDocument::loadHTML(): Tag drupal-media invalid in Entity, line: 1 in /app/web/ on line 294
    PHP Warning: DOMDocument::loadHTML(): Tag details invalid in Entity, line: 1 in /app/web/ on line 294
    PHP Warning: DOMDocument::loadHTML(): Tag summary invalid in Entity, line: 1 in /app/web/ on line 294
    

    Notes

    I did investigate whether applying the change from πŸ› Upgrade filter system to HTML5 Fixed to core modified this behaviour, but it did not seem to have any impact.

  • πŸ‡³πŸ‡ΏNew Zealand jonathan_hunt

    I was not able to reproduce this using Chunker formatter alone, as output format for basic page body. However, with Sector ToC the `` present in the body text area is not rendered. This is on a text area that is not using CKEditor5. So it's either Sector ToC or toc_api at issue...

  • πŸ‡³πŸ‡ΏNew Zealand xurizaemon Ōtepoti, Aotearoa 🏝

    Possibility this is related to πŸ› FIX for html stripped by Xss:filterAdmin() in TocBuilder::buildContent() Needs work as well, not yet investigated (thanks @jonathan_hunt).

  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    I think the debugger warnings mentioned are fine - they are coming from https://git.drupalcode.org/project/drupal/-/blob/10.1.x/core/lib/Drupal/... which is suppressed for a reason.

    I have tested using patch from πŸ› FIX for html stripped by Xss:filterAdmin() in TocBuilder::buildContent() Needs work and was not able to reproduce the issue - I strongly suspect applying the patch from 2986763 will resolve this issue. That should perhaps be added somewhere as a needed step (project page?) if confirmed.

  • Status changed to Closed: duplicate 12 months ago
  • πŸ‡³πŸ‡ΏNew Zealand xurizaemon Ōtepoti, Aotearoa 🏝

    Yep, it appears that this was a manifestation of πŸ› FIX for html stripped by Xss:filterAdmin() in TocBuilder::buildContent() Needs work , thanks Eric. Applying the patch from that issue appears to have addressed this behaviour.

Production build 0.71.5 2024