[upstream] CKEditor 5 Restructures And Removes Inline HTML Tags

Created on 20 March 2024, 8 months ago
Updated 23 August 2024, 3 months ago

Problem/Motivation

CKEditor 5 restructures the HTML as soon as the markup is loaded into the editor. The stored HTML of existing pages is not affected until the corresponding nodes are opened in /edit mode.

Steps to reproduce

  1. Set up a Drupal 10 site.
  2. Enable CKEditor 5.
  3. Configure any text format to use "CKEditor 5" as the text editor in /admin/config/content/formats
  4. Enter the following in "Source" mode -
    <div class="social-media">
    <span>Share</span> 
    <span class="icon">
    <a href="#" target="_blank" rel="noopener">
    <em class="fa-fw fa-twitter fab">&nbsp;</em>
    </a>
    </span>
    </div>
  5. The structure is changed to -
    <div class="social-media">
    <span>Share</span>&nbsp;
    <a href="#" target="_blank" rel="noopener">
    <em class="fa-fw fa-twitter fab">
    <span class="icon">&nbsp;</span>
    </em>
    </a>
    </div>
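
The before/after comparison in the steps above can be checked mechanically. The following is a minimal sketch, not part of the original report: `element_paths()` is a hypothetical helper that lists the nesting path of every element in a fragment, so two fragments can be compared for structural changes. It ignores attributes and text, and it relies on PHP's libxml-based parser, which may itself tolerate or adjust invalid markup.

```php
<?php

/**
 * Returns the nesting path of every element in an HTML fragment.
 *
 * Hypothetical helper for comparing markup before and after editing.
 */
function element_paths(string $html): array {
  $dom = new DOMDocument();
  // Suppress warnings for tags libxml does not know (for example <svg>).
  $previous = libxml_use_internal_errors(TRUE);
  $dom->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $paths = [];
  // Depth-first walk that records one path per element, in document order.
  $walk = function (DOMNode $node, string $prefix) use (&$walk, &$paths) {
    foreach ($node->childNodes as $child) {
      if ($child instanceof DOMElement) {
        $path = $prefix . '/' . $child->tagName;
        $paths[] = $path;
        $walk($child, $path);
      }
    }
  };
  $walk($dom->getElementsByTagName('body')->item(0), '');
  return $paths;
}

$before = '<div class="container"><span><a href="#">Test</a></span></div>';
$after = '<div class="container"><a href="#"><span>Test</span></a><span>&nbsp;</span></div>';
// The two path lists differ, which signals that the nesting was changed.
var_dump(element_paths($before) === element_paths($after));
```

A comparison like this could serve as the core assertion of an automated test, assuming the editor's output can be captured as a string.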

Proposed resolution

Ensure that CKEditor 5 preserves the HTML structure as it was entered.

🐛 Bug report
Status

Needs work

Version

11.0 🔥

Component
CKEditor 5 

Last updated 2 days ago

Created by

🇮🇳India ighosh

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.


Comments & Activities

  • Issue created by @ighosh
  • Status changed to Postponed: needs info 8 months ago
  • 🇮🇳India ighosh

    Hi Wim, here is another example of the DOM restructuring, which does not produce the output I need -
    Input -
    <div class="container"> <span class="icon"><a href="#" target="_blank"><em>SOMETHING</em></a></span></div>
    Output -
    <div class="container"><a href="#" target="_blank"><em><span class="icon">SOMETHING</span></em></a></div>
    The main issue with this kind of restructuring is that it affects every existing node the moment it is opened in "/edit" and re-saved. I am currently tracing what happens when I click the "Source" button. Maybe a .js file is called that filters and restructures the DOM(?).

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    I can see how that's disruptive. But that sure looks like some pretty questionable HTML 😅 That makes this difficult to describe and report. Can you still reproduce this without the <em> too? Try to find the smallest possible pattern, and then verify that it works with multiple tag combinations. That'd help report this upstream, and would result in a higher priority upstream.

  • 🇮🇳India ighosh

    @Wim, I removed <em> from my DOM -

    <div class="container">
    <span>
    <a href="#">Test</a>
    </span>
    </div>

    It is getting changed into -

    <div class="container">
        <a href="#"><span>Test</span></a><span>&nbsp;</span>
    </div>

    However, further testing shows that this is not the only case where the DOM gets changed. I am checking a few more HTML structures and will post updates here.

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    Thanks!

  • Status changed to Needs work 8 months ago
  • 🇮🇳India ighosh

    Tested for these cases -
    Input -
    <a href="#">&nbsp;</a>
    Output - the entire element was removed. However, if I enter <a href="#">Lorem Ipsum</a>, it is preserved.
    Another test case -
    Input -

    <a aria-label="Lorem Ipsum" class="lorem-ipsum-class" href="#" rel="noopener" target="_blank">
      <svg fill="none" height="18" viewbox="0 0 16 18" width="16" xmlns="http://www.w3.org/2000/svg">
        <path d="M1 2.6554C1 1.48814 2.27454 0.768165 3.27427 1.37068L13.8017 7.71531C14.7693 8.29848 14.7693 9.70157 13.8017 10.2847L3.27427 16.6294C2.27454 17.2319 1 16.5119 1 15.3447V2.6554Z"
          stroke="#14142B" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"></path>
      </svg>
      Watch
    </a>

    Output -

    <a class="lorem-ipsum-class" href="#" aria-label="Lorem Ipsum" rel="noopener" target="_blank"><svg fill="none" height="18" viewBox="0 0 16 18" width="16" xmlns="http://www.w3.org/2000/svg">
        <path d="M1 2.6554C1 1.48814 2.27454 0.768165 3.27427 1.37068L13.8017 7.71531C14.7693 8.29848 14.7693 9.70157 13.8017 10.2847L3.27427 16.6294C2.27454 17.2319 1 16.5119 1 15.3447V2.6554Z" stroke="#14142B" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"></path>
      </svg></a>
    <p>
        <a class="lorem-ipsum-class" href="#" aria-label="Lorem Ipsum" rel="noopener" target="_blank">&nbsp;Watch</a>
    </p>

    Here, the anchor tag that wraps the <svg><path></path></svg> is duplicated: a copy is placed after the original anchor, inside a paragraph (<p></p>), and the "Watch" text ends up in that copy.
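
The first test case in the comment above (a link containing only &nbsp; being removed) suggests auditing stored markup for such links before it is opened in the editor. The following is a minimal sketch under that assumption; `find_blank_links()` is a hypothetical helper, not a Drupal or CKEditor 5 API, and it returns the href of each link whose only content is whitespace or non-breaking spaces.

```php
<?php

/**
 * Lists the href of links that contain only whitespace or &nbsp;.
 *
 * Hypothetical audit helper; the name and return shape are assumptions.
 */
function find_blank_links(string $html): array {
  $dom = new DOMDocument();
  $previous = libxml_use_internal_errors(TRUE);
  $dom->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $blank = [];
  foreach ($dom->getElementsByTagName('a') as $link) {
    // Links that wrap other elements (icons, images, SVG) are not blank.
    if ($link->getElementsByTagName('*')->length > 0) {
      continue;
    }
    // The parser decodes &nbsp; to U+00A0, so match it explicitly alongside
    // ASCII whitespace.
    if (preg_match('/^[\s\x{00A0}]*$/u', $link->textContent)) {
      $blank[] = $link->getAttribute('href');
    }
  }
  return $blank;
}

print_r(find_blank_links('<a href="#">&nbsp;</a><a href="/about">Lorem Ipsum</a>'));
```

Whether CKEditor 5 removes every link this helper reports is not confirmed in this thread; the check only flags candidates for manual review.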

  • 🇦🇹Austria mvonfrie

    That is related to the HTML normalization "feature" of CKEditor 5. See https://github.com/ckeditor/ckeditor5/issues/16203 for more examples.

  • 🇮🇳India ighosh

    Regarding this issue, I found no easy way to "fix" the problem, because it is not really a bug in the first place: CKEditor 5 was altering the HTML because the markup itself was wrong (obviously). So I updated the DOM structure in code. An update hook queues all nodes that need DOM processing, and a QueueWorker then processes the DOM.
    Here is a gist of how the work was done. Please note that I targeted only nodes that use specific paragraph components, because the DOM alteration was taking place only in nodes containing those components.
    Update Hook -

    use Drupal\field\Entity\FieldStorageConfig;

    /**
     * Implements hook_update_N().
     *
     * CKEditor 5 Components Update.
     */
    function ckeditor_5_update_9250() {
      $node_storage = \Drupal::entityTypeManager()->getStorage('node');
      $paragraph_storage = \Drupal::entityTypeManager()->getStorage('paragraph');
      // Paragraph bundles (components) whose markup needs processing.
      $components_array = [
        'lorem_ipsum_component_name',
        'lorem_ipsum_component_name_1',
        'lorem_ipsum_component_name_2',
      ];
      // Get the field map for entity reference revisions fields.
      $field_map = \Drupal::service('entity_field.manager')->getFieldMapByFieldType('entity_reference_revisions');
      // Nodes to queue, keyed by component and node ID so that each node is
      // queued only once per component.
      $nodes = [];
      foreach (array_keys($field_map['node'] ?? []) as $field_name) {
        $field_storage = FieldStorageConfig::loadByName('node', $field_name);
        // Only handle configurable fields whose target type is paragraph.
        if (!$field_storage || $field_storage->getSetting('target_type') !== 'paragraph') {
          continue;
        }
        foreach ($components_array as $component_name) {
          $paragraph_ids = array_keys($paragraph_storage->loadByProperties(['type' => $component_name]));
          if (!$paragraph_ids) {
            continue;
          }
          // Find the nodes that reference any paragraph of this component.
          foreach ($node_storage->loadByProperties([$field_name => $paragraph_ids]) as $node_id => $node) {
            $nodes[$component_name][$node_id] = $node;
          }
        }
      }
      /** @var \Drupal\Core\Queue\QueueInterface $queue */
      $queue = \Drupal::service('queue')->get('ckeditor5_components_update');
      foreach ($nodes as $component => $node_group) {
        foreach ($node_group as $node) {
          $item = new \stdClass();
          // The queue worker reads the node from an array.
          $item->nodes = [$node];
          $item->components = $component;
          $queue->createItem($item);
        }
      }
    }
    

    QueueWorker -

    // Requires "use Drupal\paragraphs\Entity\Paragraph;" at the top of the
    // QueueWorker plugin file.
    public function processItem($data) {
      $nodes = (array) $data->nodes;
      $node = reset($nodes);
      // HTML fields to process on each referenced paragraph.
      $fields = [
        'field_html_section',
        'field_html',
      ];
      foreach ($node->referencedEntities() as $entity) {
        if (!$entity instanceof Paragraph) {
          continue;
        }
        $paragraph_bundle = $entity->bundle();
        foreach ($fields as $main_html_field) {
          if ($entity->hasField($main_html_field) && $entity->get($main_html_field)->value) {
            $html_value = $entity->get($main_html_field)->value;
            // Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.
            $html_value = mb_convert_encoding($html_value, 'HTML-ENTITIES', 'UTF-8');
            $dom = new \DOMDocument();
            $dom->loadHTML($html_value, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
            // The component-specific DOM processing (a switch on
            // $paragraph_bundle) goes here.
          }
        }
      }
    }
    

    Then, in the QueueWorker, I used a switch-case to target each paragraph type and apply its corresponding DOM processing.
    Hope this helps someone :)
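
The comment above does not show the DOM processing itself. As an illustration only, here is a minimal sketch of one possible step, assuming the goal is to store markup with the link outermost, as in the editor outputs reported earlier in this thread. `move_links_outside_spans()` is a hypothetical function, not the author's code: it rewrites a `<span>` whose only child node is an `<a>` so that the link wraps the span.

```php
<?php

/**
 * Rewrites <span><a>...</a></span> to <a><span>...</span></a>.
 *
 * Hypothetical example of a per-component DOM step. Only spans whose single
 * child node is a link are handled; spans that also contain whitespace or
 * other nodes are left untouched.
 */
function move_links_outside_spans(string $html): string {
  $dom = new DOMDocument();
  $previous = libxml_use_internal_errors(TRUE);
  $dom->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($dom);
  // Copy the matches to an array first, because the loop changes the tree.
  $spans = iterator_to_array($xpath->query('//span[count(node()) = 1 and a]'));
  foreach ($spans as $span) {
    $link = $span->firstChild;
    $span->removeChild($link);
    // Move the link's content into the span.
    while ($link->firstChild) {
      $span->appendChild($link->firstChild);
    }
    // Put the link where the span was, then nest the span inside the link.
    $span->parentNode->replaceChild($link, $span);
    $link->appendChild($span);
  }

  // Serialize only the fragment, not the wrapping <html> and <body>.
  $output = '';
  foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $output .= $dom->saveHTML($child);
  }
  return $output;
}

echo move_links_outside_spans('<div class="container"><span class="icon"><a href="#">Test</a></span></div>');
```

Non-ASCII characters and entities may be serialized differently from the input, so the result should be checked on real content before saving it back to a field.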

  • 🇦🇹Austria mvonfrie

    Why is your first example <a href="#">&nbsp;</a> (obviously) wrong? Syntactically it is totally correct, although semantically it makes little sense because the user will never be able to click this link. If it is used as a trap link (a kind of honeypot) with a special URL, so that you know the link was "clicked"/followed by a robot and not a human, then it makes sense again.

    It would be interesting to see what CKEditor 5 does with <a name="top">&nbsp;</a>. This is an invisible anchor that can be used as a jump target, for example by a "Top" button at the end of the page or floating at the bottom, to jump back to the start of the page (after the header, banner image, etc.).

    In my opinion, CKEditor 5 should correct syntactically wrong markup but not reinterpret syntactically correct markup that may appear to make no sense, as it cannot know a developer's intentions.

  • 🇺🇸United States skowyra Boston

    We've also been running into this behavior; html tags and classes get stripped out in CKEditor 5. I can see where normalization could be the culprit, but in our case, we have a clunky work-around when the behavior occurs. If resaving doesn't work after several attempts, we copy the content (Source), paste into a text editor, add the new class or html there, copy the updated content back. That usually works.

    The fact that we can eventually save it indicates that normalization would not be the root cause. Let me add, this behavior occurs in nodes and webforms, plus we use Site Studio where it occurs in our components.

    This started happening when we upgraded to CKEditor 5. We're currently on Drupal 10.2.4, but will be going up to 10.3 very soon.

  • 🇺🇸United States lhridley

    Also encountered this issue on a site that was recently upgraded from CKEditor4 to CKEditor5. Edited a footer block that was created with CKEditor4 to make a minor text edit. The block also contained fontawesome social media icons, which were not affected by the text edits made.

    Saving the footer resulted in the FontAwesome social media icons getting wrapped in <em></em> tags.

    Original content:

    <div class="column medium-3"><img class="footer-logo" src="/themes/custom/usap_base/source/images/usap/svg/footer-logo-b.svg" alt="USAP Logo" width="201" height="57"></div><div class="column medium-3"><p><br>&nbsp;</p></div><div class="column medium-2"><div id="block-footersocial"><ul class="social"><li><a class="offsite" href="http://facebook.com/usanesthesiapartners" target="_blank"><i class="fa fa-facebook"><span class="visually-hidden">Facebook</span>&nbsp;</i></a></li><li><a class="offsite" href="http://twitter.com/USAP_Updates" target="_blank"><i class="fa fa-twitter"><span class="visually-hidden">Twitter</span>&nbsp;</i></a></li><li><a class="offsite" href="http://linkedin.com/company/us-anesthesia-partners/" target="_blank"><i class="fa fa-linkedin"><span class="visually-hidden">Linkedin</span>&nbsp;</i></a></li><li><a class="offsite" href="http://instagram.com/usanesthesiapartners" target="_blank"><i class="fa fa-instagram"><span class="visually-hidden">Instagram</span>&nbsp;</i></a></li></ul></div></div><div class="column small-3"><p class="copyright">?2021&nbsp;U.S. Anesthesia Partners. All rights reserved.</p><p><a href="/terms-and-conditions">Terms &amp; Conditions</a></p></div>
    

    Changed to:

    <div class="column medium-3"><img class="footer-logo" src="/themes/custom/usap_base/source/images/usap/svg/footer-logo-b.svg" alt="USAP Logo" width="201" height="57"></div><div class="column medium-3"><p><br>&nbsp;</p></div><div class="column medium-2"><div id="block-footersocial"><ul class="social"><li><a class="offsite" href="http://facebook.com/usanesthesiapartners" target="_blank"><em><i class="fa fa-facebook"><span class="visually-hidden">Facebook</span>&nbsp;</i></em></a></li><li><a class="offsite" href="http://twitter.com/USAP_Updates" target="_blank"><em><i class="fa fa-twitter"><span class="visually-hidden">Twitter</span>&nbsp;</i></em></a></li><li><a class="offsite" href="http://linkedin.com/company/us-anesthesia-partners/" target="_blank"><em><i class="fa fa-linkedin"><span class="visually-hidden">Linkedin</span>&nbsp;</i></em></a></li><li><a class="offsite" href="http://instagram.com/usanesthesiapartners" target="_blank"><em><i class="fa fa-instagram"><span class="visually-hidden">Instagram</span>&nbsp;</i></em></a></li></ul></div></div><div class="column small-3"><p class="copyright">?2021&nbsp;U.S. Anesthesia Partners. All rights reserved.</p><p><a href="/terms-and-conditions">Terms &amp; Conditions</a></p></div>
    