Expose raw pasted HTML for debugging Regex filters

Created on 8 July 2025

Problem/Motivation

When writing regular expressions for the ckeditor5_paste_filter module, it's difficult to predict how CKEditor 5 processes pasted content. This is especially true for complex input such as styled HTML from external sources (e.g., Word, Google Docs, RTF), where line breaks and inline styles are often transformed in ways that aren't visible in the final rendered content inside CKEditor. CKEditor also applies its own HTML source formatting when you switch to "Source" mode, which obscures the markup further.

It would be extremely helpful if the module exposed an optional debugging feature to log or output the raw HTML CKEditor receives on paste, before any transformations or filtering are applied. This would allow site builders and developers to write more precise regex rules by seeing exactly what the module is working with.

In my specific case, I'm pasting rich text from an RTF file whose blank lines are converted to <p><br>&nbsp;</p>, but I'm having trouble writing a regular expression that matches this. I wrote four or five valid regexes, but none seem to match, because I'm only guessing at what the original HTML might be. I need a way to see the raw HTML string this CKEditor plugin receives on paste so that I can write a regex that matches it.

Steps to reproduce

1. Launch a Simplytest.me site with the CKEditor 5 Paste Filter module and the standard Drupal install profile.
2. Edit the Full HTML format (/admin/config/content/formats/manage/full_html).
3. Under CKEditor 5 plugin settings, select the Paste filter vertical tab.
4. Enable the plugin by checking the Filter pasted content checkbox.
5. Customize the filter to try one of the following regular expressions (detailed below).
6. Save the text format: scroll to the bottom and click Save configuration.
7. Add a new node using the configured text format (/node/add).
8. Paste the rich content into the editor.

Search expression: <p>\W*<br>\W*&nbsp;\W*<\/p>
Replacement: [leave empty]

Search expression: <p>\W*<br>\W*<\/p>
Replacement: [leave empty]

Search expression: \n+
Replacement: <br>

Search expression: (<br>)+
Replacement: <br>

Search expression: (<p><br></p>)+
Replacement: <p><br></p>
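
Until the raw clipboard HTML can be inspected, one way to narrow things down is to run a search expression against several guesses at what the input might look like. The sketch below does that in plain JavaScript. The `probe` helper and every candidate string are hypothetical: none of them is confirmed to be what the module receives, and the flags the module passes to `RegExp` are not known here.

```javascript
// Hypothetical helper: report which candidate strings a search expression matches.
function probe(search, candidates) {
  return candidates.map((html) => ({
    html,
    // A fresh RegExp per candidate keeps each test independent.
    matched: new RegExp(search).test(html),
  }));
}

// Guesses at the raw pasted HTML; none of these is confirmed.
const candidates = [
  '<p><br>&nbsp;</p>',               // entity form, as displayed in Source mode
  '<p><br>\u00a0</p>',               // literal no-break space character instead of the entity
  '<p>\n    <br>\n    &nbsp;\n</p>', // same markup with formatting whitespace
  '<p class="p1"><br></p>',          // paragraph carrying an attribute
];

// Same expression as in the issue; backslashes are doubled for the JS string literal.
const search = '<p>\\W*<br>\\W*&nbsp;\\W*<\\/p>';

for (const { html, matched } of probe(search, candidates)) {
  console.info(JSON.stringify(html), matched ? 'matches' : 'no match');
}
```

With these candidates, the expression matches only the two strings that contain the literal text `&nbsp;` inside a bare `<p>`; a literal U+00A0 character or an attribute on the paragraph is enough to defeat it.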

Markup samples

RTF file containing the following snippet with three blank lines:

Sample text, now in italics, now with italics and bold, now with italics and bold and underlined.



Just some underlined text.

Markup result (pasting without filtering)

Note: 3 empty paragraphs with both <br> and &nbsp; introduced.

<p>
    Sample text, <em>now in italics</em>, <em><strong>now with italics and bold</strong></em>, <em><strong>now with italics and bold and underlined</strong></em>.
</p>
<p>
    <br>
    &nbsp;
</p>
<p>
    <br>
    &nbsp;
</p>
<p>
    <br>
    &nbsp;
</p>
<p>
    Just some underlined text.
</p>

Markup result (pasting with filtering)

Note: 2 empty paragraphs with both <br> and &nbsp; introduced.

<p>
    Sample text, <em>now in italics</em>, <em><strong>now with italics and bold</strong></em>, <em><strong>now with italics and bold and underlined</strong></em>.
</p>
<p>
    <br>
    &nbsp;
</p>
<p>
    <br>
    &nbsp;
</p>
<p>
    Just some underlined text.
</p>

Expected markup result

Ideally there would be no empty paragraphs.

<p>
    Sample text, <em>now in italics</em>, <em><strong>now with italics and bold</strong></em>, <em><strong>now with italics and bold and underlined</strong></em>.
</p>
<p>
    Just some underlined text.
</p>

Proposed resolution

  • Add a developer/debug mode setting (e.g. in config or via a flag) that logs the raw text/html clipboard data during the paste process.
  • Output could go to the JS console via console.info, possibly by tying into CKEditor's clipboardInput event to intercept the raw HTML string.

This would greatly improve the developer experience when building and testing paste filters.
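
A minimal sketch of what such a debug hook might look like, assuming it listens to the `clipboardInput` event on the editing view document and reads `text/html` from the event's `dataTransfer`. The names `attachRawPasteLogger`, `visible`, and the `log` parameter are hypothetical, and this is not the module's actual code; where the module itself hooks into the paste pipeline is not stated in this issue.

```javascript
// Hypothetical helper: make otherwise invisible characters readable in the console.
// JSON.stringify escapes newlines and tabs; no-break spaces are escaped by hand.
function visible(html) {
  return JSON.stringify(html).replace(/\u00a0/g, '\\u00a0');
}

// Sketch only: logs the raw text/html clipboard payload on paste or drop.
function attachRawPasteLogger(editor, log = console.info) {
  editor.editing.view.document.on(
    'clipboardInput',
    (evt, data) => {
      // getData returns an empty string when the type is not on the clipboard.
      const html = data.dataTransfer.getData('text/html');
      if (html) {
        log('[paste debug] raw text/html:', visible(html));
      }
    },
    // Assumption: 'highest' priority runs this before listeners that modify the data.
    { priority: 'highest' }
  );
}
```

In a plugin this would be called once from `init()` with the editor instance, ideally only when a debug setting is enabled so that production sites do not log clipboard contents.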

Remaining tasks

User interface changes

API changes

Data model changes

Feature request
Status

Active

Version

1.1

Component

Code

Created by

🇪🇨Ecuador jwilson3

Comments & Activities

  • Issue created by @jwilson3
  • 🇪🇨Ecuador jwilson3

    Here is a partial config export for what I've tested:

        ckeditor5_paste_filter_pasteFilter:
          enabled: true
          filters:
            -
              enabled: true
              weight: -18
              search: '<o:p><\/o:p>'
              replace: ''
            -
              enabled: true
              weight: -17
              search: '(<[^>]*) (style="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -16
              search: '(<[^>]*) (face="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -15
              search: '(<[^>]*) (class="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -14
              search: '(<[^>]*) (valign="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -12
              search: '<font[^>]*>'
              replace: ''
            -
              enabled: true
              weight: -11
              search: '<\/font>'
              replace: ''
            -
              enabled: true
              weight: -10
              search: '<span[^>]*>'
              replace: ''
            -
              enabled: true
              weight: -9
              search: '<\/span>'
              replace: ''
            -
              enabled: true
              weight: -8
              search: '<p>&nbsp;<\/p>'
              replace: ''
            -
              enabled: true
              weight: -7
              search: '<p><\/p>'
              replace: ''
            -
              enabled: true
              weight: -6
              search: '<b><\/b>'
              replace: ''
            -
              enabled: true
              weight: -5
              search: '<i><\/i>'
              replace: ''
            -
              enabled: true
              weight: -4
              search: '<a name="OLE_LINK[^"]*">(.*?)<\/a>'
              replace: $1
            -
              enabled: true
              weight: -3
              search: '<p>\W*<br>\W*&nbsp;\W*<\/p>'
              replace: ''
            -
              enabled: true
              weight: -2
              search: '<p>\W*<br>\W*<\/p>'
              replace: ''
            -
              enabled: true
              weight: -13
              search: '(<[^>]*) (dir="ltr")'
              replace: $1
            -
              enabled: true
              weight: -1
              search: (<br>)+
              replace: '<br>'
            -
              enabled: true
              weight: 0
              search: (<p><br></p>)+
              replace: '<p><br></p>'
            -
              enabled: true
              weight: 1
              search: '<br>\n'
              replace: '<br>'
            -
              enabled: true
              weight: 2
              search: \n+
              replace: '<br>'
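
Because the filters interact, it can help to replay a chain like the one above outside the editor. The sketch below assumes the module applies enabled filters in ascending weight order, each as a global regex replace; that behavior is an assumption, not something confirmed in this issue. `applyFilters` is a hypothetical helper, the filter list is a subset of the export above, and the input string is an invented example rather than real clipboard data.

```javascript
// Assumption: enabled filters run in ascending weight order, each as a global replace.
function applyFilters(html, filters) {
  return filters
    .filter((f) => f.enabled)
    .sort((a, b) => a.weight - b.weight)
    .reduce((out, f) => out.replace(new RegExp(f.search, 'g'), f.replace), html);
}

// Subset of the exported config, with backslashes doubled for JS string literals.
const filters = [
  { enabled: true, weight: -3, search: '<p>\\W*<br>\\W*&nbsp;\\W*<\\/p>', replace: '' },
  { enabled: true, weight: -17, search: '(<[^>]*) (style="[^"]*")', replace: '$1' },
  { enabled: true, weight: -10, search: '<span[^>]*>', replace: '' },
  { enabled: true, weight: -9, search: '<\\/span>', replace: '' },
];

// Invented input; the real pasted HTML is exactly what this issue asks to expose.
const input =
  '<p style="margin:0"><span style="color:red">Text</span></p>' +
  '<p style="margin:0"><br>&nbsp;</p><p>More</p>';

console.info(applyFilters(input, filters));
```

With this input the empty paragraph is removed only because the style attribute is stripped first (weight -17) and the bare `<p>` then matches at weight -3. If the empty-paragraph filter ran before the attribute filter, `<p style="margin:0"><br>&nbsp;</p>` would not match it, which is one possible reason an expression that looks correct has no effect.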
    
  • 🇪🇨Ecuador jwilson3

    Here is the RTF file I used for testing.
