Problem/Motivation
When writing regular expressions for the ckeditor5_paste_filter module, it's difficult to predict what HTML CKEditor 5 is actually processing when content is pasted. This is especially true for complex input such as styled HTML from external sources (e.g., Word, Google Docs, RTF), where line breaks and inline styles are often transformed in ways that aren't visible in the final content rendered inside CKEditor. CKEditor also applies its own formatting to the HTML source when you switch to "Source" mode, so the source view doesn't show the original markup either.
It would be extremely helpful if the module exposed an optional debugging feature to log or output the raw HTML CKEditor receives on paste, before any transformations or filtering are applied. This would let site builders and developers write more precise regex rules by seeing exactly what the module is working with.
In my specific case, I'm pasting rich text from an RTF file whose line breaks are converted to <p><br> </p>, but I'm having trouble writing a regular expression to match this. I wrote 4 or 5 valid regexes, but none of them match, because I'm only guessing at what the original HTML might be. I need a way to see the raw HTML string this CKEditor plugin receives on paste, so that I can write a regex that matches it.
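Until the module offers such an option, one workaround is to read the clipboard in the browser before CKEditor handles the paste. This is a minimal sketch, not part of the module: `logRawPasteHtml` and the `...Like` interfaces are hypothetical names, and the interfaces only stand in for the DOM's `ClipboardEvent`/`DataTransfer` so the snippet compiles on its own. It relies on the standard `clipboardData.getData("text/html")` call and registers the listener in the capture phase.

```typescript
// Structural stand-ins for the DOM's DataTransfer / ClipboardEvent / EventTarget,
// so the sketch compiles without the DOM lib (hypothetical names).
interface ClipboardDataLike {
  getData(format: string): string;
}

interface PasteEventLike {
  clipboardData: ClipboardDataLike | null;
}

interface PasteTargetLike {
  addEventListener(
    type: "paste",
    listener: (event: PasteEventLike) => void,
    useCapture: boolean,
  ): void;
}

// Hypothetical helper: log the text/html flavour of every paste on `target`.
function logRawPasteHtml(
  target: PasteTargetLike,
  log: (...args: unknown[]) => void = console.info,
): void {
  target.addEventListener(
    "paste",
    (event) => {
      // getData() returns "" when the clipboard has no text/html flavour.
      const html = event.clipboardData?.getData("text/html") ?? "";
      log("raw text/html:", html);
    },
    // Capture phase: on an ancestor such as `document`, this runs before the
    // listeners CKEditor attaches to the editable element.
    true,
  );
}

// In the browser, with the editor on the page:
// logRawPasteHtml(document);
```

If the source application put an HTML flavour on the clipboard, pasting should print it; otherwise an empty string is logged. Reading the data does not modify or cancel the paste.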
Steps to reproduce
1. Launch a Simplytest.me instance with the CKEditor 5 Paste Filter module and the standard Drupal install profile.
2. Edit the Full HTML text format (/admin/config/content/formats/manage/full_html).
3. Under CKEditor 5 plugin settings, select the Paste filter vertical tab.
4. Enable the plugin by checking the Filter pasted content checkbox.
5. Customize the filter with one of the regular expressions listed below.
6. Save the text format: scroll to the bottom and click Save configuration.
7. Add a new node using the configured text format (/node/add).
8. Paste the rich text from the RTF file into the editor.
Search expression: <p>\W*<br>\W* \W*<\/p>
Replacement: [leave empty]
Search expression: <p>\W*<br>\W*<\/p>
Replacement: [leave empty]
Search expression: \n+
Replacement: <br>
Search expression: (<br>)+
Replacement: <br>
Search expression: (<p><br></p>)+
Replacement: (<p><br></p>)+
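One way to see why these guesses can miss is to run them against a few candidate strings outside Drupal. The sketch below is illustrative only: `matchingPatterns` is a hypothetical helper, the patterns are compiled as plain JavaScript RegExp objects without flags (the module's actual flags are an assumption here), and the candidate strings are guesses at what the raw clipboard HTML could look like, not captured data.

```typescript
// Hypothetical helper: which of the search expressions match `html` at all?
// Patterns are compiled without flags; the module's real flags are an assumption.
function matchingPatterns(html: string, patterns: string[]): string[] {
  return patterns.filter((source) => new RegExp(source).test(html));
}

// Three of the search expressions tried above, copied verbatim.
const tried = [
  String.raw`<p>\W*<br>\W* \W*<\/p>`,
  String.raw`<p>\W*<br>\W*<\/p>`,
  String.raw`(<p><br></p>)+`,
];

// Guesses at what the raw clipboard HTML could contain (not captured data).
const candidates = [
  "<p><br></p>",
  "<p><br> </p>",
  "<p><br>&nbsp;</p>",
  '<p class="p2"><br></p>',
];

for (const html of candidates) {
  console.log(JSON.stringify(html), "->", matchingPatterns(html, tried));
}
```

With these inputs, a bare <p><br></p> and a literal space are each matched by at least one pattern, but an &nbsp; entity (\W covers the & and ; but not the letters nbsp) or any attribute on <p> defeats all three. If the raw HTML looks like either case, that would explain the misses; only seeing the actual string can confirm it.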
Markup samples
RTF file containing the following snippet with three blank lines:
Sample text, now in italics, now with italics and bold, now with italics and bold and underlined.



Just some underlined text.
Markup result (pasting without filtering)
Note: 3 empty paragraphs are introduced, each containing both a <br> and a space character.
<p>
Sample text, <em>now in italics</em>, <em><strong>now with italics and bold</strong></em>, <em><strong>now with italics and bold and underlined</strong></em>.
</p>
<p>
<br>
</p>
<p>
<br>
</p>
<p>
<br>
</p>
<p>
Just some underlined text.
</p>
Markup result (pasting with filtering)
Note: 2 empty paragraphs are introduced, each containing both a <br> and a space character.
<p>
Sample text, <em>now in italics</em>, <em><strong>now with italics and bold</strong></em>, <em><strong>now with italics and bold and underlined</strong></em>.
</p>
<p>
<br>
</p>
<p>
<br>
</p>
<p>
Just some underlined text.
</p>
Expected markup result
Ideally there would be no empty paragraphs.
<p>
Sample text, <em>now in italics</em>, <em><strong>now with italics and bold</strong></em>, <em><strong>now with italics and bold and underlined</strong></em>.
</p>
<p>
Just some underlined text.
</p>
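For the formatted markup shown above, a pattern that tolerates whitespace, non-breaking spaces, <br> variants and attributes on <p> turns the "without filtering" result into the expected one. This is a sketch, assuming the module applies each search expression roughly like JavaScript's String.prototype.replace with the g and i flags; `removeEmptyParagraphs` is a hypothetical name, and the raw clipboard HTML may still differ from this source-mode view.

```typescript
// Hypothetical "empty paragraph" pattern. It accepts:
//   - a <p> tag with or without attributes (but not <pre>),
//   - any mix of whitespace (JavaScript's \s also covers U+00A0),
//     &nbsp; entities and <br>, <br/> or <br /> tags,
//   - trailing whitespace after </p>, so no blank line is left behind.
const EMPTY_PARAGRAPH = /<p(?:\s[^>]*)?>(?:\s|&nbsp;|<br\s*\/?>)*<\/p>\s*/gi;

function removeEmptyParagraphs(html: string): string {
  return html.replace(EMPTY_PARAGRAPH, "");
}

// Shortened version of the "without filtering" markup, plus one
// attribute/&nbsp; variant that is a guess, not captured data.
const pasted = [
  "<p>",
  "Sample text, <em>now in italics</em>.",
  "</p>",
  "<p>",
  "<br>",
  "</p>",
  '<p class="p2"><br>&nbsp;</p>',
  "<p>",
  "Just some underlined text.",
  "</p>",
].join("\n");

console.log(removeEmptyParagraphs(pasted));
```

As a filter entry, this would be the search expression <p(?:\s[^>]*)?>(?:\s|&nbsp;|<br\s*\/?>)*<\/p>\s* with an empty replacement, assuming the module passes it to new RegExp unchanged. The (?:\s[^>]*)? part is there so that tags such as <pre> are not mistaken for <p>.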
Proposed resolution
- Add a developer/debug mode setting (e.g. in config or via a flag) that logs the raw text/html clipboard data during the paste process.
- Output could go to the JS console via console.info. One possible hook is CKEditor's clipboardInput event, which could be used to intercept the raw HTML string.
This would greatly improve the developer experience when building and testing paste filters.
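A minimal sketch of the proposed debug option, assuming it is implemented as a listener on CKEditor 5's clipboardInput event on editor.editing.view.document, registered with the "highest" priority so it runs before other listeners on that event. `attachPasteDebugLogger` and its debug flag are hypothetical (how the module would expose the setting is still open), and the `...Like` types stand in for those exported by the ckeditor5 package so the snippet compiles on its own.

```typescript
// Minimal structural stand-ins for the CKEditor 5 objects used here.
// In a real plugin, use the types exported by the `ckeditor5` package instead.
interface DataTransferLike {
  getData(type: string): string;
}

interface ClipboardInputDataLike {
  dataTransfer: DataTransferLike;
}

type ClipboardInputListener = (evt: unknown, data: ClipboardInputDataLike) => void;

interface EditorLike {
  editing: {
    view: {
      document: {
        on(
          event: "clipboardInput",
          listener: ClipboardInputListener,
          options?: { priority: "highest" },
        ): void;
      };
    };
  };
}

// Hypothetical helper: when `debug` is true, log the raw clipboard HTML on
// every paste, ahead of other clipboardInput listeners.
function attachPasteDebugLogger(
  editor: EditorLike,
  debug: boolean,
  log: (...args: unknown[]) => void = console.info,
): void {
  if (!debug) {
    return;
  }
  editor.editing.view.document.on(
    "clipboardInput",
    (_evt, data) => {
      // text/html as provided by the browser, before CKEditor converts it.
      const rawHtml = data.dataTransfer.getData("text/html");
      log("[paste filter debug] raw text/html:", rawHtml);
    },
    { priority: "highest" },
  );
}
```

In a real plugin, the call would sit in the plugin's init() method, e.g. attachPasteDebugLogger(this.editor, debugEnabled). Because the listener only reads data.dataTransfer, it should not change what the paste filter or CKEditor's clipboard pipeline receive.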
Remaining tasks
User interface changes
API changes
Data model changes