Paste filter removing <strong> and <em>

Created on 13 June 2024, 6 months ago
Updated 1 July 2024, 5 months ago

Problem/Motivation

I have come across a problem when pasting from Word in Microsoft 365 online.

The filtering works correctly for the most part, but it seems to strip the <strong> and <em> tags when content is pasted into a Full HTML CKEditor 5 body field.

It seems that this search expression, (<[^>]*) (style="[^"]*"), is the one stripping out the <strong> and <em> tags: when I tested the expressions one by one and disabled this one, the strong and em tags reappeared.
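For illustration, here is a minimal Python sketch of what that search expression does to a tag. The replacement string is not quoted in this report, so the sketch assumes it keeps only the first capture group; the sample span is hypothetical and much shorter than real Word markup.

```python
import re

# Search expression quoted in the report above.
STYLE_ATTR = re.compile(r'(<[^>]*) (style="[^"]*")')


def strip_inline_styles(html: str) -> str:
    # Assumption: the replacement keeps group 1 (the tag text before the
    # style attribute) and drops the style attribute itself.
    return STYLE_ATTR.sub(r"\1", html)


# Hypothetical sample: a span whose bold formatting lives only in its style.
sample = '<span class="TextRun" style="font-weight:bold;">Bold</span>'
stripped = strip_inline_styles(sample)
print(stripped)
```

After the substitution the span keeps its class attribute, but nothing in it says the text was bold.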

Steps to reproduce

1. Open or create a Word document in Word for Microsoft 365.
2. Add some text with bold and italic formatting.
3. Copy this text from Word.
4. Paste it into a Full HTML body field.
5. Check the source.

Have you disabled any default filters, or added new custom filters?

Yes, I have some custom search expressions. Here is the list:

Search expression: <div id="WACViewPanel_ClipboardElement" contenteditable="false" spellcheck="false" tabindex="0">
Replacement: <div>

Search expression: <p\s+(?=.*\brole="[^"]*")(?=.*\baria-level="4")(?=.*\bparaid="[^"]*")(?=.*\bparaeid="\{[^}]*\}\{[^}]*\}").*?>
Replacement: <h4>

Search expression: <p\s+paraid="[^"]*".*?>
Replacement: <p>
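As a rough check of what these three expressions match, the sketch below applies them in the listed order using Python's `re` module. The module may use a different regex engine and flags, so treat this as an approximation. The sample tags are hypothetical and reduced to the attributes each expression looks for.

```python
import re

# The three custom search/replacement pairs listed above, in order.
CUSTOM_FILTERS = [
    (r'<div id="WACViewPanel_ClipboardElement" contenteditable="false" '
     r'spellcheck="false" tabindex="0">', '<div>'),
    (r'<p\s+(?=.*\brole="[^"]*")(?=.*\baria-level="4")(?=.*\bparaid="[^"]*")'
     r'(?=.*\bparaeid="\{[^}]*\}\{[^}]*\}").*?>', '<h4>'),
    (r'<p\s+paraid="[^"]*".*?>', '<p>'),
]


def apply_custom_filters(html: str) -> str:
    # Run each search expression over the result of the previous one.
    for pattern, replacement in CUSTOM_FILTERS:
        html = re.sub(pattern, replacement, html)
    return html


# Hypothetical opening tags, one per expression.
wrapper = ('<div id="WACViewPanel_ClipboardElement" contenteditable="false" '
           'spellcheck="false" tabindex="0">')
heading = '<p role="heading" aria-level="4" paraid="1" paraeid="{a}{b}">'
paragraph = '<p paraid="345522415" paraeid="{c040a987}{10}">'

for tag in (wrapper, heading, paragraph):
    print(apply_custom_filters(tag))
```

Only opening tags are rewritten here, because the list above gives no expression for the matching closing tags.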

Markup samples

Markup result (pasting without filtering)

<p class="Paragraph SCXW106221821 BCX2" style="background-color:transparent;color:windowtext;font-style:normal;font-weight:normal;padding-left:0px;padding-right:0px;text-align:left;text-indent:0px;vertical-align:baseline;" paraid="345522415" paraeid="{c040a987-1294-4219-82fd-ca3c005fdccae}{10}">
                        <span class="TextRun SCXW106221821 BCX2 NormalTextRun" style="font-family:Calibri, &quot;Calibri_EmbeddedFont&quot;, &quot;Calibri_MSFontService&quot;, sans-serif;font-size:11pt;line-height:16.1875px;" data-contrast="auto" xml:lang="EN-US" lang="EN-US"><strong>This should be bold copy —</strong> </span><em><span class="TextRun SCXW106221821 BCX2 NormalTextRun" style="font-family:Calibri, &quot;Calibri_EmbeddedFont&quot;, &quot;Calibri_MSFontService&quot;, sans-serif;font-size:11pt;line-height:16.1875px;" data-contrast="auto" xml:lang="EN-US" lang="EN-US">This should be Italics</span></em>
                    </p>

Markup result (pasting with filtering)

<div>
    <div>
        <p>
            This should be bold copy — This should be Italics&nbsp;
        </p>
    </div>
</div>

Expected markup result

<div>
    <div>
        <p>
           <strong> This should be bold copy</strong> — <em>This should be Italics </em>
        </p>
    </div>
</div>
🐛 Bug report
Status

Closed: works as designed

Version

1.0

Component

Miscellaneous

Created by

🇺🇸United States ThanksNeco


Comments & Activities

  • Issue created by @ThanksNeco
  • Status changed to Postponed: needs info 6 months ago
  • 🇨🇦Canada star-szr

    Thanks for the report and for using the issue template.

    I can’t easily test this at the moment, but I suspect it’s essentially the same issue as "Pasting from Google Docs doesn't preserve some formatting" (Closed: won't fix); please see my first comment there.

    The short version is that when you copy the rich text, strong and em are represented as span tags with inline styles. If you don’t remove inline styles (the (<[^>]*) (style="[^"]*") expression), then CKEditor 5 converts these spans to strong and em when their inline styles match certain criteria. Since that expression and its replacement do remove inline styles, CKEditor 5 no longer has the data it needs to determine the tag type from the span.
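The effect described above can be sketched with a toy model. The `guess_tag` function below is a hypothetical stand-in for style-based detection and is not CKEditor 5's actual logic; the replacement for the style expression is again assumed to keep only the first capture group, and the spans are simplified samples.

```python
import re

STYLE_ATTR = re.compile(r'(<[^>]*) (style="[^"]*")')


def guess_tag(span_html: str) -> str:
    """Toy stand-in for style-based detection; not CKEditor 5's real logic."""
    match = re.search(r'style="([^"]*)"', span_html)
    style = match.group(1) if match else ""
    if "font-weight:bold" in style:
        return "strong"
    if "font-style:italic" in style:
        return "em"
    return "span"  # no style information left to infer anything from


bold_span = '<span style="font-weight:bold;">This should be bold copy</span>'
italic_span = '<span style="font-style:italic;">This should be Italics</span>'

# Detection before and after the style attribute is stripped.
before = [guess_tag(s) for s in (bold_span, italic_span)]
after = [guess_tag(STYLE_ATTR.sub(r"\1", s)) for s in (bold_span, italic_span)]
print(before)  # ['strong', 'em']
print(after)   # ['span', 'span']
```

Once the style attribute is gone, the toy detector can only fall back to a plain span, which mirrors why the formatting is lost when the inline-style expression is enabled.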

  • 🇺🇸United States ThanksNeco

    Thanks for the response, that makes total sense. This can be marked as resolved; the module is working as expected. Thanks again!

  • Status changed to Closed: works as designed 6 months ago
  • 🇨🇦Canada star-szr

    Great, thanks for following up!
