Difficulty filtering empty paragraphs

Created on 22 November 2023, 7 months ago
Updated 12 December 2023, 6 months ago

Problem/Motivation

I am having difficulty filtering out empty paragraphs with extra custom filters.

Steps to reproduce

1. Install the CKEditor 5 Paste Filter module
2. Configure a text format to enable 'Filter pasted content'
3. Add some extra filters to strip every tag other than paragraphs from text copied from Word.
4. Save the text format.

Example of extra filters:

Search expression: <ul>
Replacement: (empty)
Search expression: <\/ul>
Replacement: (empty)
Search expression: <li>
Replacement: <p>
Search expression: <\/li>
Replacement: </p>

Suppose I paste text that, without the CKEditor 5 Paste Filter module, would come through like this:

<ul>
    <li>
        <p class="western" align="left">
            Praesent sed turpis diam.
        </p>
    </li>
    <li>
        <p class="western" align="left">
            Mauris eget tellus vel mi aliquet feugiat.
        </p>
    </li>
</ul>

With my custom filters I get:

<p>
    &nbsp;
</p>
<p>
    Praesent sed turpis diam.
</p>
<p>
    &nbsp;
</p>
<p>
    &nbsp;
</p>
<p>
    Mauris eget tellus vel mi aliquet feugiat.
</p>
<p>
    &nbsp;
</p>

Can I remove these:

<p>
    &nbsp;
</p>

with some extra filters?

Proposed resolution

(added by maintainer to help folks finding this issue in future)

When removing unwanted elements, replace with the empty string rather than replacing with <p> tags.

CKEditor 5 runs its own markup conversion and adds any missing <p> tags, so replacing unwanted elements with paragraphs leads to empty paragraphs.
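
The effect can be reproduced outside the editor. The sketch below is a minimal TypeScript illustration, not the module's actual code: the `PasteFilter` shape and the `applyFilters` helper are hypothetical, and it assumes each search expression is applied in order as a global regular-expression replacement. It runs the reporter's filters and an empty-string variant over the pasted list markup.

```typescript
// Hypothetical shape of one search/replace pair (not the module's real config schema).
interface PasteFilter {
  search: string; // regular expression source
  replace: string; // replacement text
}

// Assumption: filters run in order, each as a global regex replacement.
function applyFilters(html: string, filters: PasteFilter[]): string {
  return filters.reduce(
    (out, f) => out.replace(new RegExp(f.search, "g"), f.replace),
    html
  );
}

// The pasted list from the report, collapsed onto one line.
const pasted =
  '<ul><li><p class="western" align="left">Praesent sed turpis diam.</p></li>' +
  '<li><p class="western" align="left">Mauris eget tellus vel mi aliquet feugiat.</p></li></ul>';

// The reporter's filters: list items become paragraphs.
const wrapInParagraphs: PasteFilter[] = [
  { search: "<ul>", replace: "" },
  { search: "<\\/ul>", replace: "" },
  { search: "<li>", replace: "<p>" },
  { search: "<\\/li>", replace: "</p>" },
];

// The suggested variant: list tags are removed outright.
const stripListTags: PasteFilter[] = [
  { search: "<\\/?ul>", replace: "" },
  { search: "<\\/?li>", replace: "" },
];

const wrapped = applyFilters(pasted, wrapInParagraphs);
const stripped = applyFilters(pasted, stripListTags);

console.log(wrapped); // each paragraph ends up nested inside another <p>
console.log(stripped); // only the original paragraphs remain
```

In the first result every item reads `<p><p class="western" ...>...</p></p>`. Paragraphs cannot nest in HTML, so a parser closes the outer `<p>` as soon as the inner one opens and turns the stray `</p>` into another empty paragraph, which would account for the empty paragraph before and after each line of text in the report. The second result contains only the two original paragraphs, so there is nothing left over to become an empty paragraph.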

💬 Support request
Status

Fixed

Version

1.0

Component

Miscellaneous

Created by


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Issue created by @annoul4
  • Status changed to Postponed: needs info 7 months ago
  • 🇨🇦 Canada Cottser

    I’m going to need more information. Please fill out the issue summary template so I can try to help you! You can see it when you create a new issue.

    One possibility to keep in mind is that CKEditor 5 might be inserting these empty paragraphs after pasting.

  • I have edited the original question. I hope it makes sense now.

  • 🇺🇸 United States michael.acevedo@pomona.edu

    I'm also getting lots of empty paragraphs when I paste in text from Word. I can provide the Word doc I'm using if needed.
    Here's what I have in my paste filter:

        ckeditor5_paste_filter_pasteFilter:
          enabled: true
          filters:
            -
              enabled: true
              weight: -4
              search: '<o:p><\/o:p>'
              replace: ''
            -
              enabled: true
              weight: -54
              search: '(<[^>]*) (style="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -53
              search: '(<[^>]*) (face="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -52
              search: '(<[^>]*) (class="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -51
              search: '(<[^>]*) (valign="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -50
              search: '<font[^>]*>'
              replace: ''
            -
              enabled: true
              weight: -49
              search: '<\/font>'
              replace: ''
            -
              enabled: true
              weight: -48
              search: '<span[^>]*>'
              replace: ''
            -
              enabled: true
              weight: -47
              search: '<\/span>'
              replace: ''
            -
              enabled: true
              weight: -3
              search: '<p>&nbsp;<\/p>'
              replace: ''
            -
              enabled: true
              weight: -2
              search: '<p><\/p>'
              replace: ''
            -
              enabled: true
              weight: -46
              search: '<b><\/b>'
              replace: ''
            -
              enabled: true
              weight: -45
              search: '<i><\/i>'
              replace: ''
            -
              enabled: true
              weight: -44
              search: '<a name="OLE_LINK[^"]*">(.*?)<\/a>'
              replace: $1
            -
              enabled: true
              weight: -43
              search: '<(img|address|article|aside|audio|blockquote|button|canvas|caption|center|cite|clippath|code|col|colgroup|defs|details|div|embed|fieldset|figcaption|figure|footer|form|g|head|html|iframe|input|label|legend|main|nav|rect|script|section|source|style|summary|svg|title|mj-raw|video)[^>]*>'
              replace: ''
            -
              enabled: true
              weight: -42
              search: '<(\/address|\/article|\/aside|\/audio|\/blockquote|\/button|\/canvas|\/caption|\/center|\/cite|\/clippath|\/code|\/col|\/colgroup|\/defs|\/details|\/div|\/embed|\/fieldset|\/figcaption|\/figure|\/footer|\/form|\/g|\/head|\/html|\/iframe|\/input|\/label|\/legend|\/main|\/nav|\/rect|\/script|\/section|\/source|\/style|\/summary|\/svg|\/title|\/mj-raw|\/video)[^>]*>'
              replace: ''
            -
              enabled: true
              weight: -41
              search: '<td[^>]*>'
              replace: '<td>'
            -
              enabled: true
              weight: -40
              search: '<tr[^>]*>'
              replace: '<tr>'
            -
              enabled: true
              weight: -39
              search: '<o:p>'
              replace: ''
            -
              enabled: true
              weight: -38
              search: '<\/o:p>'
              replace: ''
            -
              enabled: true
              weight: -37
              search: '(<[^>]*) (align="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -36
              search: '<h2[^>]*>'
              replace: '<h2>'
            -
              enabled: true
              weight: -35
              search: '<h3[^>]*>'
              replace: '<h3>'
            -
              enabled: true
              weight: -34
              search: '<h4[^>]*>'
              replace: '<h4>'
            -
              enabled: true
              weight: -33
              search: '<h5[^>]*>'
              replace: '<h5>'
            -
              enabled: true
              weight: -32
              search: '<h6[^>]*>'
              replace: '<h6>'
            -
              enabled: true
              weight: -31
              search: '<li[^>]*>'
              replace: '<li>'
            -
              enabled: true
              weight: -30
              search: '<ul[^>]*>'
              replace: '<ul>'
            -
              enabled: true
              weight: -29
              search: '<ol[^>]*>'
              replace: '<ol>'
            -
              enabled: true
              weight: -28
              search: '<strong[^>]*>'
              replace: '<strong>'
            -
              enabled: true
              weight: -27
              search: '(<[^>]*) (start="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -26
              search: '(<[^>]*) (type="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -25
              search: '(<[^>]*) (id="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -24
              search: '(<[^>]*) (name="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -23
              search: '(<[^>]*) (shash="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -22
              search: '(<[^>]*) (title="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -21
              search: '(<[^>]*) (referrerpolicy="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -20
              search: '(<[^>]*) (rel="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -19
              search: '<h1[^>]*>'
              replace: '<p>'
            -
              enabled: true
              weight: -18
              search: '<\/h1>'
              replace: '</p>'
            -
              enabled: true
              weight: -17
              search: '(<[^>]*) (role="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -16
              search: '<th[^>]*>'
              replace: '<th>'
            -
              enabled: true
              weight: -15
              search: '<p[^>]*>'
              replace: '<p>'
            -
              enabled: true
              weight: -14
              search: '(<[^>]*) (bgcolor="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -13
              search: '(<[^>]*) (data-(\S+)="((?:\\.|[^"\\])*)")'
              replace: $1
            -
              enabled: true
              weight: -12
              search: '<em[^>]*>'
              replace: '<em>'
            -
              enabled: true
              weight: -11
              search: '(<[^>]*) (aria-(\S+)="((?:\\.|[^"\\])*)")'
              replace: $1
            -
              enabled: true
              weight: -10
              search: '(<[^>]*) (tabindex="[^"]*")'
              replace: $1
            -
              enabled: true
              weight: -9
              search: '<table[^>]*>'
              replace: '<table>'
            -
              enabled: true
              weight: -8
              search: '<b>'
              replace: '<strong>'
            -
              enabled: true
              weight: -7
              search: '<\/b>'
              replace: '</strong>'
            -
              enabled: true
              weight: -6
              search: '<i>'
              replace: '<em>'
            -
              enabled: true
              weight: -5
              search: '<\/i>'
              replace: '</em>'
  • Status changed to Active 7 months ago
  • 🇨🇦 Canada Cottser

    @annoul4 I have an idea what is happening, but would like to confirm before trying to explain fully. Overall I believe the filtering is working as expected. I suspect CKEditor 5 is transforming your filtered markup in a way that you are not expecting.

    Overall I suspect you will both want something different from this module for removing empty paragraphs. Perhaps https://www.drupal.org/project/emptyparagraphkiller is worth considering, or something similar could be implemented as a custom CKEditor 5 plugin.

  • Status changed to Fixed 6 months ago
  • Automatically closed - issue fixed for 2 weeks with no activity.
