Lists are not being formatted properly

Created on 24 August 2023, about 1 year ago
Updated 1 July 2024, 5 months ago

Using this filter with the attached Word file, text-indent:-.25in; is added to every item in bulleted and numbered lists, and I cannot figure out why or how to filter it out. This throws off all lists. Any help is appreciated. Here is an example of the page source after pasting:

<ul style="list-style-type:disc;">
    <li style="mso-list:l0 level1 lfo1;text-indent:-.25in;">
        Item 1
    </li>
    <li style="mso-list:l0 level1 lfo1;text-indent:-.25in;">
        Item 2
        <br>
        more about item 2
    </li>
    <li style="mso-list:l0 level1 lfo1;text-indent:-.25in;">
        Item 3
    </li>
</ul>
<p>
    End of bullet list.
</p>
<p>
    This is a numbered list:
</p>
<ol>
    <li style="mso-list:l1 level1 lfo2;text-indent:-.25in;">
        Item 1
    </li>
    <li style="mso-list:l1 level1 lfo2;text-indent:-.25in;">
        Item 2
        <br>
        more about Item 2
    </li>

Any insight or configuration changes would be appreciated.

We have tested cut and paste in three browsers, and the results can be found here: https://demo9.schoolboard.net/node/3403

SBN Toolbar has the style filter unchecked.
Full HTML has the HTML filters unchecked and the style filter in paste unchecked.

💬 Support request
Status

Fixed

Version

1.0

Component

Miscellaneous

Created by

🇺🇸 United States markfien


Comments & Activities

  • Issue created by @markfien
  • 🇺🇸 United States markfien
  • 🇨🇦 Canada star-szr

    Can you provide a screenshot and/or config export of your settings for this module in the text format you are testing?

    From what I can see this is not using the default settings, since all style attributes should be filtered out if you're using the default filter set.

  • Status changed to Postponed: needs info about 1 year ago
  • 🇨🇦 Canada star-szr

    Please test again using the default filter settings. In the example you've given I don't see any styles that need preserving, so if you remove all style attributes that should cleanly solve your issue.

    If you want to selectively remove styles from the style attribute using this module, it's possible but will require a more specific search expression than is provided with the default filter set.
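
    As a rough illustration of what a "more specific search expression" could look like, here is a minimal Python sketch that removes only the text-indent declaration from inline style attributes and leaves every other declaration alone. The function name and the exact pattern are hypothetical, and Python is used only to demonstrate the regular expression; the module's own filter syntax may differ.

```python
import re

# Hypothetical pattern: one text-indent declaration, with its trailing semicolon if present.
TEXT_INDENT = re.compile(r'text-indent\s*:\s*[^;"]+;?', re.IGNORECASE)


def strip_text_indent(html: str) -> str:
    """Remove text-indent declarations from double-quoted style attributes."""

    def clean(match: re.Match) -> str:
        # Drop only the text-indent declaration; keep the rest of the style value.
        style = TEXT_INDENT.sub("", match.group(1)).strip()
        # If nothing is left, remove the style attribute entirely.
        return f' style="{style}"' if style else ""

    return re.sub(r'\s+style="([^"]*)"', clean, html)


print(strip_text_indent('<li style="mso-list:l0 level1 lfo1;text-indent:-.25in;">'))
```

    The sketch only handles double-quoted style attributes, which is what the pasted markup in this issue uses. The mso-list declaration is kept on purpose, to show that the removal is selective.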

  • 🇺🇸 United States markfien

    Thank you @star-szr. Attached is a scrolling screenshot of the settings. I did get somewhat better results. However, when I added color, italics, and bold to the document (also attached as a zip), the color and italics were lost in conversion; only bold came across.

    Examples of pasting into different browsers on an M1 Max MacBook Pro running Ventura 13.5.1 can be found at https://demo9.schoolboard.net/node/3404 for reference.

    One other question, which I have not tested: under enabled filters, should 'convert line breaks into HTML' and 'correct faulty and chopped off HTML' be checked or unchecked?

    Thank you for the input and help.

  • 🇺🇸 United States markfien

    I've done a second test with a more complex Word file used by clients. The zip of the Word file (Agenda - October 17, 2022) is attached, and the result of pasting it into Safari is here: https://demo9.schoolboard.net/node/3405

    There are many tables and lists. You can see the resulting numbering errors and the loss of table formatting in outlines.

    Hope this helps.

  • 🇨🇦 Canada star-szr

    To get back to your original post/question, you certainly could set up a custom paste filter to remove all text-indent styles, but based on what you are sharing that would only be the tip of the iceberg in terms of what you are trying to achieve.

    In the bigger picture you may want to take a few steps back and consider other solutions/workflows for getting this content into Drupal. What you are trying to achieve is not simple or easy. If this is really important to get right for your project, then one solution you may want to consider is the paid CKEditor 5 plugin that allows you to import Word documents: https://ckeditor.com/import-from-word/demo/

    I'm only mentioning this as an option to consider. I have not used this plugin myself other than on the demo page, and I have no connection with CKEditor 5 or CKSource, other than having written some code that may get incorporated into the CKEditor 5 codebase (currently in a pull request on GitHub).

    To get an idea of what we are looking at, let's take the document from your comment #5. If you paste that into CKEditor 5 without this module enabled, you will get something similar to the following:

    <p class="MsoNormal">
        <a name="OLE_LINK19"><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">Test of Word cut/paste</span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;mso-spacerun:yes;" lang="EN-US">&nbsp; </span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">(color Red, italic)</span></a><o:p></o:p>
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    <p class="MsoNormal">
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Content from Word should be converted properly, and these two paragraphs use shift-enter for the spacing which should be one line</span>
        <br>
        <br>
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This second paragraph should only be one line below the first and uses a paragraph break for the next section.</span><o:p></o:p>
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    <p class="MsoNormal">
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This third section is separated by 2 paragraph breaks.</span><o:p></o:p>
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    <p class="MsoNormal">
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US"><strong>This is a bullet list: (bold)</strong></span><o:p></o:p>
    </p>
    <ul>
        <li class="MsoListParagraphCxSpFirst" style="mso-list:l0 level1 lfo1;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 1</span><o:p></o:p>
        </li>
        <li class="MsoListParagraphCxSpMiddle" style="mso-list:l0 level1 lfo1;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 2</span>
            <br>
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">more about item 2</span><o:p></o:p>
        </li>
        <li class="MsoListParagraphCxSpLast" style="mso-list:l0 level1 lfo1;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 3</span><o:p></o:p>
        </li>
    </ul>
    <p class="MsoNormal">
        <span style="color:#00B050;mso-bookmark:OLE_LINK1;" lang="EN-US">End of bullet list.(green)</span><o:p></o:p>
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    <p class="MsoNormal">
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US"><strong>This is a numbered list: (bold)</strong></span><o:p></o:p>
    </p>
    <ol>
        <li class="MsoListParagraphCxSpFirst" style="mso-list:l1 level1 lfo2;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 1</span><o:p></o:p>
        </li>
        <li class="MsoListParagraphCxSpMiddle" style="mso-list:l1 level1 lfo2;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 2</span>
            <br>
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">more about Item 2</span><o:p></o:p>
        </li>
        <li class="MsoListParagraphCxSpLast" style="mso-list:l1 level1 lfo2;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 3</span><o:p></o:p>
        </li>
    </ol>
    <p class="MsoNormal">
        <span style="color:#0070C0;mso-bookmark:OLE_LINK1;" lang="EN-US">End of numbered list (blue)</span><o:p></o:p>
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    <p class="MsoNormal">
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This is an alphabetic list:</span><o:p></o:p>
    </p>
    <ol style="list-style-type:lower-alpha;">
        <li class="MsoListParagraphCxSpFirst" style="mso-list:l2 level1 lfo3;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item a</span><o:p></o:p>
        </li>
        <li class="MsoListParagraphCxSpMiddle" style="mso-list:l2 level1 lfo3;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item b</span>
            <br>
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">more about item b</span><o:p></o:p>
        </li>
        <li class="MsoListParagraphCxSpLast" style="mso-list:l2 level1 lfo3;text-indent:-18.0pt;">
            <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item c</span><o:p></o:p>
        </li>
    </ol>
    <p class="MsoNormal">
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">End of alphabetic list</span><o:p></o:p>
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    <p class="MsoNormal">
        <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This is the end of the test document.</span><o:p></o:p>
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    <p class="MsoNormal">
        &nbsp;
    </p>
    

    This is a mess, as you can see. If you enable this module and use the default settings, you will instead get this:

    <p>
        Test of Word cut/paste&nbsp; (color Red, italic)
    </p>
    <p>
        Content from Word should be converted properly, and these two paragraphs use shift-enter for the spacing which should be one line
        <br>
        <br>
        This second paragraph should only be one line below the first and uses a paragraph break for the next section.
    </p>
    <p>
        This third section is separated by 2 paragraph breaks.
    </p>
    <p>
        <strong>This is a bullet list: (bold)</strong>
    </p>
    <ul>
        <li>
            Item 1
        </li>
        <li>
            Item 2
            <br>
            more about item 2
        </li>
        <li>
            Item 3
        </li>
    </ul>
    <p>
        End of bullet list.(green)
    </p>
    <p>
        <strong>This is a numbered list: (bold)</strong>
    </p>
    <ol>
        <li>
            Item 1
        </li>
        <li>
            Item 2
            <br>
            more about Item 2
        </li>
        <li>
            Item 3
        </li>
    </ol>
    <p>
        End of numbered list (blue)
    </p>
    <p>
        This is an alphabetic list:
    </p>
    <ol>
        <li>
            Item a
        </li>
        <li>
            Item b
            <br>
            more about item b
        </li>
        <li>
            Item c
        </li>
    </ol>
    <p>
        End of alphabetic list
    </p>
    <p>
        This is the end of the test document.
    </p>
    

    Ah, much better. You are then free to layer on whatever styles you would like, on top of the now-cleaned markup. I think this is what most people will end up doing.
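
    To make the difference between the two outputs above concrete, here is a small Python sketch that approximates the cleanup with a handful of regular expressions. It is not the module's actual filter set; the function name and every pattern are assumptions chosen to reproduce the result shown above on this particular sample.

```python
import re


def clean_word_html(html: str) -> str:
    """Approximate the cleaned markup above; patterns are illustrative assumptions."""
    # Drop Word's empty <o:p></o:p> placeholders.
    html = re.sub(r'<o:p>\s*</o:p>', '', html)
    # Remove class, style, and lang attributes from every tag.
    html = re.sub(r'\s+(?:class|style|lang)="[^"]*"', '', html)
    # Unwrap <span> elements, which carry no attributes after the previous step.
    html = re.sub(r'</?span>', '', html)
    # Unwrap Word bookmark anchors (<a name="...">), keeping their content.
    html = re.sub(r'<a name="[^"]*">(.*?)</a>', r'\1', html, flags=re.DOTALL)
    # Delete paragraphs that contain only a non-breaking space.
    html = re.sub(r'<p>\s*&nbsp;\s*</p>', '', html)
    return html


sample = (
    '<p class="MsoNormal"><span style="mso-bookmark:OLE_LINK1;" '
    'lang="EN-US">Item 1</span><o:p></o:p></p>'
)
print(clean_word_html(sample))
```

    Regular expressions are a blunt tool for HTML, so a sketch like this is only reasonable for markup as predictable as the sample here.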

    Looking for example at the first line with red and italic, the HTML that Word generates makes it very hard to preserve only the color and italics in a reasonable way. I think the closest we could realistically get with the tools provided by this module would be something like the following, and this would require a decent amount of tinkering with regular expressions and may still be fragile or inconsistent:

    Before:

    <p class="MsoNormal">
        <a name="OLE_LINK2"><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">Test of Word cut/paste</span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;mso-spacerun:yes;" lang="EN-US">&nbsp;</span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">(color Red, italic)</span></a><o:p></o:p>
    </p>
    

    After:

    <p>
        <em style="color:red;">Test of Word cut/paste</em> <em style="color:red;">(color Red, italic)</em>
    </p>
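
    The Before/After pair above could be sketched in Python roughly as follows. This assumes that Word's MsoSubtleEmphasis class always means italics and that only the color declaration is worth keeping; both are assumptions made for this sample, and all names here are hypothetical. It is an illustration of the regular expressions involved, not a filter configuration for this module.

```python
import re

# Hypothetical patterns for the sample markup above.
SPAN = re.compile(
    r'<span class="MsoSubtleEmphasis" style="([^"]*)"[^>]*>(.*?)</span>',
    re.DOTALL,
)
# Match "color" but not "background-color".
COLOR = re.compile(r'(?<![\w-])color\s*:\s*([^;"]+)')


def span_to_em(match: re.Match) -> str:
    color = COLOR.search(match.group(1))
    text = match.group(2).replace("&nbsp;", " ")
    if not text.strip():
        return " "  # a spacer-only span becomes a plain space
    if color:
        return f'<em style="color:{color.group(1).strip()};">{text}</em>'
    return f"<em>{text}</em>"


def preserve_emphasis(html: str) -> str:
    html = SPAN.sub(span_to_em, html)
    # Unwrap Word bookmark anchors and drop empty <o:p> placeholders.
    html = re.sub(r'<a name="[^"]*">(.*?)</a>', r'\1', html, flags=re.DOTALL)
    html = re.sub(r'<o:p>\s*</o:p>', '', html)
    return html.replace('<p class="MsoNormal">', '<p>')
```

    Even on this one line, the patterns depend on the attribute order and class names Word happened to emit, which is why this approach is likely to be fragile.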
    

    I hope that's helpful.

  • 🇨🇦 Canada star-szr

    You could also take a look at the CKEditor 5 Paste from Office plugin and evaluate whether you could create a similar plugin with your own logic for what you want to remove.

    This is not likely to be a small undertaking, but if you need a sharper tool than this module provides then it may be a path worth considering.

  • Status changed to Fixed 12 months ago
  • 🇨🇦 Canada star-szr

    Marking as fixed as I believe I have addressed the support request and there have been no further questions or follow-up. Thanks!

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed 5 months ago
  • 🇨🇦 Canada star-szr