- Issue created by @markfien
- 🇨🇦 Canada star-szr
Can you provide a screenshot and/or config export of your settings for this module in the text format you are testing?
From what I can see this is not using the default settings, since all style attributes should be filtered out if you're using the default filter set.
- Status changed to Postponed: needs info
about 1 year ago 1:12pm 26 August 2023 - 🇨🇦 Canada star-szr
Please test again using the default filter settings. In the example you've given I don't see any styles that need preserving, so if you remove all style attributes that should cleanly solve your issue.
If you want to selectively remove styles from the style attribute using this module, it's possible but will require a more specific search expression than is provided with the default filter set.
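As a rough illustration of what a more specific search expression could look like, here is a minimal sketch that removes only text-indent declarations and leaves the rest of each style attribute alone. It is written in Python so it can be run on its own; the patterns, the FILTERS list, and the apply_filters helper are assumptions made for this example, not filters shipped with the module, and the syntax may need adjusting for the regular expression engine the module uses.

```python
import re

# Hypothetical search/replace pairs, applied in order.
# These are illustrations, not part of the module's default filter set.
FILTERS = [
    # Remove a text-indent declaration from inside a style attribute.
    # The lookbehind keeps prefixed properties such as mso-text-indent-alt.
    (r'(\sstyle="[^"]*?)(?<![\w-])text-indent\s*:[^;"]*;?', r'\1'),
    # Drop style attributes that are left empty by the previous step.
    (r'\sstyle="\s*"', ''),
]


def apply_filters(html: str) -> str:
    """Run every search/replace pair over the pasted markup."""
    for search, replacement in FILTERS:
        html = re.sub(search, replacement, html)
    return html


sample = (
    '<li class="MsoListParagraphCxSpFirst" '
    'style="mso-list:l0 level1 lfo1;text-indent:-18.0pt;">Item 1</li>'
)
print(apply_filters(sample))
```

Note that this pattern removes one text-indent declaration per style attribute, which is enough for the markup Word produces in the samples in this thread.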
- 🇺🇸 United States markfien
Thank you @star-szr. Attached is a scrolled screenshot of the settings. I did get somewhat better results. However, when I added color, italics, and bold to the document (also attached as a zip), they were lost in conversion, with the exception of bold, which came across.
Examples of pasting into different browsers on an M1 Max MacBook Pro running macOS Ventura 13.5.1 can be found at https://demo9.schoolboard.net/node/3404 for reference.
One other question, which I have not tested: under enabled filters, should 'convert line breaks into HTML' and 'correct faulty and chopped off HTML' be checked or unchecked?
Thank you for the input and help.
- 🇺🇸 United States markfien
I've done a second test with a more complex Word file used by clients. The zip of the Word file (Agenda - October 17, 2022) is attached, and the resulting paste into Safari is here: https://demo9.schoolboard.net/node/3405
There are many tables and lists; you can see the resulting numbering errors and the loss of table formatting in outlines.
Hope this helps.
- 🇨🇦 Canada star-szr
To get back to your original post/question, you certainly could set up a custom paste filter to remove all text-indent styles, but based on what you are sharing that would only be the tip of the iceberg in terms of what you are trying to achieve.
In the bigger picture you may want to take a few steps back and consider other solutions/workflows for getting this content into Drupal. What you are trying to achieve is not simple or easy. If this is really important to get right for your project, then one solution you may want to consider is the paid CKEditor 5 plugin that allows you to import Word documents: https://ckeditor.com/import-from-word/demo/
I'm only mentioning this as an option to consider; I have not used this plugin myself other than on the demo page, and have no connection with CKEditor 5 or CKSource other than that I have written some code that may get incorporated into the CKEditor 5 codebase (currently in a pull request on GitHub).
To get an idea of what we are looking at, let's take the document from your comment #5. If you paste that into CKEditor 5 without this module enabled, you will get something similar to the following.
<p class="MsoNormal"> <a name="OLE_LINK19"><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">Test of Word cut/paste</span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;mso-spacerun:yes;" lang="EN-US"> </span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">(color Red, italic)</span></a><o:p></o:p> </p> <p class="MsoNormal"> </p> <p class="MsoNormal"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Content from Word should be converted properly, and these two paragraphs use shift-enter for the spacing which should be one line</span> <br> <br> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This second paragraph should only be one line below the first and uses a paragraph break for the next section.</span><o:p></o:p> </p> <p class="MsoNormal"> </p> <p class="MsoNormal"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This third section is separated by 2 paragraph breaks.</span><o:p></o:p> </p> <p class="MsoNormal"> </p> <p class="MsoNormal"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US"><strong>This is a bullet list: (bold)</strong></span><o:p></o:p> </p> <ul> <li class="MsoListParagraphCxSpFirst" style="mso-list:l0 level1 lfo1;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 1</span><o:p></o:p> </li> <li class="MsoListParagraphCxSpMiddle" style="mso-list:l0 level1 lfo1;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 2</span> <br> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">more about item 2</span><o:p></o:p> </li> <li class="MsoListParagraphCxSpLast" style="mso-list:l0 level1 lfo1;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 3</span><o:p></o:p> </li> </ul> <p class="MsoNormal"> <span style="color:#00B050;mso-bookmark:OLE_LINK1;" lang="EN-US">End of bullet list.(green)</span><o:p></o:p> </p> <p class="MsoNormal"> </p> <p class="MsoNormal"> <span 
style="mso-bookmark:OLE_LINK1;" lang="EN-US"><strong>This is a numbered list: (bold)</strong></span><o:p></o:p> </p> <ol> <li class="MsoListParagraphCxSpFirst" style="mso-list:l1 level1 lfo2;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 1</span><o:p></o:p> </li> <li class="MsoListParagraphCxSpMiddle" style="mso-list:l1 level1 lfo2;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 2</span> <br> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">more about Item 2</span><o:p></o:p> </li> <li class="MsoListParagraphCxSpLast" style="mso-list:l1 level1 lfo2;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item 3</span><o:p></o:p> </li> </ol> <p class="MsoNormal"> <span style="color:#0070C0;mso-bookmark:OLE_LINK1;" lang="EN-US">End of numbered list (blue)</span><o:p></o:p> </p> <p class="MsoNormal"> </p> <p class="MsoNormal"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This is an alphabetic list:</span><o:p></o:p> </p> <ol style="list-style-type:lower-alpha;"> <li class="MsoListParagraphCxSpFirst" style="mso-list:l2 level1 lfo3;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item a</span><o:p></o:p> </li> <li class="MsoListParagraphCxSpMiddle" style="mso-list:l2 level1 lfo3;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item b</span> <br> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">more about item b</span><o:p></o:p> </li> <li class="MsoListParagraphCxSpLast" style="mso-list:l2 level1 lfo3;text-indent:-18.0pt;"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">Item c</span><o:p></o:p> </li> </ol> <p class="MsoNormal"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">End of alphabetic list</span><o:p></o:p> </p> <p class="MsoNormal"> </p> <p class="MsoNormal"> <span style="mso-bookmark:OLE_LINK1;" lang="EN-US">This is the end of the test document.</span><o:p></o:p> </p> <p class="MsoNormal"> </p> <p class="MsoNormal"> 
</p>
This is a mess as you can see, and if you enable this module and use the default settings, you will instead get this:
<p> Test of Word cut/paste (color Red, italic) </p> <p> Content from Word should be converted properly, and these two paragraphs use shift-enter for the spacing which should be one line <br> <br> This second paragraph should only be one line below the first and uses a paragraph break for the next section. </p> <p> This third section is separated by 2 paragraph breaks. </p> <p> <strong>This is a bullet list: (bold)</strong> </p> <ul> <li> Item 1 </li> <li> Item 2 <br> more about item 2 </li> <li> Item 3 </li> </ul> <p> End of bullet list.(green) </p> <p> <strong>This is a numbered list: (bold)</strong> </p> <ol> <li> Item 1 </li> <li> Item 2 <br> more about Item 2 </li> <li> Item 3 </li> </ol> <p> End of numbered list (blue) </p> <p> This is an alphabetic list: </p> <ol> <li> Item a </li> <li> Item b <br> more about item b </li> <li> Item c </li> </ol> <p> End of alphabetic list </p> <p> This is the end of the test document. </p>
Ah, much better. You are then free to layer on whatever styles you would like, on top of the now-cleaned markup. I think this is what most people will end up doing.
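To make that kind of cleanup concrete, here is a minimal sketch that produces similar output for one paragraph of the sample. The module's actual default filters are not shown in this thread, so the four patterns below are assumptions chosen only to reproduce the general effect; ASSUMED_FILTERS and clean are hypothetical names, and Python is used just so the example can be run on its own.

```python
import re

# Assumed search/replace pairs, NOT the module's real default filter set.
ASSUMED_FILTERS = [
    (r'</?o:p>', ''),                          # Word's <o:p></o:p> markers
    (r'</?span\b[^>]*>', ''),                  # unwrap every <span>
    (r'\s(?:class|style|lang)="[^"]*"', ''),   # strip presentational attributes
    (r'<p>\s*</p>', ''),                       # drop paragraphs left empty
]


def clean(html: str) -> str:
    """Apply each assumed filter in order to the pasted markup."""
    for search, replacement in ASSUMED_FILTERS:
        html = re.sub(search, replacement, html)
    return html


sample = (
    '<p class="MsoNormal"> <span style="color:#00B050;mso-bookmark:OLE_LINK1;" '
    'lang="EN-US">End of bullet list.(green)</span><o:p></o:p> </p> '
    '<p class="MsoNormal"> </p>'
)
print(clean(sample).strip())
```

Because every style attribute is removed, the green color is lost along with the Word-specific properties, which is the trade-off described above.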
Looking, for example, at the first line with red and italic text, the HTML that Word generates makes it very hard to preserve only the color and italics in a reasonable way. I think the closest we could realistically get with the tools provided by this module would be something like the following. This would require a decent amount of tinkering with regular expressions and may still be fragile or inconsistent:
Before:
<p class="MsoNormal"> <a name="OLE_LINK2"><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">Test of Word cut/paste</span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;mso-spacerun:yes;" lang="EN-US"> </span><span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">(color Red, italic)</span></a><o:p></o:p> </p>
After:
<p> <em style="color:red;">Test of Word cut/paste</em> <em style="color:red;">(color Red, italic)</em> </p>
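For reference, here is one possible chain of search/replace steps that turns the Before markup into the After markup. It is only a sketch written against this single sample: the patterns, FILTERS, and convert are hypothetical and not part of the module, Python is used so the example can be run on its own, and other Word documents would likely need different or additional patterns.

```python
import re

# Hypothetical search/replace pairs, applied in order.
# Written for this one sample; not filters shipped with the module.
FILTERS = [
    # Collapse Word's spacer spans (mso-spacerun) to a single space.
    (r'<span\b[^>]*mso-spacerun:yes[^>]*>\s*</span>', ' '),
    # Turn "subtle emphasis" spans into <em>, keeping only the color.
    (r'<span class="MsoSubtleEmphasis" style="(color:[^;"]+;)[^"]*"[^>]*>(.*?)</span>',
     r'<em style="\1">\2</em>'),
    # Unwrap bookmark anchors such as <a name="OLE_LINK2">.
    (r'<a name="[^"]*">(.*?)</a>', r'\1'),
    # Remove Word's <o:p> markers.
    (r'</?o:p>', ''),
    # Remove Mso* class attributes (must run after the span conversion).
    (r'\sclass="Mso[^"]*"', ''),
]


def convert(html: str) -> str:
    """Apply each pair in order to the pasted markup."""
    for search, replacement in FILTERS:
        html = re.sub(search, replacement, html)
    return html


BEFORE = (
    '<p class="MsoNormal"> <a name="OLE_LINK2">'
    '<span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">Test of Word cut/paste</span>'
    '<span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;mso-spacerun:yes;" lang="EN-US"> </span>'
    '<span class="MsoSubtleEmphasis" style="color:red;mso-bookmark:OLE_LINK1;" lang="EN-US">(color Red, italic)</span>'
    '</a><o:p></o:p> </p>'
)
print(convert(BEFORE))
```

The order matters: the span conversion relies on the MsoSubtleEmphasis class still being present, so the class attributes are stripped last. That ordering dependency is part of why this approach is fragile.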
I hope that's helpful.
- 🇨🇦 Canada star-szr
Also you could consider taking a look at the CKEditor 5 Paste from Office plugin, and evaluate whether you could create your own similar plugin with your own logic for what you want to remove.
This is not likely to be a small undertaking, but if you need a sharper tool than this module provides then it may be a path worth considering.
- Status changed to Fixed
12 months ago 7:45pm 29 November 2023 - 🇨🇦 Canada star-szr
Marking as fixed as I believe I have addressed the support request and there have been no further questions or follow-up. Thanks!
Automatically closed - issue fixed for 2 weeks with no activity.
- Status changed to Fixed
5 months ago 9:20pm 1 July 2024