Adding nbsp; removing + better prep_replace pattern

Comment over 2 years ago →
🇺🇸United States danflanagan8 St. Louis, US
Hi! Thanks for this module and the proposed change in this issue. I started by writing a bunch of test cases and found that this updated approach got a bit closer to what I was looking for, but it didn't get all the way there for me.

I'm posting an updated approach that handles ,
(etc), and more complicated recursive emptiness. I don't think the test coverage is configured to be run here, but locally I can assure you that everything passes.

🇺🇸United States danflanagan8 St. Louis, US

FWIW, here's are the failures with the current filter in the module:

Testing Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest
.......FF..FFFFFF......                                           23 / 23 (100%)

Time: 1.39 seconds, Memory: 4.00 MB

There were 8 failures:

1) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #7 ('<p class="something"></p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p class="something"></p>'


2) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #8 ('<p id="whatever" class="somet..."></p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p id="whatever" class="something"></p>'


3) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #11 ('<p><br/></p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p><br/></p>'


4) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #12 ('<p>&nbsp;</p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p>&nbsp;</p>'


5) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #13 ('<p>&nbsp;&nbsp;</p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p>&nbsp;&nbsp;</p>'


6) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #14 ('<p> <b> </b> </p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p>  </p>'


7) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #15 ('<p>&nbsp;<b>&nbsp;</b>&nbsp;</p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p>&nbsp;<b>&nbsp;</b>&nbsp;</p>'


8) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #16 ('<drupal-media data-caption="b...media>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<drupal-media data-caption="baz" data-entity-type="media" data-entity-uuid="abc123"></drupal-media>'

And below are the failures with the proposal in the IS.

Testing Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest
...........F..FF.F..F..                                           23 / 23 (100%)

Time: 1.39 seconds, Memory: 4.00 MB

There were 5 failures:

1) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #11 ('<p><br/></p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p><br/></p>'


2) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #14 ('<p> <b> </b> </p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p>  </p>'


3) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #15 ('<p>&nbsp;<b>&nbsp;</b>&nbsp;</p>', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-''
+'<p>  </p>'


4) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterControlStrings with data set #0 ('<h2>This is a control string<...&nbsp;')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'<h2>This is a control string</h2><div>This string should <b><em>not</em></b> be altered in ANY&nbsp;WAY by the filter. </div> <hr><p class="some-class"><><><></p>&nbsp;'
+'<h2>This is a control string</h2><div>This string should <b><em>not</em></b> be altered in ANY WAY by the filter. </div> <hr><p class="some-class"><><><></p> '


5) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterControlStrings with data set #3 ('<h1> <h2> hi <p><div>yes</div...&nbsp;')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'<h1> <h2> hi <p><div>yes</div></p><br></h2>&nbsp;'
+'<h1> <h2> hi <p><div>yes</div></p><br></h2> '

Comment over 2 years ago →
🇺🇸United States danflanagan8 St. Louis, US
Hm, I realized that the filter in #4 would always strip an iframe, which is no good! Here's an update that declares a list of tags to never consider empty. This includes iframe as well as drupal-media and drupal-entity, which are nice to protect. This should make the weight of this filter less important.

I couldn't figure out how to do this with a preg_replace, so I changed course to use a preg_replace_all and then we loop through the matches.

Test coverage is even better in this patch and everything passes locally.
Comment over 2 years ago →
🇺🇸United States danflanagan8 St. Louis, US
TIL \s does not match the literal (not html entity) version of the non-breaking space. This latest patch addresses that and adds a couple test cases why not.
Comment about 2 years ago →
🇺🇸United States danflanagan8 St. Louis, US
I have a nutty site with script tags in wysywigs. I need to update the list of never-empty tags to include script tags. I also realized button should be allowed and then alphabetized the list. Ideally this list would be configurable. Maybe a goal for 2.x.
Status changed to RTBC about 2 years ago3:37pm 8 May 2023
Comment about 2 years ago →
🇫🇷France Dom.
Hi !

Thanks @danflanagan8 for your patch submission. It includes tests as I can see and this is really quality work.
I am very sorry for (huge) response time, and I wish you would forgive me.

As of today, I will change a little something in your patch: I guess this is just an encoding issue, but opening your patch in the browser shows a "Â" character extra that kills for french langage on both those lines:

'/<([^<\/>)]*)\b[^>]*>((\s|Â | |<br>|<br\/>|<br \/>)*?)<\/\1>/imsU',
['<p>Â </p>', ''],

Therefore let me just remove that characters (that does not show on the review plugin surprisingly).
Comment about 2 years ago →
System Message

Dom. → committed 96abb134 on 1.x
Issue #3208610 by danflanagan8: Adding nbsp; removing + better...
Status changed to Fixed about 2 years ago3:46pm 8 May 2023
Comment about 2 years ago →
🇫🇷France Dom.
Will be included in release 2.0.0.
Also, I am adding up a follow-up for configuring which tags should be configurable as skippable: ✨ Make "doNotConsiderEmpty" tags configurable. Fixed
Comment about 2 years ago →
🇺🇸United States danflanagan8 St. Louis, US
Hi Dom.,
That strangely encoded character was supposed to be the unicode non-breaking space. In my manual testing it is important to have that character as part of the regex.

In my text editor it shows up as [NBSP]. I copy/pasted from some online resource to get the literal unicode character.

I'm not very knowledgeable about encoding though to be honest. We need to somehow get the regex to recognize a literal unicode nbsp.

Looks like maybe we could use \xc2\xa0/?

https://stackoverflow.com/questions/40724543/how-to-replace-decoded-non-...
Status changed to Needs work about 2 years ago4:01pm 9 May 2023
Comment about 2 years ago →
🇫🇷France Dom.
That shows to me as : |Â | which is the letter A with the ^ followed by a normal space.
I therefore tried that is a div on front and indeed it is left as is.

May I suggest to try using |\u00A0| which is the unicode character \u00A0 that I believe to correspond to nbsp ?
https://www.fileformat.info/info/unicode/char/00a0/index.htm
Comment about 2 years ago →
🇺🇸United States danflanagan8 St. Louis, US
I'm learning a lot here!

I tried using \u00A0 and got this error:

preg_match_all(): Compilation failed: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u at offset 34

I swtiched to \xc2\xa0 and that appears to have worked.

I'm working on the 2.x issue now. That's probably where the focus should be.

Comment almost 2 years ago →

System Message

Dom. → committed 986243bb on 2.x

Issue #3208610 by danflanagan8, Dom.: Adding nbsp; removing + better...

Status changed to Fixed almost 2 years ago9:06am 18 July 2023
Comment almost 2 years ago →
🇫🇷France Dom.
I replaced the non-breakable character with \xc2\xa0.
Added a test which mixes nbsp, space, and return char.

Comment almost 2 years ago →

System Message

Dom. → committed 986243bb on 1.x

Issue #3208610 by danflanagan8, Dom.: Adding nbsp; removing + better...

Comment almost 2 years ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.

Adding nbsp; removing + better prep_replace pattern

Comments & Activities