- ๐บ๐ธUnited States danflanagan8 St. Louis, US
Hi! Thanks for this module and the proposed change in this issue. I started by writing a bunch of test cases and found that this updated approach got a bit closer to what I was looking for, but it didn't get all the way there for me.
I'm posting an updated approach that handles ,
(etc), and more complicated recursive emptiness. I don't think the test coverage is configured to be run here, but locally I can assure you that everything passes. - ๐บ๐ธUnited States danflanagan8 St. Louis, US
FWIW, here's are the failures with the current filter in the module:
Testing Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest .......FF..FFFFFF...... 23 / 23 (100%) Time: 1.39 seconds, Memory: 4.00 MB There were 8 failures: 1) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #7 ('<p class="something"></p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p class="something"></p>' 2) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #8 ('<p id="whatever" class="somet..."></p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p id="whatever" class="something"></p>' 3) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #11 ('<p><br/></p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p><br/></p>' 4) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #12 ('<p> </p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p> </p>' 5) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #13 ('<p> </p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p> </p>' 6) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #14 ('<p> <b> </b> </p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p> </p>' 7) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #15 ('<p> <b> </b> </p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p> <b> </b> </p>' 8) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #16 ('<drupal-media data-caption="b...media>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<drupal-media data-caption="baz" data-entity-type="media" data-entity-uuid="abc123"></drupal-media>'
And below are the failures with the proposal in the IS.
Testing Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest ...........F..FF.F..F.. 23 / 23 (100%) Time: 1.39 seconds, Memory: 4.00 MB There were 5 failures: 1) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #11 ('<p><br/></p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p><br/></p>' 2) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #14 ('<p> <b> </b> </p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p> </p>' 3) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterEmptyTags with data set #15 ('<p> <b> </b> </p>', '') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'' +'<p> </p>' 4) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterControlStrings with data set #0 ('<h2>This is a control string<... ') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'<h2>This is a control string</h2><div>This string should <b><em>not</em></b> be altered in ANY WAY by the filter. </div> <hr><p class="some-class"><><><></p> ' +'<h2>This is a control string</h2><div>This string should <b><em>not</em></b> be altered in ANY WAY by the filter. </div> <hr><p class="some-class"><><><></p> ' 5) Drupal\Tests\filter_empty_tags\Unit\FilterEmptyTagsTest::testFilterControlStrings with data set #3 ('<h1> <h2> hi <p><div>yes</div... ') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'<h1> <h2> hi <p><div>yes</div></p><br></h2> ' +'<h1> <h2> hi <p><div>yes</div></p><br></h2> '
- ๐บ๐ธUnited States danflanagan8 St. Louis, US
Hm, I realized that the filter in #4 would always strip an iframe, which is no good! Here's an update that declares a list of tags to never consider empty. This includes iframe as well as drupal-media and drupal-entity, which are nice to protect. This should make the weight of this filter less important.
I couldn't figure out how to do this with a preg_replace, so I changed course to use a preg_replace_all and then we loop through the matches.
Test coverage is even better in this patch and everything passes locally.
- ๐บ๐ธUnited States danflanagan8 St. Louis, US
TIL
\s
does not match the literal (not html entity) version of the non-breaking space. This latest patch addresses that and adds a couple test cases why not. - ๐บ๐ธUnited States danflanagan8 St. Louis, US
I have a nutty site with script tags in wysywigs. I need to update the list of never-empty tags to include script tags. I also realized button should be allowed and then alphabetized the list. Ideally this list would be configurable. Maybe a goal for 2.x.
- Status changed to RTBC
over 1 year ago 3:37pm 8 May 2023 - ๐ซ๐ทFrance Dom.
Hi !
Thanks @danflanagan8 for your patch submission. It includes tests as I can see and this is really quality work.
I am very sorry for (huge) response time, and I wish you would forgive me.As of today, I will change a little something in your patch: I guess this is just an encoding issue, but opening your patch in the browser shows a "ร" character extra that kills for french langage on both those lines:
'/<([^<\/>)]*)\b[^>]*>((\s|ร | |<br>|<br\/>|<br \/>)*?)<\/\1>/imsU',
['<p>ร </p>', ''],
Therefore let me just remove that characters (that does not show on the review plugin surprisingly).
- Status changed to Fixed
over 1 year ago 3:46pm 8 May 2023 - ๐ซ๐ทFrance Dom.
Will be included in release 2.0.0.
Also, I am adding up a follow-up for configuring which tags should be configurable as skippable: โจ Make "doNotConsiderEmpty" tags configurable. Fixed - ๐บ๐ธUnited States danflanagan8 St. Louis, US
Hi Dom.,
That strangely encoded character was supposed to be the unicode non-breaking space. In my manual testing it is important to have that character as part of the regex.In my text editor it shows up as
[NBSP]
. I copy/pasted from some online resource to get the literal unicode character.I'm not very knowledgeable about encoding though to be honest. We need to somehow get the regex to recognize a literal unicode nbsp.
Looks like maybe we could use
\xc2\xa0/
?https://stackoverflow.com/questions/40724543/how-to-replace-decoded-non-...
- Status changed to Needs work
over 1 year ago 4:01pm 9 May 2023 - ๐ซ๐ทFrance Dom.
That shows to me as :
|ร |
which is the letter A with the ^ followed by a normal space.
I therefore tried that is a div on front and indeed it is left as is.May I suggest to try using
|\u00A0|
which is the unicode character \u00A0 that I believe to correspond to nbsp ?
https://www.fileformat.info/info/unicode/char/00a0/index.htm - ๐บ๐ธUnited States danflanagan8 St. Louis, US
I'm learning a lot here!
I tried using
\u00A0
and got this error:preg_match_all(): Compilation failed: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u at offset 34
I swtiched to
\xc2\xa0
and that appears to have worked.I'm working on the 2.x issue now. That's probably where the focus should be.
- Status changed to Fixed
over 1 year ago 9:06am 18 July 2023 - ๐ซ๐ทFrance Dom.
I replaced the non-breakable character with \xc2\xa0.
Added a test which mixes nbsp, space, and return char. Automatically closed - issue fixed for 2 weeks with no activity.