- Merge request !2874Issue #3182166: [D7] preg_split in _filter_url breaks for long html tags β (Open) created by stefanos.petrakis@gmail.com
- πΈπ°Slovakia poker10
D10 issue was committed. Therefore changing the status to NW, so that the D7 patch can be checked if it has all changes from the D10 patch and can be moved to NR. Thanks!
- Status changed to Needs review
almost 2 years ago 8:13am 14 March 2023 - π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Ready for another round of reviews, I backported any relevant changes from the final D9/10 committed codes.
- Status changed to Needs work
almost 2 years ago 2:40pm 14 March 2023 - πΊπΈUnited States neclimdul Houston, TX
Again, not D7 specific. You can continue the back port but this needs to be fixed in the latest release.
- Status changed to Needs review
almost 2 years ago 2:53pm 14 March 2023 - πΊπΈUnited States neclimdul Houston, TX
ooof sorry I see #21 now sorry. Annoyed at the duplicate issue... sorry for the noise.
- Status changed to Needs work
almost 2 years ago 11:21pm 23 March 2023 - πΈπ°Slovakia poker10
@stefanos.petrakis Thanks for working on this! I have reviewed the merge request and added some comments.
- Status changed to Needs review
almost 2 years ago 11:59pm 23 March 2023 - π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Thanks @poker10, this is ready for another round.
- πΈπ°Slovakia poker10
Thanks, the changes looks good!
However I am a bit concerned by the testbot on PHP 8.2 / PostgreSQL 13.5, as this combination is ending with a fatal error, see: https://www.drupal.org/pift-ci-job/2626124 β . Not sure if these are some random errors (I have tried 5 runs, see the history), but merge requests on other D7 issues on this PHP 8.2 / PostgreSQL 13.5 combination are green (see: https://www.drupal.org/pift-ci-job/2626080 β ). And also Drupal 7 core is green either. So probably there is some problem related to this patch. It would probably be a good idea to take a look at it.
Other than that, it seems like a straight backport of the D10 patch, so once this testbot issue are solved, I think this could be RTBC. Thanks!
- Status changed to Needs work
almost 2 years ago 12:54pm 24 March 2023 - π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Let's clear this, I am gonna investigate it right now.
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Found it, it was the
pcre.backtrack_limit
messing with preg_replace calls that the pgsql layer was issuing. A little refactoring made lots of sense to keep that effect restricted.Waiting for the tests to set this back to NR.
- Issue was unassigned.
- Status changed to Needs review
almost 2 years ago 2:59pm 24 March 2023 - πΈπ°Slovakia poker10
Thanks, the PostgreSQL problem seems to be solved. But hmm, actually all tests are green except the PHP 7.2 / MySQL 5.6, see: https://www.drupal.org/pift-ci-job/2626322 β
- Assigned to stefanos.petrakis@gmail.com
- Status changed to Needs work
almost 2 years ago 4:29pm 24 March 2023 - π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Let's have a look at this one too.
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Unfortunately, this needs to stay at "Needs Work" for the time.
The test introduced here is based on the idea of breaking the
preg_*
function usingpcre.backtrack_limit
; this idea as @poker10 observed does not work for any PHP version smaller than 7.3. From PHP7.3 and higher allpreg_*
functions return bool(FALSE) on failures.This breaks the new tests due to this part:
// Split at all tags; ensures that no tags or attributes are processed. $chunks = is_null($text) ? array('') : preg_split('/(<.+?>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
PHP7.2 (and 7,1 and 7,0):
preg_split
splits the string partially and$chunks
is therefore notFALSE
; later on the_filter_url()
returns a result that differs from the input text.PHP5.6 (and 5.3 and probably all 5,*):
preg_split
does not split, but returns the whole string and here$chunks
is notFALSE
either; later on the_filter_url()
returns the whole string as a result by coincidence keeps the tests from breaking.Ideas for extending the tests to support 5.x and 7,0,1,2 are welcome, I have to think about this some.
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
FYI: I tried the different PHP versions here => https://3v4l.org/mhmnn#v5.6.40
using this snippet:
<?php // example code ini_set('pcre.backtrack_limit', 1); $text = "<p>www.test.com</p>"; $chunks = is_null($text) ? array('') : preg_split('/(<.+?>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE); var_dump($chunks);
- πΊπΈUnited States benjifisher Boston area
I started to look into this compatibility puzzle. As I was browsing the PHP docs for
preg_split()
andpreg_replace_callback()
, I noticed something that looks useful: preg_last_error(). I think it will be more reliable to ask PHP whether there was an error instead of trying to guess based on the result.Following the example of Comment #35, I tested the following snippet:
$text = '<p>www.test.com</p>'; echo 'pcre.backtrack_limit: ' . ini_get('pcre.backtrack_limit') . PHP_EOL; $chunks = preg_split('/(<.+?>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE); echo 'error: ' . preg_last_error() . PHP_EOL; is_array($chunks) ? print_r($chunks) : var_dump($chunks); ini_set('pcre.backtrack_limit', 1); echo 'pcre.backtrack_limit: ' . ini_get('pcre.backtrack_limit') . PHP_EOL; $chunks = preg_split('/(<.+?>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE); echo 'error: ' . preg_last_error() . PHP_EOL; is_array($chunks) ? print_r($chunks) : var_dump($chunks);
For example, you can see the result using PHP 7.2 at https://3v4l.org/peFdO#v7.2.34. (How do you get results in multiple PHP versions at once?) I tested all the PHP versions tested in #29. (I tested the latest patch version available for each.) As predicted in #34, the last line varied, but I always got
0
and2
forpreg_last_error()
: that is,PREG_NO_ERROR
andPREG_BACKTRACK_LIMIT_ERROR
.I think it makes sense to replace a line like
$chunks = is_null($text) ? array('') : preg_split('/(<.+
)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
?>with
I think it makes sense to replace a line like
$chunks = !preg_last_error() ? array('') : preg_split('/(<.+
)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
?> - πΊπΈUnited States benjifisher Boston area
Oops, I pasted my comment into the Issue summary field. Let me try to restore it.
- πΊπΈUnited States benjifisher Boston area
Despite the last few lines of Comment #36, I think it will be clearer to use a few more lines. First, move
$saved_text = $text;
inside the loop. Then, add something like this after each call to apreg_
function:if (preg_last_error()) { $text = $saved_text; continue; }
In particular, use that instead of putting most of the loop inside
if ($chunks !== FALSE) { ... }
.I made a similar suggestion in #3239472-56: preg_split in _filter_url breaks for long html tags β and the following comment, and in π Reduce complexity in _filter_url() Active . In Comment #2 on that issue, @alexpott recommends completely refactoring the code in D10, so I think we do not have to worry here about keeping the code consitent between D7 and D10.
- π¬π§United Kingdom RhiP
I've just done a reroll. Rushed, but sharing in case it helps others. Drupal 7.98
- last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago Patch Failed to Apply - last update
over 1 year ago 2,161 pass - last update
about 1 year ago 2,161 pass - Status changed to RTBC
10 months ago 12:35pm 2 April 2024 - π¦πΉAustria klausi π¦πΉ Vienna
Thanks, pull request looks good to me!
I don't see anything else missing here, so we can go ahead with this?
- Status changed to Needs work
10 months ago 8:12am 3 April 2024 - πΈπ°Slovakia poker10
I think the current MR was causing failures on some PHP/SQL combinations, see: https://git.drupalcode.org/project/drupal/-/pipelines/39895
@benjifisher proposed a solution which could fix this (in #37/#38), but if I see correctly, this was not implemented yet.
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Picking this up for work following #37 and #38 (in solidarity to folks at DCon :-) )
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Back to NR, tests are green for all targets
https://git.drupalcode.org/issue/drupal-3182166/-/pipelines/294430 - πΊπΈUnited States benjifisher Boston area
Stefanos:
Thanks for getting back to this. The current patch is a lot easier to review, since it does not re-indent a large block of code. I think you just made one mistake while making that change, so back to NW.
Comments #36 and #38 were a while ago, but re-reading them now, I think what I had in mind was
_filter_url_escape_comments('', TRUE); $text = preg_replace_callback('`<!--(.*?)-->`s', '_filter_url_escape_comments', $text); if (preg_last_error()) { $text = $saved_text; continue; } // Split at all tags; ensures that no tags or attributes are processed. $chunks = preg_split('/(<.+
)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
if (preg_last_error()) {
$text = $saved_text;
continue;
}
?>I wrote,
... add something like this after each call to a
preg_
function:(emphasis added).
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Hey Benji
Regarding:
... add something like this after each call to a preg_ function:
This is now done, via a bit of refactoring, using exception throwing.
Now, all preg_* functions inside _filter_url() will cause reverting to the saved text and moving on to the next iteration.This should be good to move back into NR once the tests are green.
I am off for the weekend, so, please take charge of the codes if that will help finalizing this issue. - π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Tests are green, back to NR
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Back to needs work, added one more change, rerunning the tests
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Ready for review
- πΊπΈUnited States benjifisher Boston area
See the MR comment: there is still one place where the current version checks `is_null($text)` unnecessarily. Back to NW for that.
I guess there are arguments in favor of using exceptions here:
- Exceptions are appropriate for dealing with error conditions, such as when the
pcre.backtrace_limit
parameter is exceeded. - There is only one place in the code that restores
$text = $saved_text
.
On the other hand,
- Exceptions are normally used with complex call stacks, not when interrupting a single loop.
- It is best to keep
try
blocks small, and the current version puts about 60 lines of code in that block.
I will not insist, but I prefer using
$text = $saved_text; break;
instead of throwing and catching exceptions. - Exceptions are appropriate for dealing with error conditions, such as when the
- π¨πSwitzerland stefanos.petrakis@gmail.com Biel, Switzerland
Applied the handy suggestion and then some more.
Thanks for the breakdown regarding Exceptions. And for keeping my brain rolling. :-)
I attempted that \Exception-based refactoring (35c3a164a90184ad3ba0e53520aaa8ba428e6901) for two reasons, aiming to "ease" the code's state:- So that there is only one place in the code that restores $text = $saved_text.
- So that
continue
-ing to the next iteration of the root loop from inside deeply-nested loops can be done in a more economic way (than any other I could think of). E.g.
foreach ($tasks as $task => $pattern) { ... for ($i = 0; $i < count($chunks); $i++) { ... if ($open_tag == '') { ... // This is where we would want the root iteration to abort and continue to the next one, // if a preg_ failure occurs $chunks[$i] = preg_replace_callback($pattern, $task, $chunks[$i]);
But, reading the docs again, I figured this is exactly the case for `continue DEPTH`.
So I replaced\Exception
withcontinue
and removed some more redundant code defending.I find the changeset is now as effective and compact as it can be and the new tests capture what the original issue detected as a problem.
Back to NR (tests are green).
- πΊπΈUnited States benjifisher Boston area
Code review
I find the changeset is now as effective and compact as it can be ...
I like that it is compact! That makes it much easier to review.
reading the docs again, I figured this is exactly the case for
continue DEPTH
.I think I have used
continue 2
once or twice. It does not come up very often. Another option is goto, but the xkcd comic on that page is a very relevant warning.I searched the
_filter_url()
code for "preg_", and I see that you have testedpreg_last_error()
after mostpreg_*()
functions. The exceptions arepreg_match()
in theelse
clause of the innerfor
loop, applied to$chunks[$i]
rather than the full$text
. Those are probably safe.The one place where you needed
continue 2
is in theif
clause of that inner loop, also applied to$chunks[$i]
. I think that is also safe, but the extra check does not hurt.Testing
PHP 7.4
I followed the "Steps to reproduce" in the issue summary. With 7.x, I got two messages. (The issue summary mentions only the first one.)
- Warning: count(): Parameter must be an array or an object that implements Countable in _filter_url() (line 1535 of /var/www/html/modules/filter/filter.module).
- Notice: Trying to access array offset on value of type bool in _filter_url() (line 1541 of /var/www/html/modules/filter/filter.module).
As expected, the errors are repeated if I clear caches and reload the page. The error comes from rendering the page, not from saving the node.
I switched to the feature branch. Clearing caches and reloading the page, there are no messages, but also no content. Then I edited the text format, enabling only the "Convert URLs into links" filter. Clear caches, reload: the image (from the gist in the issue summary) appears with no errors. Switching back to 7.x, I still see the two messages and no content.
PHP 8.0
With 7.x, I get a WSOD with the message
Error
The website encountered an unexpected error. Please try again later.
Error message
TypeError: count(): Argument #1 ($value) must be of type Countable|array, bool given in _filter_url() (line 1535 of /var/www/html/modules/filter/filter.module).Switching to the feature branch, the page loads normally.
- Issue was unassigned.
- Status changed to RTBC
about 2 months ago 3:08pm 29 November 2024 - πΈπ°Slovakia poker10
Thanks all for working on this issue and also for the great summary from testing @benjifisher!
I reviewed the MR and it looks good and the solution is now pretty straightforward.
I have also rebased the MR to run the test-only tests and they are failing as expected:
PHP 7.4 - https://git.drupalcode.org/project/drupal/-/jobs/3535616
PHP 8.0 - https://git.drupalcode.org/project/drupal/-/jobs/3535596Regular tests are green as well: https://git.drupalcode.org/project/drupal/-/pipelines/354368
Now, all preg_* functions inside _filter_url() will cause reverting to the saved text and moving on to the next iteration.
We can probably think if we need to create a change record for this (D10 did not have a CR for this), but otherwise this seems to be ready. Adding a tag for the final review from other D7 maintainer. Thanks!
-
mcdruid β
committed 12aace32 on 7.x
Issue #3182166 by stefanos.petrakis@gmail.com, poker10, almador, adinan...
-
mcdruid β
committed 12aace32 on 7.x
Automatically closed - issue fixed for 2 weeks with no activity.