[D7] preg_split in _filter_url breaks for long html tags

Created on 12 November 2020, over 3 years ago
Updated 3 April 2024, 3 months ago

Problem/Motivation

N.B.: Backport issue from 🐛 preg_split in _filter_url breaks for long html tags Fixed

Having a really long HTML tag (e.g. < img > with src:data/image...) makes preg_split fail and return false. On PHP 7.4 this throws a warning ("Warning: count(): Parameter must be an array or an object that implements Countable in _filter_url() (line 535 of core/modules/filter/filter.module).") and makes the field render empty, on PHP 8.0 this throws a fatal error.

Steps to reproduce

1) Have a text format that has Convert URLs into links enabled
2) Using that text format, add a node with content like this: https://gist.github.com/kporras07/618b3bf4cd77ff57fcd5034262220e99
3) Visit the node
4) You will get the warning and empty node or the fatal error depending on your PHP version

Proposed resolution

Backport from 🐛 preg_split in _filter_url breaks for long html tags Fixed

🐛 Bug report
Status

Needs work

Version

7.0 ⚰️

Component
Filter 

Last updated about 2 hours ago

No maintainer
Created by

🇷🇺Russia DD 85

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇸🇰Slovakia poker10

    D10 issue was committed. Therefore changing the status to NW, so that the D7 patch can be checked if it has all changes from the D10 patch and can be moved to NR. Thanks!

  • Status changed to Needs review over 1 year ago
  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland

    Ready for another round of reviews, I backported any relevant changes from the final D9/10 committed codes.

  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States neclimdul Houston, TX

    Again, not D7 specific. You can continue the back port but this needs to be fixed in the latest release.

  • Status changed to Needs review over 1 year ago
  • 🇺🇸United States neclimdul Houston, TX

    ooof sorry I see #21 now sorry. Annoyed at the duplicate issue... sorry for the noise.

  • 🇨🇦Canada joseph.olstad

    Great work @stefanos!

  • Status changed to Needs work over 1 year ago
  • 🇸🇰Slovakia poker10

    @stefanos.petrakis Thanks for working on this! I have reviewed the merge request and added some comments.

  • Status changed to Needs review over 1 year ago
  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland

    Thanks @poker10, this is ready for another round.

  • 🇸🇰Slovakia poker10

    Thanks, the changes looks good!

    However I am a bit concerned by the testbot on PHP 8.2 / PostgreSQL 13.5, as this combination is ending with a fatal error, see: https://www.drupal.org/pift-ci-job/2626124 . Not sure if these are some random errors (I have tried 5 runs, see the history), but merge requests on other D7 issues on this PHP 8.2 / PostgreSQL 13.5 combination are green (see: https://www.drupal.org/pift-ci-job/2626080 ). And also Drupal 7 core is green either. So probably there is some problem related to this patch. It would probably be a good idea to take a look at it.

    Other than that, it seems like a straight backport of the D10 patch, so once this testbot issue are solved, I think this could be RTBC. Thanks!

  • Status changed to Needs work over 1 year ago
  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland

    Let's clear this, I am gonna investigate it right now.

  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland

    Found it, it was the pcre.backtrack_limit messing with preg_replace calls that the pgsql layer was issuing. A little refactoring made lots of sense to keep that effect restricted.

    Waiting for the tests to set this back to NR.

  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland
  • 🇸🇰Slovakia poker10

    Thanks, the PostgreSQL problem seems to be solved. But hmm, actually all tests are green except the PHP 7.2 / MySQL 5.6, see: https://www.drupal.org/pift-ci-job/2626322

  • Assigned to stefanos.petrakis
  • Status changed to Needs work over 1 year ago
  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland

    Let's have a look at this one too.

  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland

    Unfortunately, this needs to stay at "Needs Work" for the time.

    The test introduced here is based on the idea of breaking the preg_* function using pcre.backtrack_limit; this idea as @poker10 observed does not work for any PHP version smaller than 7.3. From PHP7.3 and higher all preg_* functions return bool(FALSE) on failures.

    This breaks the new tests due to this part:

        // Split at all tags; ensures that no tags or attributes are processed.
        $chunks = is_null($text) ? array('') : preg_split('/(<.+?>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    

    PHP7.2 (and 7,1 and 7,0): preg_split splits the string partially and $chunks is therefore not FALSE; later on the _filter_url() returns a result that differs from the input text.

    PHP5.6 (and 5.3 and probably all 5,*): preg_split does not split, but returns the whole string and here $chunks is not FALSE either; later on the _filter_url() returns the whole string as a result by coincidence keeps the tests from breaking.

    Ideas for extending the tests to support 5.x and 7,0,1,2 are welcome, I have to think about this some.

  • 🇨🇭Switzerland stefanos.petrakis Biel, Switzerland

    FYI: I tried the different PHP versions here => https://3v4l.org/mhmnn#v5.6.40

    using this snippet:

    <?php
    // example code
    
    ini_set('pcre.backtrack_limit', 1);
    $text = "<p>www.test.com</p>";
    $chunks = is_null($text) ? array('') : preg_split('/(<.+?>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    var_dump($chunks);
    
  • 🇺🇸United States benjifisher Boston area

    I started to look into this compatibility puzzle. As I was browsing the PHP docs for preg_split() and preg_replace_callback(), I noticed something that looks useful: preg_last_error(). I think it will be more reliable to ask PHP whether there was an error instead of trying to guess based on the result.

    Following the example of Comment #35, I tested the following snippet:

    $text = '<p>www.test.com</p>';
    
    echo 'pcre.backtrack_limit: ' . ini_get('pcre.backtrack_limit') . PHP_EOL;
    $chunks = preg_split('/(&lt;.+?&gt;)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    echo 'error: ' . preg_last_error() . PHP_EOL;
    is_array($chunks) ? print_r($chunks) : var_dump($chunks);
    
    ini_set('pcre.backtrack_limit', 1);
    echo 'pcre.backtrack_limit: ' . ini_get('pcre.backtrack_limit') . PHP_EOL;
    $chunks = preg_split('/(&lt;.+?&gt;)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    echo 'error: ' . preg_last_error() . PHP_EOL;
    is_array($chunks) ? print_r($chunks) : var_dump($chunks);
    

    For example, you can see the result using PHP 7.2 at https://3v4l.org/peFdO#v7.2.34. (How do you get results in multiple PHP versions at once?) I tested all the PHP versions tested in #29. (I tested the latest patch version available for each.) As predicted in #34, the last line varied, but I always got 0 and 2 for preg_last_error(): that is, PREG_NO_ERROR and PREG_BACKTRACK_LIMIT_ERROR.

    I think it makes sense to replace a line like

        $chunks = is_null($text) ? array('') : preg_split('/(<.+

    )/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    ?>

    with

    I think it makes sense to replace a line like

        $chunks = !preg_last_error() ? array('') : preg_split('/(<.+

    )/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    ?>

  • 🇺🇸United States benjifisher Boston area

    Oops, I pasted my comment into the Issue summary field. Let me try to restore it.

  • 🇺🇸United States benjifisher Boston area

    Despite the last few lines of Comment #36, I think it will be clearer to use a few more lines. First, move $saved_text = $text; inside the loop. Then, add something like this after each call to a preg_ function:

        if (preg_last_error()) {
          $text = $saved_text;
          continue;
        }
    

    In particular, use that instead of putting most of the loop inside if ($chunks !== FALSE) { ... }.

    I made a similar suggestion in #3239472-56: preg_split in _filter_url breaks for long html tags and the following comment, and in 📌 Reduce complexity in _filter_url() Active . In Comment #2 on that issue, @alexpott recommends completely refactoring the code in D10, so I think we do not have to worry here about keeping the code consitent between D7 and D10.

  • 🇬🇧United Kingdom RhiP

    I've just done a reroll. Rushed, but sharing in case it helps others. Drupal 7.98

  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 5.3 & MySQL 5.5
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 7.2 & MySQL 5.6
    last update 10 months ago
    Patch Failed to Apply
  • last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 7.4 & SQLite 3.27
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 7.4 & PostgreSQL 9.5
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.0 & MySQL 5.7
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 7.4 & MySQL 5.7
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.2 & pgsql-13.5
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 5.6 & MySQL 5.5
    last update 10 months ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update 8 months ago
    2,161 pass
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update 8 months ago
    2,161 pass
  • Status changed to RTBC 3 months ago
  • 🇦🇹Austria klausi 🇦🇹 Vienna

    Thanks, pull request looks good to me!

    I don't see anything else missing here, so we can go ahead with this?

  • Status changed to Needs work 3 months ago
  • 🇸🇰Slovakia poker10

    I think the current MR was causing failures on some PHP/SQL combinations, see: https://git.drupalcode.org/project/drupal/-/pipelines/39895

    @benjifisher proposed a solution which could fix this (in #37/#38), but if I see correctly, this was not implemented yet.

Production build 0.69.0 2024