PHP 8.1 preg_replace(): Passing null to parameter #3 ($subject) of type array|string is deprecated HtmlFilter processor

Created on 13 March 2023, almost 2 years ago
Updated 3 November 2023, about 1 year ago

Using Search API 8.x-1.27 with PHP 8.1, I got an error while indexing data:

Deprecated: preg_replace(): Passing null to parameter #3 ($subject) of type array|string is deprecated in /mnt/www/html/marsinc01dev/docroot/modules/contrib/search_api/src/Plugin/search_api/processor/HtmlFilter.php on line 206

Apparently, not all items have a value for 'alt'. I have prepared a patch for it.

🐛 Bug report
Status

Fixed

Version

1.29

Component

Plugins

Created by

🇺🇦 Ukraine itaran

Comments & Activities

  • Issue created by @itaran
  • Status changed to Postponed: needs info almost 2 years ago
  • 🇦🇹 Austria drunken monkey Vienna, Austria

    Thanks for reporting this issue!
    Can you reliably reproduce this error message? If so, could you try to debug what is going on in that processFieldValue() method? Specifically, are there any other warnings/errors?
    From looking at the code, it seems the only way for $text to be NULL at that point is for the previous preg_replace() call to fail with an error.

    We might still want to guard against such errors, but first it would be good to know if there are other problems in that code.
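
A minimal sketch of the kind of guard mentioned above, assuming PHP 8.0 or later. `replace_or_keep()` is a hypothetical helper name, not part of Search API, and the pattern is the well-known catastrophic-backtracking example from the PHP manual rather than the one in HtmlFilter. The backtrack limit is lowered and JIT is disabled only so that the failure is cheap to trigger; under those assumptions the unguarded call should return NULL instead of a string.

```php
<?php
// Assumption: lower the limit and disable JIT purely to make the failure easy to trigger.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100');

/**
 * Hypothetical helper: falls back to the unmodified text if preg_replace() fails.
 */
function replace_or_keep(string $pattern, string $replacement, string $text): string {
  $result = preg_replace($pattern, $replacement, $text);
  // preg_replace() returns NULL on error; keep the input so later calls still get a string.
  return $result ?? $text;
}

$pattern = '/(?:\D+|<\d+>)*[!?]/';
$input = 'foobar foobar foobar';

// Unguarded call: the result is NULL when the backtrack limit is exhausted.
$raw = preg_replace($pattern, '', $input);
$error = preg_last_error_msg();

// Guarded call: always yields a string that is safe to pass to the next preg_replace().
$guarded = replace_or_keep($pattern, '', $input);

var_dump($raw);
var_dump($guarded);
echo $error, "\n";
```

Falling back to the original text hides the failure, so a real implementation would probably also want to log `preg_last_error_msg()`.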

  • Status changed to Active over 1 year ago
  • I have just updated search_api from 8.x-1.25 => 8.x-1.29 and ran into this problem when I tried to rebuild the index.

    One of the problem pages has an image pasted directly into the text, so the html is:
    <p><img alt="" src="image/png;base64,iVBORw0 - LONG HEX STRING FOLLOWS

    It has the alt="" which matches the original description of the problem.

    I've attached a copy of the contents of $value that was passed into function processFieldValue()
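
The situation described above can be approximated with a short script. This is only a sketch: the pattern is the `alt` expression quoted later in this thread, the image data is a synthetic placeholder (a run of `A` characters, not the attached value), and the backtrack limit is lowered from PHP's default of 1000000 so that a short string is enough. JIT is disabled as an assumption to keep behaviour consistent across builds.

```php
<?php
// Assumptions: reduced limit and disabled JIT, so a small synthetic string suffices.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

$pattern = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i';

// Synthetic stand-in for an inline image with an empty alt attribute.
$html = '<p><img alt="" src="data:image/png;base64,' . str_repeat('A', 200000) . '"></p>';

// The greedy [^>]* first swallows the whole src value and then has to
// backtrack through it one character at a time while looking for alt=.
$text = preg_replace($pattern, ' <img>$2$3</img> ', $html);
$error_code = preg_last_error();

var_dump($text);
echo preg_last_error_msg(), "\n";

// Passing $text to another preg_replace() at this point is what raises the
// "Passing null to parameter #3 ($subject)" deprecation on PHP 8.1.
```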

  • Status changed to Needs review over 1 year ago
  • Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    498 pass, 5 fail
  • Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    535 pass
  • 🇦🇹 Austria drunken monkey Vienna, Austria

    Thanks a lot for this additional information, that enabled me to reproduce the problem.
    Well, using regular expressions there was the lazy way out, but I guess that fails when the backtracking gets too much. So, attached is a reimplementation using just normal mb_strpos() calls and one very straightforward regular expression that only uses possessive quantifiers.

    Please test/review!
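
To illustrate the general idea of that approach (plain `mb_strpos()` scanning plus one regular expression with possessive quantifiers), here is a rough sketch. It is not the code from the patch: `collect_alt_texts()` is a hypothetical function, it only collects non-empty `alt` values, and it assumes that `>` never appears inside an attribute value.

```php
<?php
/**
 * Hypothetical helper for illustration only; NOT the code from the patch.
 *
 * Returns the non-empty alt attribute values found in $html.
 */
function collect_alt_texts(string $html): array {
  // Possessive quantifiers (*+) never give characters back, so a failed
  // match cannot backtrack through a long attribute value.
  $alt_pattern = '/\salt\s*+=\s*+(?|"([^"]*+)"|\'([^\']*+)\')/i';
  $texts = [];
  $offset = 0;
  // Walk from tag to tag with plain string searches instead of one big regex.
  while (($start = mb_strpos($html, '<', $offset)) !== FALSE) {
    $end = mb_strpos($html, '>', $start);
    if ($end === FALSE) {
      // Unterminated tag: nothing more to inspect.
      break;
    }
    $tag = mb_substr($html, $start, $end - $start + 1);
    if (preg_match($alt_pattern, $tag, $matches) && $matches[1] !== '') {
      $texts[] = $matches[1];
    }
    $offset = $end + 1;
  }
  return $texts;
}

$html = '<p><img alt="" src="' . str_repeat('A', 200000) . '">'
  . '<img alt="A cat" src="y.png"></p>';
print_r(collect_alt_texts($html));
```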

  • I can confirm that the deprecation error no longer appears after applying the patch, and `$value` ends up the same. I'll hold off on RTBC for now because I have not reviewed patch contents.

  • Status changed to Fixed over 1 year ago
  • 🇦🇹 Austria drunken monkey Vienna, Austria

    If it works for you, that's already good to know, thanks. Since there is automated test coverage, I think that's already enough.
    Merged. Thanks again!

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed about 1 year ago
  • 🇫🇷 France Chris64 France

    Some things are not clear.

    but I guess that fails when the backtracking gets too much.

    Is that the real problem or just a guess?
    So alt="" produces this error? And why? Parameter #3 ($subject) is the returned value of the previous preg_replace() call, which is null if an error occurred.
    And what is that error? What is the connection? "" is a problem since it matches no expression in,
    $text = preg_replace('/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i', ' <img>$2$3</img> ', $text);
    That does not look nice. An alternative expression in the same spirit, which also matches "" or '', would be,
    $text = preg_replace('/<[-a-z_]+[^>]*["\s]alt\s*=\s*(?|"([^"]*)"|\'([^\']*)\')[^>]*>/i', ' <img>$1</img> ', $text);
    In the same spirit instead of,
    $text = preg_replace('/(<[-a-z_]+[^>]*["\s])title\s*=\s*("([^"]+)"|\'([^\']+)\')([^>]*>)/i', '$1 $5 $3$4 ', $text);
    rather,
    $text = preg_replace('/(<[-a-z_]+[^>]*["\s])title\s*=\s*(?|"([^"]*)"|\'([^\']*)\')([^>]+>)/i', '$1 $3 $2 ', $text);
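
The difference between the two `alt` expressions quoted above can be checked on small inputs. This sketch only compares their matching and grouping behaviour; the sample tags are made up, and it does not measure backtracking cost on very long attributes.

```php
<?php
// Both patterns are copied from the comment above.
$original = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i';
$proposed = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*(?|"([^"]*)"|\'([^\']*)\')[^>]*>/i';

// Made-up sample tags: one with an empty alt, one with a single-quoted alt.
$empty_alt = '<img alt="" src="x.png">';
$quoted_alt = "<img alt='cat' src=\"x.png\">";

// Original: [^"]+ needs at least one character, so an empty alt never matches.
$a = preg_replace($original, ' <img>$2$3</img> ', $empty_alt);
// Proposed: * allows an empty value, and the branch reset group (?|...)
// numbers both alternatives as group 1, so a single $1 is enough.
$b = preg_replace($proposed, ' <img>$1</img> ', $empty_alt);
$c = preg_replace($original, ' <img>$2$3</img> ', $quoted_alt);
$d = preg_replace($proposed, ' <img>$1</img> ', $quoted_alt);

var_dump($a); // string(24) "<img alt="" src="x.png">"
var_dump($b); // string(14) " <img></img> "
var_dump($c); // string(17) " <img>cat</img> "
var_dump($d); // string(17) " <img>cat</img> "
```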
