Deprecated function: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated

Created on 16 February 2023, almost 2 years ago
Updated 25 May 2023, over 1 year ago

I'm using this module in Drupal 10 on PHP 8.2. This results in the following error:

Deprecated function: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead in Drupal\smart_trim\TruncateHTML->init() (line 78 of /app/web/modules/contrib/smart_trim/src/TruncateHTML.php).

I will upload a patch in the comment section.

🐛 Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

🇳🇱Netherlands Ewout Goosmann

Live updates comments and jobs are added and updated live.
  • PHP 8.2

    The issue particularly affects sites running on PHP version 8.2.0 or later.

  • Novice

    It would make a good project for someone who is new to the Drupal contribution process. It's preferred over Newbie.

Sign in to follow issues

Comments & Activities

  • Issue created by @Ewout Goosmann
  • 🇳🇱Netherlands Ewout Goosmann

    This patch should solve the deprecation warning about the deprecated function.

    I have used https://www.php.net/manual/en/function.mb-convert-encoding.php#127529 to fix this.

  • 🇳🇱Netherlands Ewout Goosmann

    Reroll patch for version 2.0.x of this module.

  • Status changed to Needs review over 1 year ago
  • Status changed to Needs work over 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    Confirming this issue!

    @Ewout Goosmann could you perhaps prepare the patch as MR to make it easier for the maintainers to review and merge?

    -    $dom = Html::load(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
    +    $dom = Html::load(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));

    Looks very cryptic! Even if it works, it's hard to understand.

    I'd suggest having a look at alternatives like: https://stackoverflow.com/questions/11974008/alternative-to-mb-convert-e...

  • 🇺🇸United States ultimike Florida, USA

    I'm going to suggest we do it the same way Symfony does.

    Replace:

    $dom = Html::load(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

    with:

    $dom = Html::load(htmlspecialchars_decode(iconv('UTF-8', 'ISO-8859-1', htmlentities($html, ENT_COMPAT, 'UTF-8')), ENT_QUOTES));

    Anyone who is a novice want to create a patch or pull request?

    -mike

  • Assigned to akshaydalvi212
  • 🇮🇳India akshaydalvi212

    I will provide the patch as suggested by @ultimike.

  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • 🇮🇳India akshaydalvi212

    providing patch #8 file to solve the issue.
    kindly review.

  • 🇩🇪Germany Anybody Porta Westfalica

    Patch looks good. Can anyone explain, why this is needed at all? Drupal is UTF-8 entirely, isn't it?

  • 🇺🇸United States ultimike Florida, USA

    @akshaydalvi212 please refrain from creating patches for issues that are marked as "novice". Looking at your drupal.org profile, I think you've graduated from "novice" issues (congrats!)

    @Anybody - I don't have a good answer for your question, but a deprecated function is a deprecated function...

    -mike

  • 🇩🇪Germany Anybody Porta Westfalica

    @Anybody - I don't have a good answer for your question, but a deprecated function is a deprecated function...

    I had a short look and it was introduced here 6 years ago: #2733381: UTF-8 encoding needed to show all characters correctly
    Still I wonder why it's needed and think we should understand, as Drupal 8+ IS UTF8 based. It is a deprecated function, but should we fix something we do not fully understand?

    The issue says:

    Smart_trim uses loadHtml() which treats our text as ISO-8859-1 unless we tell it otherwise.

    https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Util...
    has <head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
    Perhaps that was fixed in core in the meantime?

    Also see
    https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not...

    So at least we should try without it and write a test to ensure it works a as expected?
    If not here, then as follow-up?

  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States ultimike Florida, USA

    @Anybody,

    I can get behind this line of thinking. Changing (on my local)

    $dom = Html::load(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

    to

    $dom = Html::load($html);

    All tests still pass.

    I'd like to add some additional test data to TruncateHTMLTest::truncateCharsDataProvider() and TruncateHTMLTest::truncateWordsDataProvider() and then re-run the tests. Specifically, some sample test data from issues like 🐛 Error cropping Japanese Needs work would be fantastic.

    Sound like a plan forward?

    -mike

  • 🇺🇸United States ultimike Florida, USA

    Here's a really useful resource for PHP and UTF-8 - this really helped me understand the issue better. It is also important to note (as @Anybody mentioned previously), that Drupal 8+ enforces UTF-8 in the database, so we don't need to worry about converting encodings.

    https://phptherightway.com/#php_and_utf8

    -mike

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    7 pass, 2 fail
  • @rajeshreeputra opened merge request.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    7 pass, 2 fail
  • 🇺🇸United States ultimike Florida, USA

    @Rajeshreeputra thank you for converting the patch into an issue fork.

    -mike

  • Assigned to ankitv18
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    7 pass, 1 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    7 pass, 1 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    17 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    17 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    17 pass
  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    17 pass
  • 🇮🇳India ankitv18

    Hi @ultimike
    With referring https://php.watch/versions/8.2/mbstring-qprint-base64-uuencode-html-enti... I've fixed the deprecation issue and now MR!43 is ready for a review.

  • Status changed to RTBC over 1 year ago
  • 🇮🇳India deepakkm

    As PHP8.2 tests are passing we can merge this MR, also tested this locally with drupal-check.

  • 🇩🇪Germany Anybody Porta Westfalica

    @ultimike #14 sounds like these lines could be entirely removed or what was your final result?

  • Status changed to Needs work over 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    Back to NW to clarify #14 and the way to go.

    Testing #14 as MR now. I'm still afraid of security issues with MR!43.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    17 pass
  • @anybody opened merge request.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    17 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    17 pass
  • Status changed to Needs review over 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    If MR!45 goes green, I'd prefer that one. If anything breaks then, this means it needs an additional test for this case.

    @ultimike what do you say?

  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States ultimike Florida, USA

    I definitely think we need to add a test for this change. Specifically, I would love to see an addition to the existing Unit test that fails before this change and passes after the change.

    -mike

  • 🇮🇪Ireland lostcarpark

    It should be noted that without this patch, under PHP8.2, the following tests fail with a "Deprecated function" exception:

    SmartTrimFunctionalTest::testTokenNotCutOffTrimTypeCharacters
    SmartTrimFunctionalTest::testMoreLinkOnlyWhenContentIsTrimmed

    As they pass in 8.2 with the patch, that would seem a good indicator that this patch is needed.

    However, I think it's worth trying to develop a specific test to check for this. The ideal would be to find a string that mb_convert_encoding incorrectly translates, as it is not expecting it to already in UTF8 format. I will do some experimenting.

  • 🇮🇪Ireland lostcarpark

    Running just the the unit tests under 8.2 results in passing tests, but with the deprecation notice below:

    Testing /var/www/html/modules/smart_trim-3342481/tests/src/Unit
    ..........                                                        10 / 10 (100%)
    
    Time: 00:00.044, Memory: 4.00 MB
    
    OK (10 tests, 10 assertions)
    
    Remaining self deprecation notices (1)
    
      1x: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead
        1x in TruncateHTMLTest::testTruncateChars from Drupal\Tests\smart_trim\Unit
    

    Changing to either merge request branch, the unit tests pass with no warning.

  • 🇮🇪Ireland lostcarpark

    As I understand it, mb_convert_encoding is converting from UTF8 content to HTML entities.

    If we go with MR#43, htmlspecialchars_decode(htmlentities($html)) does exactly the same thing, so I'm not sure a new test will add anything. I think the string with entities gets converted back to UTF8 before it gets returned, otherwise I don't think the Armenia character test would pass.

    If we pick MR#45, we are dropping the conversion of UTF-8 to entities. If they would get converted back anyway, I'm not sure this conversion adds anything.

    Another option I looked was using the $this->expectDeprecated() method, but I don't think this is useful, as we're removing the deprecated code from the codebase. I don't think there's a "expectNotDeprecated", since thats basically every test.

    I'm doubtful that a new test can distinguish the return value from the old method, but if anyone can suggest a way, I'm happy to try coding it.

  • 🇮🇳India rajeshreeputra Pune

    For me MR!43 looks good. +1 RTBC

  • 🇩🇪Germany Anybody Porta Westfalica

    @Rajeshreeputra that doesn't help. MR!43 is still the more (security & performance) risky one. So we have to compare and chose MR!43 if there are no good reasons to pick MR!45

  • 🇮🇪Ireland lostcarpark

    I agree with @Anybody. Code review needs to compare MR!43 with MR!45, and consider reasons for choosing one over the other.

    Ideally, we should find a test that fails for one MR and passes for the other. If we can't find a test that differentiates them, then I think that would point to MR!45 as the simpler solution.

    There's a suggestion in #22 that MR!43 could pose a security risk. I think we should focus on a test that exposes that risk. @Anybody, if you could elaborate on the possible risk, I'd be happy to work on a test to verify.

  • 🇺🇸United States ultimike Florida, USA

    Reviewing this issue with @lostcarpark, I am a bit confused about why I thought the code from my comment 6 above is/was from Symfony. It appears that I was incorrect - feel free to check me on this.

    All that being said, I think the simpler solution !45 is best, unless we can prove otherwise.

    @lostcarpark and I also tried to devise a test that would fail with mb_convert_encoding(), but were unable to. Therefore, I vote that we should merge !45.

    @markie - do you concur?

    -mike

  • 🇮🇪Ireland lostcarpark

    As both solutions past tests, and we haven't been able to devise a test that differentiates them, it makes sense to go with the simpler solution, which is !45, so I agree with @ultimike.

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    17 pass
  • Status changed to Fixed over 1 year ago
  • 🇺🇸United States markie Albuquerque, NM

    I am still trying to remember why I put that function in the codebase in the first place. But that being said, I have reviewed both MR's and think that !45 is the cleaner solution. I am going to go ahead and merge it. Thanks for all the great work and discussions.

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks @markie! :) Happy to see this fixed.

  • 🇺🇸United States ultimike Florida, USA

    Woot!

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024