- Issue created by @Ewout Goosmann
- 🇳🇱Netherlands Ewout Goosmann
This patch should solve the deprecation warning about the deprecated function.
I have used https://www.php.net/manual/en/function.mb-convert-encoding.php#127529 to fix this.
- 🇳🇱Netherlands Ewout Goosmann
Reroll patch for version 2.0.x of this module.
- Status changed to Needs review
almost 2 years ago 11:31am 22 March 2023 - Status changed to Needs work
almost 2 years ago 12:13pm 24 March 2023 - 🇩🇪Germany Anybody Porta Westfalica
Confirming this issue!
@Ewout Goosmann could you perhaps prepare the patch as MR to make it easier for the maintainers to review and merge?
- $dom = Html::load(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8')); + $dom = Html::load(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));
Looks very cryptic! Even if it works, it's hard to understand.
I'd suggest having a look at alternatives like: https://stackoverflow.com/questions/11974008/alternative-to-mb-convert-e...
- 🇺🇸United States ultimike Florida, USA
I'm going to suggest we do it the same way Symfony does.
Replace:
$dom = Html::load(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
with:
$dom = Html::load(htmlspecialchars_decode(iconv('UTF-8', 'ISO-8859-1', htmlentities($html, ENT_COMPAT, 'UTF-8')), ENT_QUOTES));
Anyone who is a novice want to create a patch or pull request?
-mike
- Assigned to akshaydalvi212
- 🇮🇳India akshaydalvi212
I will provide the patch as suggested by @ultimike.
- Issue was unassigned.
- Status changed to Needs review
almost 2 years ago 11:16am 27 March 2023 - 🇮🇳India akshaydalvi212
providing patch #8 file to solve the issue.
kindly review. - 🇩🇪Germany Anybody Porta Westfalica
Patch looks good. Can anyone explain, why this is needed at all? Drupal is UTF-8 entirely, isn't it?
- 🇺🇸United States ultimike Florida, USA
@akshaydalvi212 please refrain from creating patches for issues that are marked as "novice". Looking at your drupal.org profile, I think you've graduated from "novice" issues (congrats!)
@Anybody - I don't have a good answer for your question, but a deprecated function is a deprecated function...
-mike
- 🇩🇪Germany Anybody Porta Westfalica
@Anybody - I don't have a good answer for your question, but a deprecated function is a deprecated function...
I had a short look and it was introduced here 6 years ago: #2733381: UTF-8 encoding needed to show all characters correctly →
Still I wonder why it's needed and think we should understand, as Drupal 8+ IS UTF8 based. It is a deprecated function, but should we fix something we do not fully understand?The issue says:
Smart_trim uses loadHtml() which treats our text as ISO-8859-1 unless we tell it otherwise.
https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Util...
has<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
Perhaps that was fixed in core in the meantime?Also see
https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not...So at least we should try without it and write a test to ensure it works a as expected?
If not here, then as follow-up? The last submitted patch, 8: 3342481-8.patch, failed testing. View results →
- Status changed to Needs work
almost 2 years ago 12:02pm 29 March 2023 - 🇺🇸United States ultimike Florida, USA
@Anybody,
I can get behind this line of thinking. Changing (on my local)
$dom = Html::load(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
to
$dom = Html::load($html);
All tests still pass.
I'd like to add some additional test data to
TruncateHTMLTest::truncateCharsDataProvider()
andTruncateHTMLTest::truncateWordsDataProvider()
and then re-run the tests. Specifically, some sample test data from issues like 🐛 Error cropping Japanese Needs work would be fantastic.Sound like a plan forward?
-mike
- 🇺🇸United States ultimike Florida, USA
Here's a really useful resource for PHP and UTF-8 - this really helped me understand the issue better. It is also important to note (as @Anybody mentioned previously), that Drupal 8+ enforces UTF-8 in the database, so we don't need to worry about converting encodings.
https://phptherightway.com/#php_and_utf8
-mike
- First commit to issue fork.
- last update
over 1 year ago 7 pass, 2 fail - @rajeshreeputra opened merge request.
- last update
over 1 year ago 7 pass, 2 fail - 🇺🇸United States ultimike Florida, USA
@Rajeshreeputra thank you for converting the patch into an issue fork.
-mike
- Assigned to ankitv18
- last update
over 1 year ago 7 pass, 1 fail - last update
over 1 year ago 7 pass, 1 fail - last update
over 1 year ago 17 pass - last update
over 1 year ago 17 pass - last update
over 1 year ago 17 pass - Issue was unassigned.
- Status changed to Needs review
over 1 year ago 12:31pm 9 May 2023 - last update
over 1 year ago 17 pass - 🇮🇳India ankitv18
Hi @ultimike
With referring https://php.watch/versions/8.2/mbstring-qprint-base64-uuencode-html-enti... I've fixed the deprecation issue and now MR!43 is ready for a review. - Status changed to RTBC
over 1 year ago 5:27am 10 May 2023 - 🇮🇳India deepakkm
As PHP8.2 tests are passing we can merge this MR, also tested this locally with drupal-check.
- Status changed to Needs work
over 1 year ago 7:28am 11 May 2023 - 🇩🇪Germany Anybody Porta Westfalica
- last update
over 1 year ago 17 pass - @anybody opened merge request.
- last update
over 1 year ago 17 pass - last update
over 1 year ago 17 pass - Status changed to Needs review
over 1 year ago 7:37am 11 May 2023 - 🇩🇪Germany Anybody Porta Westfalica
If MR!45 goes green, I'd prefer that one. If anything breaks then, this means it needs an additional test for this case.
@ultimike what do you say?
- Status changed to Needs work
over 1 year ago 12:50pm 12 May 2023 - 🇺🇸United States ultimike Florida, USA
I definitely think we need to add a test for this change. Specifically, I would love to see an addition to the existing Unit test that fails before this change and passes after the change.
-mike
- 🇮🇪Ireland lostcarpark
It should be noted that without this patch, under PHP8.2, the following tests fail with a "Deprecated function" exception:
SmartTrimFunctionalTest::testTokenNotCutOffTrimTypeCharacters
SmartTrimFunctionalTest::testMoreLinkOnlyWhenContentIsTrimmedAs they pass in 8.2 with the patch, that would seem a good indicator that this patch is needed.
However, I think it's worth trying to develop a specific test to check for this. The ideal would be to find a string that mb_convert_encoding incorrectly translates, as it is not expecting it to already in UTF8 format. I will do some experimenting.
- 🇮🇪Ireland lostcarpark
Running just the the unit tests under 8.2 results in passing tests, but with the deprecation notice below:
Testing /var/www/html/modules/smart_trim-3342481/tests/src/Unit .......... 10 / 10 (100%) Time: 00:00.044, Memory: 4.00 MB OK (10 tests, 10 assertions) Remaining self deprecation notices (1) 1x: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead 1x in TruncateHTMLTest::testTruncateChars from Drupal\Tests\smart_trim\Unit
Changing to either merge request branch, the unit tests pass with no warning.
- 🇮🇪Ireland lostcarpark
As I understand it,
mb_convert_encoding
is converting from UTF8 content to HTML entities.If we go with MR#43,
htmlspecialchars_decode(htmlentities($html))
does exactly the same thing, so I'm not sure a new test will add anything. I think the string with entities gets converted back to UTF8 before it gets returned, otherwise I don't think the Armenia character test would pass.If we pick MR#45, we are dropping the conversion of UTF-8 to entities. If they would get converted back anyway, I'm not sure this conversion adds anything.
Another option I looked was using the
$this->expectDeprecated()
method, but I don't think this is useful, as we're removing the deprecated code from the codebase. I don't think there's a "expectNotDeprecated", since thats basically every test.I'm doubtful that a new test can distinguish the return value from the old method, but if anyone can suggest a way, I'm happy to try coding it.
- 🇩🇪Germany Anybody Porta Westfalica
@Rajeshreeputra that doesn't help. MR!43 is still the more (security & performance) risky one. So we have to compare and chose MR!43 if there are no good reasons to pick MR!45
- 🇮🇪Ireland lostcarpark
I agree with @Anybody. Code review needs to compare MR!43 with MR!45, and consider reasons for choosing one over the other.
Ideally, we should find a test that fails for one MR and passes for the other. If we can't find a test that differentiates them, then I think that would point to MR!45 as the simpler solution.
There's a suggestion in #22 that MR!43 could pose a security risk. I think we should focus on a test that exposes that risk. @Anybody, if you could elaborate on the possible risk, I'd be happy to work on a test to verify.
- 🇺🇸United States ultimike Florida, USA
Reviewing this issue with @lostcarpark, I am a bit confused about why I thought the code from my comment 6 above is/was from Symfony. It appears that I was incorrect - feel free to check me on this.
All that being said, I think the simpler solution !45 is best, unless we can prove otherwise.
@lostcarpark and I also tried to devise a test that would fail with
mb_convert_encoding()
, but were unable to. Therefore, I vote that we should merge !45.@markie - do you concur?
-mike
- 🇮🇪Ireland lostcarpark
As both solutions past tests, and we haven't been able to devise a test that differentiates them, it makes sense to go with the simpler solution, which is !45, so I agree with @ultimike.
- First commit to issue fork.
- last update
over 1 year ago 17 pass - Status changed to Fixed
over 1 year ago 8:10pm 24 May 2023 - 🇺🇸United States markie Albuquerque, NM
I am still trying to remember why I put that function in the codebase in the first place. But that being said, I have reviewed both MR's and think that !45 is the cleaner solution. I am going to go ahead and merge it. Thanks for all the great work and discussions.
- 🇩🇪Germany Anybody Porta Westfalica
Thanks @markie! :) Happy to see this fixed.
Automatically closed - issue fixed for 2 weeks with no activity.