FilterHtmlImageSecure serializes all content, even when it has no images

Created on 9 February 2016, over 8 years ago
Updated 22 February 2023, over 1 year ago

The _filter_html_image_secure_process() filter in FilterHtmlImageSecure checks for image tags in content. Is short, it does this:

<?php
$html_dom = Html::load($text);
$images = $html_dom->getElementsByTagName('img');
foreach ($images as $image) {
  // Perform checks and replacements.
}
$text = Html::serialize($html_dom);
return $text;
?>

The Html::serialize is always performed, even if the content does not have images. This leads to double serializing when the 'Correct faulty and chopped off HTML' filter is enabled as well, which currently results in HTML like this: text. Note the duplicate xml:lang attributes. The double attribute tag bug is tackled in another issue ( 🌱 [Meta] PHP DOM (libxml2) misinterprets HTML5 Active ), but while debugging πŸ› [PP-1] 'Restrict images to this site' leads to incorrect HTML with language attribute Closed: outdated , I found this issue.

πŸ› Bug report
Status

Postponed: needs info

Version

10.1 ✨

Component
FilterΒ  β†’

Last updated 1 day ago

No maintainer
Created by

πŸ‡³πŸ‡±Netherlands BarisW Amsterdam

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡³πŸ‡ΏNew Zealand DanielVeza Brisbane, AU

    I was rerolling this because on the surface it does make sense to skip all the processing if there is no images.

    I had one thought while I was doing it that if we aren't calling Html::serialize anymore, things like script won't be stripped anymore since that happens in the serialize function. So if there is a text format with only this filter enabled, scripts will get through.

    In which case we could change the early return to be return Html::serialize($text);. But I'm not sure if thats worth doing since the code skipped by the early return would be minimal.

    Interested in hearing thoughts.

Production build 0.69.0 2024