Xss:filter() malforms inline image references with data uri scheme

Created on 9 January 2024, 11 months ago

Problem/Motivation

Inline images with data uri scheme becomes malformed after Xss filtering.

Steps to reproduce

You do not need to configure a filter format with filter_html filter in use, just pass the following string to Xss:filter() directly.

<img src="
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">

(Source: https://www.websiteoptimization.com/speed/tweak/inline-images/)

The output is:

<img src="image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">

Proposed resolution

Follow Mozilla's guidelines and support inline images with โ€œdata:image/*โ€ unless itโ€™s โ€œdata:image/svg+xmlโ€.

https://blog.mozilla.org/security/2017/11/27/blocking-top-level-navigati...

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

๐Ÿ› Bug report
Status

Active

Version

11.0 ๐Ÿ”ฅ

Component
Filterย  โ†’

Last updated 1 day ago

No maintainer
Created by

๐Ÿ‡ญ๐Ÿ‡บHungary mxr576 Hungary

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @mxr576
  • Status changed to Needs review 11 months ago
  • ๐Ÿ‡ญ๐Ÿ‡บHungary mxr576 Hungary
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom longwave UK

    Always wary of changing this code but this looks like a sensible and self-contained change, however I added a question to the MR about applying the new condition.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom longwave UK

    Out of scope for this, but I wonder if there is a way of reusing the Masterminds HTML5 parser in the XSS filter instead of having separate code that attempts to parse HTML attributes.

  • Status changed to Needs work 11 months ago
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom longwave UK

    Added lots more comments that I thought of, mostly around the regex, as these can be brittle and there might be some sneaky ways around it.

  • ๐Ÿ‡จ๐Ÿ‡ฆCanada Charlie ChX Negyesi ๐ŸCanada

    @longwave Replacing this code with something more modern would surely be welcome. When I ported KSES to become filter_xss WhatWG was but a year old and HTML5 didn't exist for another two years yet. And as far as I can see, this code didn't change at all in the near two decades since.

  • ๐Ÿ‡จ๐Ÿ‡ฆCanada dalin Guelph, ๐Ÿ‡จ๐Ÿ‡ฆ, ๐ŸŒ

    Follow Mozilla's guidelines and support inline images with โ€œdata:image/*โ€ unless itโ€™s โ€œdata:image/svg+xmlโ€.

    I recommend we go the other way and use a whitelist of known safe mimetypes. Who knows what the future brings. Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom longwave UK

    #8 seems like a good idea over having a regex that contains exceptions.

  • ๐Ÿ‡ญ๐Ÿ‡บHungary mxr576 Hungary

    Thanks for all the valuable feedback, everyone!

    Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

    Well, should not this be the concern of browsers rather than us? I used the Mozilla blog post as a source because I found no other, but if anybody knows what the current logic is that Firefox/Chrome/Safari or others use to decide safe data URI-s for images, please share.

    @longwave Replacing this code with something more modern would surely be welcome. When I ported KSES to become filter_xss for Drupal 4.5.6/4.6.4 WhatWG was but a year old and HTML5 didn't exist for another two years yet. And as far as I can see, this code didn't change at all in the near two decades since.

    Yeah.. as part of my investigation, I also concluded that this code is so old... and it made me wonder if and when we should adopt symfony/html-sanitizer, was that already considered? (Again, Drupal core issue search yielded no result.)

    I recommend we go the other way and use a whitelist of known safe mimetypes.

    Does anybody can come up with a sane default list? I generally like and prefer the whitelisting-based solution, but I wonder (again) if browsers are doing the same nowadays; based on this older Mozilla article, they are not.

  • ๐Ÿ‡ญ๐Ÿ‡บHungary mxr576 Hungary
  • ๐Ÿ‡ญ๐Ÿ‡บHungary mxr576 Hungary

    I recommend we go the other way and use a whitelist of known safe mimetypes.

    Does anybody can come up with a sane default list? I generally like and prefer the whitelisting-based solution, but I wonder (again) if browsers are doing the same nowadays; based on this older Mozilla article, they are not.

    Maybe this MDN doc could provide us with a sane default list?
    https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#supported_...

    If this is the direction we should take, should we allow overriding this list with a new parameter in \Drupal\Component\Utility\Xss::filter()?

  • ๐Ÿ‡ญ๐Ÿ‡บHungary mxr576 Hungary

    Opened โœจ Adopt Symfony's HTML Sanitizer Active as a potential follow up.

  • ๐Ÿ‡จ๐Ÿ‡ฆCanada joseph.olstad

    We're still using D10.2.x in a project and need to roll the MR for 10.2.x, I'll give it a go!

  • ๐Ÿ‡จ๐Ÿ‡ฆCanada joseph.olstad

    Before patching:

    Broken image in rich_text filtered content when all attributes are allowed for the img element.

    After patching and rebuilding caches:

    Image is displayed as expected.

    Thank you for this fix, it is an important one!

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom alexpott ๐Ÿ‡ช๐Ÿ‡บ๐ŸŒ

    One problem with the current MR is that it only accounts for src="stuff" and not src='stuff' and src=stuffnospaces which are what the two following blocks of code account for.

    Also the discussion seems to be to change the list to an allow list rather than and exception list but it got rtbc'd in the current state.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom alexpott ๐Ÿ‡ช๐Ÿ‡บ๐ŸŒ

    Interestingly Symfony's sanitizer allows data on all media urls see https://github.com/symfony/html-sanitizer/blob/a25620fc6407e14331f3c0c56... and doesn't do anything special for SVGs. I tried inlining an SVG with JS using a data url and the JS is not executed.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States tr Cascadia

    Also the discussion seems to be to change the list to an allow list rather than and exception list but it got rtbc'd in the current state.

    I assume you are referring to #11 by @dalin:

    Follow Mozilla's guidelines and support inline images with โ€œdata:image/*โ€ unless itโ€™s โ€œdata:image/svg+xmlโ€.

    I recommend we go the other way and use a whitelist of known safe mimetypes. Who knows what the future brings. Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

    So let me summarize:

    Right now, there are NO inline image formats supported. That is not acceptable. For things like email we NEED this. One of my modules, https://www.drupal.org/project/barcodes โ†’ , also needs this.

    Whitelists don't scale well. There are dozens of image formats, and more are being added all the time. If we use a whitelist we will always be out of date and there will constantly be feature requests to add new formats.

    BUT, as a way forward, how about we add some common, known formats right now? To cover the most common use cases. Trying to come up with a comprehensive list is a futile exercise which will delay this capability and won't prevent us from getting all those feature requests for additional formats.

    And if we're going to do a whitelist, perhaps it should be done in UrlHelper::filterBadProtocol()?

Production build 0.71.5 2024