Xss:filter() malforms inline image references with data uri scheme

Created on 9 January 2024, 10 months ago

Problem/Motivation

Inline images with data uri scheme becomes malformed after Xss filtering.

Steps to reproduce

You do not need to configure a filter format with filter_html filter in use, just pass the following string to Xss:filter() directly.

<img src="
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">

(Source: https://www.websiteoptimization.com/speed/tweak/inline-images/)

The output is:

<img src="image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">

Proposed resolution

Follow Mozilla's guidelines and support inline images with β€œdata:image/*” unless it’s β€œdata:image/svg+xml”.

https://blog.mozilla.org/security/2017/11/27/blocking-top-level-navigati...

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

πŸ› Bug report
Status

Active

Version

11.0 πŸ”₯

Component
FilterΒ  β†’

Last updated about 12 hours ago

No maintainer
Created by

πŸ‡­πŸ‡ΊHungary mxr576 Hungary

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @mxr576
  • Status changed to Needs review 10 months ago
  • πŸ‡­πŸ‡ΊHungary mxr576 Hungary
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Always wary of changing this code but this looks like a sensible and self-contained change, however I added a question to the MR about applying the new condition.

  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Out of scope for this, but I wonder if there is a way of reusing the Masterminds HTML5 parser in the XSS filter instead of having separate code that attempts to parse HTML attributes.

  • Status changed to Needs work 10 months ago
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Added lots more comments that I thought of, mostly around the regex, as these can be brittle and there might be some sneaky ways around it.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    @longwave Replacing this code with something more modern would surely be welcome. When I ported KSES to become filter_xss WhatWG was but a year old and HTML5 didn't exist for another two years yet. And as far as I can see, this code didn't change at all in the near two decades since.

  • πŸ‡¨πŸ‡¦Canada dalin Guelph, πŸ‡¨πŸ‡¦, 🌍

    Follow Mozilla's guidelines and support inline images with β€œdata:image/*” unless it’s β€œdata:image/svg+xml”.

    I recommend we go the other way and use a whitelist of known safe mimetypes. Who knows what the future brings. Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    #8 seems like a good idea over having a regex that contains exceptions.

  • πŸ‡­πŸ‡ΊHungary mxr576 Hungary

    Thanks for all the valuable feedback, everyone!

    Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

    Well, should not this be the concern of browsers rather than us? I used the Mozilla blog post as a source because I found no other, but if anybody knows what the current logic is that Firefox/Chrome/Safari or others use to decide safe data URI-s for images, please share.

    @longwave Replacing this code with something more modern would surely be welcome. When I ported KSES to become filter_xss for Drupal 4.5.6/4.6.4 WhatWG was but a year old and HTML5 didn't exist for another two years yet. And as far as I can see, this code didn't change at all in the near two decades since.

    Yeah.. as part of my investigation, I also concluded that this code is so old... and it made me wonder if and when we should adopt symfony/html-sanitizer, was that already considered? (Again, Drupal core issue search yielded no result.)

    I recommend we go the other way and use a whitelist of known safe mimetypes.

    Does anybody can come up with a sane default list? I generally like and prefer the whitelisting-based solution, but I wonder (again) if browsers are doing the same nowadays; based on this older Mozilla article, they are not.

  • πŸ‡­πŸ‡ΊHungary mxr576 Hungary
  • πŸ‡­πŸ‡ΊHungary mxr576 Hungary

    I recommend we go the other way and use a whitelist of known safe mimetypes.

    Does anybody can come up with a sane default list? I generally like and prefer the whitelisting-based solution, but I wonder (again) if browsers are doing the same nowadays; based on this older Mozilla article, they are not.

    Maybe this MDN doc could provide us with a sane default list?
    https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#supported_...

    If this is the direction we should take, should we allow overriding this list with a new parameter in \Drupal\Component\Utility\Xss::filter()?

  • πŸ‡­πŸ‡ΊHungary mxr576 Hungary

    Opened ✨ Adopt Symfony's HTML Sanitizer Active as a potential follow up.

  • πŸ‡¨πŸ‡¦Canada joseph.olstad

    We're still using D10.2.x in a project and need to roll the MR for 10.2.x, I'll give it a go!

  • πŸ‡¨πŸ‡¦Canada joseph.olstad

    Before patching:

    Broken image in rich_text filtered content when all attributes are allowed for the img element.

    After patching and rebuilding caches:

    Image is displayed as expected.

    Thank you for this fix, it is an important one!

Production build 0.71.5 2024