Xss::filter() does not handle HTML tags inside attribute values

Created on 12 August 2021
Updated 17 February 2023

Problem/Motivation

Initially reported by @lauriii in "Upgrade filter system to HTML5" (fixed): HTML5 allows unescaped less-than and greater-than characters in attribute values, for example:

<img src="llama.jpg" data-caption="<em>Loquacious llama!</em>" />

Xss::filter() does not handle this:

>>> use \Drupal\Component\Utility\Xss;

>>> Xss::filter('<img src="llama.jpg" data-caption="Loquacious llama!" />', ['img', 'em']);
=> "<img src="llama.jpg" data-caption="Loquacious llama!" />"

>>> Xss::filter('<img src="llama.jpg" data-caption="<em>Loquacious llama!</em>" />', ['img', 'em']);
=> "<img src="llama.jpg">Loquacious llama!</em>" /&gt;"

In other words, when an attribute value contains a tag (or even just a >), the output is mangled: the attribute is dropped and part of its value leaks out of the tag as text.

Xss::filter() uses two regular expressions to extract and parse tags:

      <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string

This matches anything that looks like a tag, but it ends the match at the first >, even when that > sits inside a quoted attribute value.
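A minimal PHP sketch that runs the tag-splitting pattern above on its own with preg_match_all() against the llama example. In Xss::filter() the pattern appears inside a larger expression, so this only approximates what the filter sees:

```php
<?php
// The tag-splitting pattern quoted above, used in isolation.
$pattern = '%<[^>]*(>|$)%';
$html = '<img src="llama.jpg" data-caption="<em>Loquacious llama!</em>" />';

// Collect every substring the pattern treats as a "tag".
preg_match_all($pattern, $html, $matches);
print_r($matches[0]);
```

The first match stops at the > that closes <em>, inside the data-caption value, and </em> is then picked up as a separate tag. The trailing " /> is left outside any match, which lines up with the /&gt; in the output above.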

    if (!preg_match('%^<\s*(/\s*)?([a-zA-Z0-9\-]+)\s*([^>]*)>?|(<!--.*?-->)$%', $string, $matches)) {

Similarly, its attribute group ([^>]*) stops at the first >, so this pattern cannot handle attribute values that contain > either.
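The first pattern ends the tag at the first >, so the fragment passed on for the llama example is <img src="llama.jpg" data-caption="<em>. A minimal PHP sketch that feeds that fragment to the per-tag pattern shows what it captures:

```php
<?php
// Per-tag parsing pattern quoted above, applied to the truncated fragment.
$pattern = '%^<\s*(/\s*)?([a-zA-Z0-9\-]+)\s*([^>]*)>?|(<!--.*?-->)$%';
$fragment = '<img src="llama.jpg" data-caption="<em>';

if (preg_match($pattern, $fragment, $m)) {
  echo "element: {$m[2]}\n";     // img
  echo "attributes: {$m[3]}\n";  // src="llama.jpg" data-caption="<em
}
```

The captured attribute string ends inside an unclosed quote, so whatever parses attributes next receives a data-caption value that was cut off partway through.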

Remaining tasks

Determine whether regular expressions are sufficient for filtering HTML in this way (see https://stackoverflow.com/a/1732454).
Improve the regex to handle attribute values that contain < or >, or replace Xss::filter() with something more robust.
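For the second task, one possible direction is to let the tag-splitting pattern consume double- and single-quoted attribute values as single units, so that < and > inside them no longer end the tag. This is a hypothetical sketch, not the change Drupal made (this issue was closed as a duplicate):

```php
<?php
// Hypothetical quote-aware variant of the tag-splitting pattern: inside a tag,
// each step consumes either one non-quote, non-> character or a whole
// "..." / '...' attribute value, so a > inside quotes does not end the tag.
$pattern = '%<(?:[^>"\']|"[^"]*"|\'[^\']*\')*(>|$)%';
$html = '<img src="llama.jpg" data-caption="<em>Loquacious llama!</em>" />';

preg_match_all($pattern, $html, $matches);
print_r($matches[0]);
```

Here the whole <img ... /> element is matched as one tag. This only addresses the first pattern: the per-tag pattern would need the same treatment, and an unterminated quote makes this pattern fail to match at that <, which a real filter would still have to handle. Whether any regex is robust enough is exactly the open question in the first task.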


πŸ› Bug report
Status

Closed: duplicate

Version

10.1 ✨

Component
FilterΒ  β†’


No maintainer
Created by: longwave (United Kingdom)
