XSS attribute filtering is inconsistent and strips valid attributes

Created on 2 August 2015, almost 9 years ago
Updated 6 March 2024, 3 months ago

Follow-up to #2105841: Xss filter() mangles image captions and title/alt/data attributes

Problem/Motivation

\Drupal\Component\Utility\Xss::filter() cleans potentially dangerous protocols like "javascript:" from element attributes. It does this by stripping any set of characters that ends with a colon, unless the string is "http:" or "https:".

The filter strips out valid attribute name/value combinations that provide RDF metadata, such as rel="schema:author" or property="foaf:name".

Some attributes are exempt from this treatment, including `alt`, `title`, and any `data-*` attribute. In #2105841: Xss filter() mangles image captions and title/alt/data attributes , the decision was made to hard-code the exempt attributes list, and possibly make the list configurable in a follow-up issue.

Proposed resolution

Create 2 lists.
1 for safe attributes (ones we can skip protocol check)
1 for unsafe attributes (ones we enforce a protocol check for)

Remaining tasks

Patch review/refinement
Security review

User interface changes

None.

API changes

None.

Beta phase evaluation

-->

🐛 Bug report
Status

Needs work

Version

11.0 🔥

Component
Base 

Last updated less than a minute ago

Created by

🇺🇸United States Les Lim

Live updates comments and jobs are added and updated live.
  • Security

    It is used for security vulnerabilities which do not need a security advisory. For example, security issues in projects which do not have security advisory coverage, or forward-porting a change already disclosed in a security advisory. See Drupal’s security advisory policy for details. Be careful publicly disclosing security vulnerabilities! Use the “Report a security vulnerability” link in the project page’s sidebar. See how to report a security issue for details.

  • Triaged core major

    There is consensus among core maintainers that this is a major issue. Only core committers should add this tag.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇺🇸United States smustgrave

    Closed 🐛 Xss::filter() does not handle HTML tags inside attribute values Closed: duplicate as a duplicate.

  • 🇬🇧United Kingdom lazysoundsystem

    The last patch is now a few lines off

    Checking patch core/lib/Drupal/Component/Utility/Xss.php...
    Checking patch core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php...
    Hunk #1 succeeded at 403 (offset 8 lines).
    Checking patch core/tests/Drupal/Tests/Component/Utility/XssTest.php...
    Hunk #3 succeeded at 538 (offset 10 lines).
    Applied patch core/lib/Drupal/Component/Utility/Xss.php cleanly.
    Applied patch core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php cleanly.
    Applied patch core/tests/Drupal/Tests/Component/Utility/XssTest.php cleanly.
    

    Updated patch changes the line numbers and nothing else.

  • 🇺🇸United States smustgrave

    Just checked #146 and it applied fine to 10.1

  • 🇺🇸United States smustgrave

    Rerolled

    error: patch failed: core/lib/Drupal/Component/Utility/Xss.php:267
    error: core/lib/Drupal/Component/Utility/Xss.php: patch does not apply
    error: patch failed: core/tests/Drupal/Tests/Component/Utility/XssTest.php:525
    error: core/tests/Drupal/Tests/Component/Utility/XssTest.php: patch does not apply

  • Status changed to Needs work about 1 year ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10
    1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -29,6 +29,50 @@ class Xss {
      +   * The default list of safe attributes untouched by filter().
      

      nit - untouched => not sanitized ?

    2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -29,6 +29,50 @@ class Xss {
      +   * The default list of Unsafe attributes that must be touched by filter().
      

      ubernit - touched => sanitized ?

    3. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -232,13 +277,13 @@ protected static function attributes($attributes) {
      +              substr($attribute_name, 0, 3) === 'ng-' ||
      

      Can we get a comment for this, I assume its something angular related

    4. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -271,7 +316,15 @@ protected static function attributes($attributes) {
      +            if ($enforce_protocol_filtering) {
      +              $value = static::filterProtocol($attribute_name, $match[1]);
      +            }
      +            elseif ($skip_protocol_filtering) {
      +              $value = $match[1];
      +            }
      +            else {
      +              $value = static::filterProtocol($attribute_name, $match[1]);
      +            }
      

      I think we can simplify this to

      if ($enforce_protocol_filtering || !$skip_protocol_filtering) {
        // filter here
      }
      else {
        // skip here
      }
      
      

      applies in all three instances of this new code

    5. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -340,6 +409,26 @@ protected static function needsRemoval($html_tags, $elem) {
      +  protected static function filterProtocol($name, $value) {
      

      We can typehint here, both for arguments and return type

    6. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -340,6 +409,26 @@ protected static function needsRemoval($html_tags, $elem) {
      +    if (in_array($name, static::$safeAttributes)) {
      +      return preg_match('/^[a-zA-Z0-9]+\:[a-zA-Z0-9]+$/', $value) ? $value : UrlHelper::stripDangerousProtocols($value);
      

      can we get a comment here - its a safe attribute, but we're stripping something if a regex matches?

      could we document the regex a bit like we do in \Drupal\Component\Utility\Xss::filter where we have this

      // Strip any tags that are not in the list of allowed html tags.
          return preg_replace_callback('%
            (
            <(?=[^a-zA-Z!/])  # a lone <
            |                 # or
            <!--.*?-->        # a comment
            |                 # or
            <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
            |                 # or
            >                 # just a >
            )%x', $splitter, $string);
      

      Just as a courtesy to our future selves

    7. +++ b/core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php
      @@ -395,15 +395,15 @@ public function providerTestFilterXss() {
      +    $data[] = ['<META HTTP-EQUIV="refresh" CONTENT="0;url=javascript:alert(\'XSS\');">', '<META http-equiv="refresh" content="0;url=javascript:alert(\'XSS\');">'];
      

      Whilst content is not executed, for a http-equiv="refresh" it is

      I tested this in Chrome and it was blocked by the browser, but it was executed - so I think we may need to handle this differently

    8. +++ b/core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php
      @@ -395,15 +395,15 @@ public function providerTestFilterXss() {
      +    $data[] = ['<META HTTP-EQUIV="refresh" CONTENT="0;url=data:text/html base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">', '<META http-equiv="refresh" content="0;url=data:text/html base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">'];
      

      Same here, the browser evaluated it but blocked it in Firefox and Chrome

    9. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -543,6 +546,104 @@ public function providerTestAttributes() {
      +        '<h2 property="javascript:alert(0);">The Title</h2>',
      

      In my testing this was safe, browsers ignored it

    10. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -543,6 +546,104 @@ public function providerTestAttributes() {
      +        '<section class="actions-menu-inner" ng-style="{ \'max-height\': maxPanelHeight }" ed-pretty-scrollbar ed-scroll-axis="y" ed-scroll-theme="light"></section>',
      

      shouldn't the ed- prefixed attributes be removed?

    11. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -543,6 +546,104 @@ public function providerTestAttributes() {
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
      +        'Image tag with RDFa with namespaced attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        'Image tag with RDFa with bad with namespaced attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
      +        '<img src="http://example.com/foo.jpg" foo="baz">',
      +        'Image tag with non-RDFa attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<h2 property="title">The Title</h2>',
      +        '<h2 property="title">The Title</h2>',
      +        'H2 tag with RDFa attribute without namespace',
      +        ['h2'],
      +      ],
      +      [
      +        '<h2 property="http://purl.org/dc/terms/title">The Title</h2>',
      +        '<h2 property="http://purl.org/dc/terms/title">The Title</h2>',
      +        'H2 tag with RDFa attribute with URL',
      +        ['h2'],
      +      ],
      +      [
      +        '<h2 property="javascript:alert(0);">The Title</h2>',
      +        '<h2 property="javascript:alert(0);">The Title</h2>',
      +        'H2 tag with RDFa attribute with XSS',
      +        ['h2'],
      

      these look to be duplicated in the test cases?

  • Assigned to smustgrave
  • 🇺🇸United States smustgrave

    Will work on this tomorrow. If I don’t post anything in 3 days feel free to remove my name and work on.

  • 🇺🇸United States smustgrave

    #152

    1 = Fixed
    2 = Fixed
    3 = Added comment
    4 = Updated
    5 = Fixed
    6 = tried my best
    7 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
    8 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
    9 = should it be removed?
    10 = removed
    11 = removed

  • Status changed to Needs review about 1 year ago
  • 🇺🇸United States smustgrave

    Per a discussion with @larowlan removing "content" as one of the safe attributes and opened Investigate allowing "content" as allowed tag Active

    Updated patch

  • Issue was unassigned.
  • Status changed to Needs work about 1 year ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10
    1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -29,6 +29,49 @@ class Xss {
      +    'rel',
      +    'rev',
      +    'about',
      +    'datatype',
      +    'datatype_callback',
      +    'datetime',
      +    'mailto',
      +    'media',
      +    'sizes',
      

      We don't have test coverage for all of these attributes - should we add it here? I realise there's some overlap with 🐛 Xss::filterAdmin() incorrectly filters datetime attribute Fixed

    2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -229,16 +273,16 @@ protected static function attributes($attributes) {
      +              substr($attribute_name, 0, 3) === 'ng-' ||
      

      I think this was missed from my earlier review, can we get a comment regarding the significance of `ng-` - e.g. relates to Angular (I think?)

  • Status changed to Needs review about 1 year ago
  • 🇺🇸United States smustgrave

    rel = is already included
    rev = tests added
    datetime = test added
    mailto = test added
    media = test added
    sizes = test added

    Removed some as I couldn't find examples
    about = Removed
    datatype = Removed
    datatype_callback = Removed

    #158.2 added to comment block in earlier patch.

  • 🇬🇧United Kingdom catch

    Opened 📌 Considering making the XSS safe/unsafe attribute lists extendable Active as an additional follow-up, took the RDF list out of the issue summary since that's no long in the patch.

  • Status changed to Needs work about 1 year ago
  • Status changed to Needs review about 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    29,393 pass
  • Status changed to Needs work about 1 year ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Getting close, couple of questions and looks like at least one test case has the wrong label

    1. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        'Link tag with rev attribute',
      

      I don't see a rev attribute here

    2. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        'Link tag with mailto href',
      

      ah here is is, I think these titles are out of order

    3. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        '<a media="print and (resolution:300dpi)">Drupal</a>',
      +        '<a media="print and (resolution:300dpi)">Drupal</a>',
      +        'Link tag with media attribute',
      +        ['a'],
      +      ],
      +      [
      +        '<a sizes="16x16">Drupal</a>',
      +        '<a sizes="16x16">Drupal</a>',
      

      are sizes and media valid attributes for an a tag? should we use img here or picture or something that reflects supported attributes?

    4. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        '<time datetime="2017-02-14">',
      +        '<time datetime="2017-02-14">',
      +        'Time with datetime attribute',
      

      should we add a test-case with a : or did the related issue already resolve that? if so do we need this test case at all anymore?

    5. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -561,6 +594,68 @@ public function providerTestAttributes() {
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        'Image tag with RDFa with bad with namespaced attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
      +        '<img src="http://example.com/foo.jpg" foo="baz">',
      +        'Image tag with non-RDFa attribute',
      

      I don't understand why one of these is stripped but the other isn't?

  • 🇩🇪Germany donquixote

    Not only attribute values are damaged, also attribute names.
    Attribute aaa:bbb="X" becomes bbb="X".
    Attribute aaa_bbb="X" becomes bbb="X".

    E.g. try Xss::filter('<div aaa:bbb="AAA-BBB" bbb="BBB">&nbsp;</div>', Xss::getAdminTagList()).
    This leads to a duplicate attribute `bbb=`.

    You can argue that these are silly or invalid attribute names.
    If this is the concern, the correct behavior would be to remove the attribute, _not_ to change the attribute name.

    Interesting discussion: "What characters are allowed in an HTML attribute name?", https://stackoverflow.com/a/53563849/246724

    Some things to consider.

    • Custom html tags can have any custom attributes.
    • Browsers typically accept custom attribute names even on basic html tags.
    • Some legacy or poorly designed 3rd party systems might rely on poorly named attributes.
      In fact this is the case for me in a current project.

    The culprit is definitely the attribute splitter function.
    But to fix it, we need to define the desired behavior.

  • 🇩🇪Germany donquixote

    Review:

                if ($enforce_protocol_filtering || !$skip_protocol_filtering) {
    

    These two variables always occur in this combination.
    They can be combined into a single variable.

                $filter_protocols = in_array($attribute_name, static::$unSafeAttributes)
                  || !(str_starts_with($attribute_name, 'data-')
                    || str_starts_with($attribute_name, 'ng-')
                    || in_array($attribute_name, static::$safeAttributes)
                  );
    [...]
                if ($filter_protocols) {
    

    We also see that, with their default values, static::$unSafeAttributes has no effect whatsoever.
    It would only have an effect if somebody adds a "data-" or "ng-" attribute to the unsafe list.

    I wonder, do we support people modifying these class properties? Or could we use constants instead?

  • 🇩🇪Germany donquixote

    I wonder, do we support people modifying these class properties? Or could we use constants instead?

    If we support them being modified, I would rather turn all the static methods to object methods, so that you can have different instances with different configuration.
    For static methods, I would expect them to behave the same at all times.

  • 🇩🇪Germany donquixote

    Also,

      protected static function filterProtocol(string $name, string $value): string {
        // If the attribute is a safe attribute check that it doesn't contain a
        // protocol. If it does call UrlHelper::stripDangerousProtocols().
        if (in_array($name, static::$safeAttributes)) {
    

    We now have two places where we compare the attribute name to a list:
    One time in ::attributes() with case 0, another time in ::filterProtocol().
    This can be collapsed to just one place.
    Instead of passing the name to ::filterProtocol(), we can pass a filter mode.

    But maybe the name parameter in ::filterProtocol() is meant for subclasses that want custom logic to determine the filter mode?
    Do we care about inheritance?

  • Pipeline finished with Failed
    6 months ago
    Total: 177s
    #68494
  • First commit to issue fork.
  • Pipeline finished with Failed
    5 months ago
    Total: 161s
    #72050
  • Pipeline finished with Failed
    4 months ago
    #90872
  • First commit to issue fork.
  • Pipeline finished with Failed
    4 months ago
    Total: 178s
    #95413
  • 🇺🇸United States neclimdul Houston, TX

    despite several requests for comment, its still not clear why we need a special case for "ng-"

    If we do need to special case it, hard coding it like this in its current location has a smell and suggests to me we haven't succeeded in making this "consistent" if such a special case for a framework is required.

Production build 0.69.0 2024