XSS attribute filtering is inconsistent and strips valid attributes

Created on 2 August 2015, over 9 years ago
Updated 17 February 2023, about 2 years ago

Follow-up to #2105841: Xss filter() mangles image captions and title/alt/data attributes โ†’

Problem/Motivation

\Drupal\Component\Utility\Xss::filter() cleans potentially dangerous protocols like "javascript:" from element attributes. It does this by stripping any set of characters that ends with a colon, unless the string is "http:" or "https:".

The filter strips out valid attribute name/value combinations that provide RDF metadata, such as rel="schema:author" or property="foaf:name".

Some attributes are exempt from this treatment, including `alt`, `title`, and any `data-*` attribute. In #2105841: Xss filter() mangles image captions and title/alt/data attributes โ†’ , the decision was made to hard-code the exempt attributes list, and possibly make the list configurable in a follow-up issue.

Proposed resolution

Create 3 lists.
1 for Rdf attributes
1 for safe attributes (ones we can skip protocol check)
1 for unsafe attributes (ones we enforce a protocol check for)

Remaining tasks

Patch review/refinement
Security review

User interface changes

None.

API changes

None.

Beta phase evaluation

-->

๐Ÿ› Bug report
Status

Needs review

Version

10.1 โœจ

Component
Baseย  โ†’

Last updated about 14 hours ago

Created by

๐Ÿ‡บ๐Ÿ‡ธUnited States les lim

Live updates comments and jobs are added and updated live.
  • Security

    It is used for security vulnerabilities which do not need a security advisory. For example, security issues in projects which do not have security advisory coverage, or forward-porting a change already disclosed in a security advisory. See Drupalโ€™s security advisory policy for details. Be careful publicly disclosing security vulnerabilities! Use the โ€œReport a security vulnerabilityโ€ link in the project pageโ€™s sidebar. See how to report a security issue for details.

  • Triaged core major

    There is consensus among core maintainers that this is a major issue. Only core committers should add this tag.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Closed ๐Ÿ› Xss::filter() does not handle HTML tags inside attribute values Closed: duplicate as a duplicate.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom lazysoundsystem

    The last patch is now a few lines off

    Checking patch core/lib/Drupal/Component/Utility/Xss.php...
    Checking patch core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php...
    Hunk #1 succeeded at 403 (offset 8 lines).
    Checking patch core/tests/Drupal/Tests/Component/Utility/XssTest.php...
    Hunk #3 succeeded at 538 (offset 10 lines).
    Applied patch core/lib/Drupal/Component/Utility/Xss.php cleanly.
    Applied patch core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php cleanly.
    Applied patch core/tests/Drupal/Tests/Component/Utility/XssTest.php cleanly.
    

    Updated patch changes the line numbers and nothing else.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Just checked #146 and it applied fine to 10.1

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Rerolled

    error: patch failed: core/lib/Drupal/Component/Utility/Xss.php:267
    error: core/lib/Drupal/Component/Utility/Xss.php: patch does not apply
    error: patch failed: core/tests/Drupal/Tests/Component/Utility/XssTest.php:525
    error: core/tests/Drupal/Tests/Component/Utility/XssTest.php: patch does not apply

  • Status changed to Needs work about 2 years ago
  • ๐Ÿ‡ฆ๐Ÿ‡บAustralia larowlan ๐Ÿ‡ฆ๐Ÿ‡บ๐Ÿ.au GMT+10
    1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -29,6 +29,50 @@ class Xss {
      +   * The default list of safe attributes untouched by filter().
      

      nit - untouched => not sanitized ?

    2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -29,6 +29,50 @@ class Xss {
      +   * The default list of Unsafe attributes that must be touched by filter().
      

      ubernit - touched => sanitized ?

    3. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -232,13 +277,13 @@ protected static function attributes($attributes) {
      +              substr($attribute_name, 0, 3) === 'ng-' ||
      

      Can we get a comment for this, I assume its something angular related

    4. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -271,7 +316,15 @@ protected static function attributes($attributes) {
      +            if ($enforce_protocol_filtering) {
      +              $value = static::filterProtocol($attribute_name, $match[1]);
      +            }
      +            elseif ($skip_protocol_filtering) {
      +              $value = $match[1];
      +            }
      +            else {
      +              $value = static::filterProtocol($attribute_name, $match[1]);
      +            }
      

      I think we can simplify this to

      if ($enforce_protocol_filtering || !$skip_protocol_filtering) {
        // filter here
      }
      else {
        // skip here
      }
      
      

      applies in all three instances of this new code

    5. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -340,6 +409,26 @@ protected static function needsRemoval($html_tags, $elem) {
      +  protected static function filterProtocol($name, $value) {
      

      We can typehint here, both for arguments and return type

    6. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -340,6 +409,26 @@ protected static function needsRemoval($html_tags, $elem) {
      +    if (in_array($name, static::$safeAttributes)) {
      +      return preg_match('/^[a-zA-Z0-9]+\:[a-zA-Z0-9]+$/', $value) ? $value : UrlHelper::stripDangerousProtocols($value);
      

      can we get a comment here - its a safe attribute, but we're stripping something if a regex matches?

      could we document the regex a bit like we do in \Drupal\Component\Utility\Xss::filter where we have this

      // Strip any tags that are not in the list of allowed html tags.
          return preg_replace_callback('%
            (
            <(?=[^a-zA-Z!/])  # a lone <
            |                 # or
            <!--.*?-->        # a comment
            |                 # or
            <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
            |                 # or
            >                 # just a >
            )%x', $splitter, $string);
      

      Just as a courtesy to our future selves

    7. +++ b/core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php
      @@ -395,15 +395,15 @@ public function providerTestFilterXss() {
      +    $data[] = ['<META HTTP-EQUIV="refresh" CONTENT="0;url=javascript:alert(\'XSS\');">', '<META http-equiv="refresh" content="0;url=javascript:alert(\'XSS\');">'];
      

      Whilst content is not executed, for a http-equiv="refresh" it is

      I tested this in Chrome and it was blocked by the browser, but it was executed - so I think we may need to handle this differently

    8. +++ b/core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php
      @@ -395,15 +395,15 @@ public function providerTestFilterXss() {
      +    $data[] = ['<META HTTP-EQUIV="refresh" CONTENT="0;url=data:text/html base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">', '<META http-equiv="refresh" content="0;url=data:text/html base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">'];
      

      Same here, the browser evaluated it but blocked it in Firefox and Chrome

    9. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -543,6 +546,104 @@ public function providerTestAttributes() {
      +        '<h2 property="javascript:alert(0);">The Title</h2>',
      

      In my testing this was safe, browsers ignored it

    10. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -543,6 +546,104 @@ public function providerTestAttributes() {
      +        '<section class="actions-menu-inner" ng-style="{ \'max-height\': maxPanelHeight }" ed-pretty-scrollbar ed-scroll-axis="y" ed-scroll-theme="light"></section>',
      

      shouldn't the ed- prefixed attributes be removed?

    11. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -543,6 +546,104 @@ public function providerTestAttributes() {
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
      +        'Image tag with RDFa with namespaced attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        'Image tag with RDFa with bad with namespaced attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
      +        '<img src="http://example.com/foo.jpg" foo="baz">',
      +        'Image tag with non-RDFa attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<h2 property="title">The Title</h2>',
      +        '<h2 property="title">The Title</h2>',
      +        'H2 tag with RDFa attribute without namespace',
      +        ['h2'],
      +      ],
      +      [
      +        '<h2 property="http://purl.org/dc/terms/title">The Title</h2>',
      +        '<h2 property="http://purl.org/dc/terms/title">The Title</h2>',
      +        'H2 tag with RDFa attribute with URL',
      +        ['h2'],
      +      ],
      +      [
      +        '<h2 property="javascript:alert(0);">The Title</h2>',
      +        '<h2 property="javascript:alert(0);">The Title</h2>',
      +        'H2 tag with RDFa attribute with XSS',
      +        ['h2'],
      

      these look to be duplicated in the test cases?

  • Assigned to smustgrave
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Will work on this tomorrow. If I donโ€™t post anything in 3 days feel free to remove my name and work on.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    #152

    1 = Fixed
    2 = Fixed
    3 = Added comment
    4 = Updated
    5 = Fixed
    6 = tried my best
    7 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
    8 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
    9 = should it be removed?
    10 = removed
    11 = removed

  • Status changed to Needs review about 2 years ago
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Per a discussion with @larowlan removing "content" as one of the safe attributes and opened โœจ Investigate allowing "content" as allowed tag Active

    Updated patch

  • Issue was unassigned.
  • Status changed to Needs work about 2 years ago
  • ๐Ÿ‡ฆ๐Ÿ‡บAustralia larowlan ๐Ÿ‡ฆ๐Ÿ‡บ๐Ÿ.au GMT+10
    1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -29,6 +29,49 @@ class Xss {
      +    'rel',
      +    'rev',
      +    'about',
      +    'datatype',
      +    'datatype_callback',
      +    'datetime',
      +    'mailto',
      +    'media',
      +    'sizes',
      

      We don't have test coverage for all of these attributes - should we add it here? I realise there's some overlap with ๐Ÿ› Xss::filterAdmin() incorrectly filters datetime attribute Fixed

    2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
      @@ -229,16 +273,16 @@ protected static function attributes($attributes) {
      +              substr($attribute_name, 0, 3) === 'ng-' ||
      

      I think this was missed from my earlier review, can we get a comment regarding the significance of `ng-` - e.g. relates to Angular (I think?)

  • Status changed to Needs review about 2 years ago
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    rel = is already included
    rev = tests added
    datetime = test added
    mailto = test added
    media = test added
    sizes = test added

    Removed some as I couldn't find examples
    about = Removed
    datatype = Removed
    datatype_callback = Removed

    #158.2 added to comment block in earlier patch.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Opened ๐Ÿ“Œ Considering making the XSS safe/unsafe attribute lists extendable Active as an additional follow-up, took the RDF list out of the issue summary since that's no long in the patch.

  • Status changed to Needs work almost 2 years ago
  • Status changed to Needs review almost 2 years ago
  • last update almost 2 years ago
    29,393 pass
  • Status changed to Needs work almost 2 years ago
  • ๐Ÿ‡ฆ๐Ÿ‡บAustralia larowlan ๐Ÿ‡ฆ๐Ÿ‡บ๐Ÿ.au GMT+10

    Getting close, couple of questions and looks like at least one test case has the wrong label

    1. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        'Link tag with rev attribute',
      

      I don't see a rev attribute here

    2. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        'Link tag with mailto href',
      

      ah here is is, I think these titles are out of order

    3. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        '<a media="print and (resolution:300dpi)">Drupal</a>',
      +        '<a media="print and (resolution:300dpi)">Drupal</a>',
      +        'Link tag with media attribute',
      +        ['a'],
      +      ],
      +      [
      +        '<a sizes="16x16">Drupal</a>',
      +        '<a sizes="16x16">Drupal</a>',
      

      are sizes and media valid attributes for an a tag? should we use img here or picture or something that reflects supported attributes?

    4. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -507,6 +510,36 @@ public function providerTestAttributes() {
      +        '<time datetime="2017-02-14">',
      +        '<time datetime="2017-02-14">',
      +        'Time with datetime attribute',
      

      should we add a test-case with a : or did the related issue already resolve that? if so do we need this test case at all anymore?

    5. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
      @@ -561,6 +594,68 @@ public function providerTestAttributes() {
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
      +        'Image tag with RDFa with bad with namespaced attribute',
      +        ['img'],
      +      ],
      +      [
      +        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
      +        '<img src="http://example.com/foo.jpg" foo="baz">',
      +        'Image tag with non-RDFa attribute',
      

      I don't understand why one of these is stripped but the other isn't?

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    Not only attribute values are damaged, also attribute names.
    Attribute aaa:bbb="X" becomes bbb="X".
    Attribute aaa_bbb="X" becomes bbb="X".

    E.g. try Xss::filter('<div aaa:bbb="AAA-BBB" bbb="BBB">&nbsp;</div>', Xss::getAdminTagList()).
    This leads to a duplicate attribute `bbb=`.

    You can argue that these are silly or invalid attribute names.
    If this is the concern, the correct behavior would be to remove the attribute, _not_ to change the attribute name.

    Interesting discussion: "What characters are allowed in an HTML attribute name?", https://stackoverflow.com/a/53563849/246724

    Some things to consider.

    • Custom html tags can have any custom attributes.
    • Browsers typically accept custom attribute names even on basic html tags.
    • Some legacy or poorly designed 3rd party systems might rely on poorly named attributes.
      In fact this is the case for me in a current project.

    The culprit is definitely the attribute splitter function.
    But to fix it, we need to define the desired behavior.

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    Review:

                if ($enforce_protocol_filtering || !$skip_protocol_filtering) {
    

    These two variables always occur in this combination.
    They can be combined into a single variable.

                $filter_protocols = in_array($attribute_name, static::$unSafeAttributes)
                  || !(str_starts_with($attribute_name, 'data-')
                    || str_starts_with($attribute_name, 'ng-')
                    || in_array($attribute_name, static::$safeAttributes)
                  );
    [...]
                if ($filter_protocols) {
    

    We also see that, with their default values, static::$unSafeAttributes has no effect whatsoever.
    It would only have an effect if somebody adds a "data-" or "ng-" attribute to the unsafe list.

    I wonder, do we support people modifying these class properties? Or could we use constants instead?

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    I wonder, do we support people modifying these class properties? Or could we use constants instead?

    If we support them being modified, I would rather turn all the static methods to object methods, so that you can have different instances with different configuration.
    For static methods, I would expect them to behave the same at all times.

  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany donquixote

    Also,

      protected static function filterProtocol(string $name, string $value): string {
        // If the attribute is a safe attribute check that it doesn't contain a
        // protocol. If it does call UrlHelper::stripDangerousProtocols().
        if (in_array($name, static::$safeAttributes)) {
    

    We now have two places where we compare the attribute name to a list:
    One time in ::attributes() with case 0, another time in ::filterProtocol().
    This can be collapsed to just one place.
    Instead of passing the name to ::filterProtocol(), we can pass a filter mode.

    But maybe the name parameter in ::filterProtocol() is meant for subclasses that want custom logic to determine the filter mode?
    Do we care about inheritance?

  • Pipeline finished with Failed
    over 1 year ago
    Total: 177s
    #68494
  • First commit to issue fork.
  • Pipeline finished with Failed
    over 1 year ago
    Total: 161s
    #72050
  • Pipeline finished with Failed
    about 1 year ago
    #90872
  • First commit to issue fork.
  • Pipeline finished with Failed
    about 1 year ago
    Total: 178s
    #95413
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States neclimdul Houston, TX

    despite several requests for comment, its still not clear why we need a special case for "ng-"

    If we do need to special case it, hard coding it like this in its current location has a smell and suggests to me we haven't succeeded in making this "consistent" if such a special case for a framework is required.

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States tommyk

    While I haven't read every reply, I didn't see any results when I searched for ARIA in this issue, so adding my use case.

    I got here because I'm trying to manipulate Views output to create accessible Read More links like described at https://www.w3.org/WAI/WCAG21/Techniques/aria/ARIA8 wherein I am adding an aria-label attribute with a token like <a href="/node/{{ nid }}" aria-label="Read more about {{ title }}">Read More</a> where the title in this case contains a colon. Everything in the ARIA Label up to and including the colon in the title is stripped.

  • Pipeline finished with Success
    8 months ago
    Total: 349s
    #268233
  • ๐Ÿ‡ณ๐Ÿ‡ฟNew Zealand quietone

    Hide files

  • ๐Ÿ‡ต๐Ÿ‡นPortugal joao.ramos.costa

    HI @ tommyk,

    indeed. Got here by the same reason. The aria-* attributes may be 'whitelisted' as for the same reason 'data-' attributes are and perhaps a little more than 'ng-'.

  • ๐Ÿ‡ซ๐Ÿ‡ฎFinland onnia

    Hi,
    I could not find 10.4.x compatible patch for commit 3cf003db
    so I made my own, I hope someone else finds this useful.

Production build 0.71.5 2024