XSS attribute handling mangles attribute names with underscores

Created on 18 January 2017, almost 8 years ago
Updated 1 August 2024, 4 months ago

Hi guys,

Is there any good reason why _filter_xss_attributes can't take care of attributes having underscores inside ?

Actually I built a CKEditor plugin allowing to add widgets. That plugin consists in a textarea where we should input a widget with something like :
<custom-widget custom_attr="ko" custom-attr="ok"></custom-widget>
Which results in inserting a token in the CKEditor textarea
[{widget:<custom-widget custom_attr="ko" custom-attr="ok"></custom-widget>}]

I am using the "Limit allowed HTML tags" filter (and still need that).
Problem is : when _filter_xss_attributes is called it alter my custom attributes containing an underscore.
Instead of seeing
<custom-widget custom_attr="ko" custom-attr="ok"></custom-widget>
I see
<custom-widget attr="ko" custom-attr="ok"></custom-widget>

I guess I am an isolated case but still... I made a quick patch for that feature request just in case some folks would need that as well.

Ho and by the way, if the inserted HTML was mine I would rather use hyphens instead of underscores.... but well, the HTML has been generated and needs to have these underscores !

πŸ› Bug report
Status

Active

Version

11.0 πŸ”₯

Component
RenderΒ  β†’

Last updated 3 days ago

Created by

πŸ‡«πŸ‡·France MacSim

Live updates comments and jobs are added and updated live.
  • Needs backport to D7

    After being applied to the 8.x branch, it should be considered for backport to the 7.x branch. Note: This tag should generally remain even after the backport has been written, approved, and committed.

  • Needs issue summary update

    Issue summaries save everyone time if they are kept up-to-date. See Update issue summary task instructions.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Status changed to Active 8 months ago
  • πŸ‡³πŸ‡ΏNew Zealand RoSk0 Wellington
  • πŸ‡³πŸ‡ΏNew Zealand wiifm Wellington, NZ
  • πŸ‡³πŸ‡ΏNew Zealand quietone

    The code in question was first committed in

    commit 2d0df351d704ba34d191831f7d4b8cb385555be2
    Author: Dries Buytaert <dries@buytaert.net>
    Date:   Tue Nov 29 20:17:10 2005 +0000
    
        - Various fixes.  Updated CHANGELOG.txt
    

    The most recent change was in #2609928: Xss::attributes() mangles valid attribute names containing numbers β†’ .

    We (@RoSk0, @wiifm and myself) researched the issue to determine if underscores are allowed characters in the attribute name. This section of Extensible Markup Language (XML) 1.0 (Fifth Edition) states that they are allowed. Therefor, I am changing this to active.

    @RoSk0 and @wiifm

  • πŸ‡¦πŸ‡ΊAustralia klonos 90% Melbourne, Australia - 10% Larissa, Greece

    I understand that this issue here was raised to explicitly allow underscores in attribute names, however, I would like to point out the following from https://www.w3.org/TR/REC-xml/#NT-Name

    • Attribute names should follow the NameStartChar (NameChar)* format
    • NameChar may contain dashes, dots, mid-dots, numbers (among other characters), but it does NOT explicitly state allowing underscores
    • also, NameStartChar (the first character of attribute names) must NOT be a dash - it can however be a colon ( : ) or an underscore.

    So unless I'm interpreting that wrong, the current regex is incorrect, as it allows the first character of attributes to be a dash. So if we want to allow underscores, they should only be allowed as the first character of the attribute name.

    So unless we don't want to follow the standard strictly (in order to allow the use case for which this issue here was opened), the regex should be like so instead:
    [_a-zA-Z][-a-zA-Z0-9]*

    The above:

    • allows the first letter of the attribute name to be an underscore, but NOT a dash
    • allows any character following the first character to be a dash, but NOT an underscore

    Not sure about how people feel about the rest of the characters (colons, dots, mid-dots etc.). They seem edge cases to me, and best left to be discussed in a follow-up issue, in order to avoid derailing this one here.

    my 2c.

  • πŸ‡«πŸ‡·France MacSim

    @klonos the doc you're refering to is about XML attributes - we're hereby talking about (X)HTML attributes.

    https://html.spec.whatwg.org/multipage/infrastructure.html#xml

    Attribute names are said to be XML-compatible if they match the Name production defined in XML and they contain no U+003A COLON characters

    My interpretation is that (X)HTML attributes can be XML-compatible but it's not an obligation.

  • πŸ‡¦πŸ‡ΊAustralia klonos 90% Melbourne, Australia - 10% Larissa, Greece

    Thanks for the link to the HTML spec @MacSim.

    FTR, I am not opposed to this request. If the intended use of _filter_xss_attributes() is to filter XSS attempts (as the name of the function implies) rather than making sure that we adhere to the standards/specs, then all the more.

    The reasons why I mentioned and referred to https://www.w3.org/TR/REC-xml/#NT-Name was that a) @quietone linked to https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name in the most recent comment in this thread before mine, and b) the issue summary has examples of custom tags (<custom-widget> ), which is not a thing in HTML AFAIK.

    Either way, it would be nice to add a comment in the function or its docblock clarifying these things (i.e. not trying to adhere to specs, and trying to cover a mix of what is allowed/possible in XML as well as HTML/XHTML).

Production build 0.71.5 2024