- Status changed to Active
8 months ago 4:11am 22 March 2024 - π³πΏNew Zealand quietone
The code in question was first committed in
commit 2d0df351d704ba34d191831f7d4b8cb385555be2 Author: Dries Buytaert <dries@buytaert.net> Date: Tue Nov 29 20:17:10 2005 +0000 - Various fixes. Updated CHANGELOG.txt
The most recent change was in #2609928: Xss::attributes() mangles valid attribute names containing numbers β .
We (@RoSk0, @wiifm and myself) researched the issue to determine if underscores are allowed characters in the attribute name. This section of Extensible Markup Language (XML) 1.0 (Fifth Edition) states that they are allowed. Therefor, I am changing this to active.
@RoSk0 and @wiifm
- π¦πΊAustralia klonos 90% Melbourne, Australia - 10% Larissa, Greece
I understand that this issue here was raised to explicitly allow underscores in attribute names, however, I would like to point out the following from https://www.w3.org/TR/REC-xml/#NT-Name
- Attribute names should follow the
NameStartChar (NameChar)*
format NameChar
may contain dashes, dots, mid-dots, numbers (among other characters), but it does NOT explicitly state allowing underscores- also,
NameStartChar
(the first character of attribute names) must NOT be a dash - it can however be a colon (:
) or an underscore.
So unless I'm interpreting that wrong, the current regex is incorrect, as it allows the first character of attributes to be a dash. So if we want to allow underscores, they should only be allowed as the first character of the attribute name.
So unless we don't want to follow the standard strictly (in order to allow the use case for which this issue here was opened), the regex should be like so instead:
[_a-zA-Z][-a-zA-Z0-9]*
The above:
- allows the first letter of the attribute name to be an underscore, but NOT a dash
- allows any character following the first character to be a dash, but NOT an underscore
Not sure about how people feel about the rest of the characters (colons, dots, mid-dots etc.). They seem edge cases to me, and best left to be discussed in a follow-up issue, in order to avoid derailing this one here.
my 2c.
- Attribute names should follow the
- π«π·France MacSim
@klonos the doc you're refering to is about XML attributes - we're hereby talking about (X)HTML attributes.
https://html.spec.whatwg.org/multipage/infrastructure.html#xml
Attribute names are said to be XML-compatible if they match the Name production defined in XML and they contain no U+003A COLON characters
My interpretation is that (X)HTML attributes can be XML-compatible but it's not an obligation.
- π¦πΊAustralia klonos 90% Melbourne, Australia - 10% Larissa, Greece
Thanks for the link to the HTML spec @MacSim.
FTR, I am not opposed to this request. If the intended use of _filter_xss_attributes() is to filter XSS attempts (as the name of the function implies) rather than making sure that we adhere to the standards/specs, then all the more.
The reasons why I mentioned and referred to https://www.w3.org/TR/REC-xml/#NT-Name was that a) @quietone linked to https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name in the most recent comment in this thread before mine, and b) the issue summary has examples of custom tags (
<custom-widget>
), which is not a thing in HTML AFAIK.Either way, it would be nice to add a comment in the function or its docblock clarifying these things (i.e. not trying to adhere to specs, and trying to cover a mix of what is allowed/possible in XML as well as HTML/XHTML).