- Issue created by @mcdruid
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
All of these are available as optional settings. Are you saying we should default to safer settings, or enforce them?
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
I'd recommend that we enforce the removal (/ substitution) of the most dangerous characters.
For removal of spaces, that'd be ok as a default (would people really want to turn it off?!)
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
@kim.pepper although the existing options would remove all of the characters I highlighted as a concern, they do more than that.
I wouldn't advocate stripping filenames down to only ASCII / alpha-numeric characters (by default or as a mandatory setting) because of how restrictive that is for different languages.
I'm proposing a fairly limited number of forbidden characters - those which have significance for the shell - which would be a subset of what would be stripped by any of the existing options IIUC?
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
Having said that, I went to look for a definitive list of "dangerous characters" and ended up with quite a lot of "it depends".
This is not definitive, but e.g. https://stackoverflow.com/questions/15783701/which-characters-need-to-be...
There are some quite good lists using e.g. the shell escaping option that printf (and apparently ls) offer - e.g.
No, character % does not need to be escaped No, character + does not need to be escaped No, character - does not need to be escaped No, character . does not need to be escaped No, character / does not need to be escaped No, character : does not need to be escaped No, character = does not need to be escaped No, character @ does not need to be escaped No, character _ does not need to be escaped Yes, character needs to be escaped Yes, character ! needs to be escaped Yes, character " needs to be escaped Yes, character # needs to be escaped Yes, character $ needs to be escaped Yes, character & needs to be escaped Yes, character ' needs to be escaped Yes, character ( needs to be escaped Yes, character ) needs to be escaped Yes, character * needs to be escaped Yes, character , needs to be escaped Yes, character ; needs to be escaped Yes, character < needs to be escaped Yes, character > needs to be escaped Yes, character ? needs to be escaped Yes, character [ needs to be escaped Yes, character \ needs to be escaped Yes, character ] needs to be escaped Yes, character ^ needs to be escaped Yes, character ` needs to be escaped Yes, character { needs to be escaped Yes, character | needs to be escaped Yes, character } needs to be escaped
I feel like most of those make sense; and there are a fair number of characters that would be allowed.
Whether this would be more or less confusing / user friendly than just stripping all punctuation though is perhaps a usability question.
Looks like the code mostly uses regex; I wonder whether the
punct
character class would be any use. It'd remove the "safe" characters from the list above which we may not want to do.To be clear what I'm looking for here is something that would remove all the dangerous shell characters but not limit users to strict ASCII alphanum without e.g. accented letters etc..
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
FWIW by default Wordpress removes a load of "special" characters from filenames:
https://github.com/WordPress/wordpress-develop/blob/6.7.2/src/wp-include...
function sanitize_file_name( $filename ) { ... $special_chars = array( '?', '[', ']', '/', '\\', '=', '<', '>', ':', ';', ',', "'", '"', '&', '$', '#', '*', '(', ')', '|', '~', '`', '!', '{', '}', '%', '+', 'β', 'Β«', 'Β»', 'β', 'β', chr( 0 ) ); ... $filename = str_replace( $special_chars, '', $filename );
Joomla seems to have decided to let the underlying OS decide what it will and will not allow in filenames :shrug:
- Merge request !11760[#3516706] Remove special chars from filename uploads β (Open) created by kim.pepper
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Here's a basic implementation. I haven't looked at any tests yet.
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
Thanks, that's a good start. Looks like you've borrowed the list of chars to strip from wordpress. Perhaps we could remove a few of the "safe" ones based on the list in #7; we're only talking about a few e.g.
% + = :
.FWIW here's how typo3 does this:
https://github.com/TYPO3-CMS/core/blob/v13.4.9/Classes/Resource/Driver/L...
The comments suggest that it's pretty strict:
... any character not matching [.a-zA-Z0-9_-] is substituted by '_'
...but in actual fact it's a bit more nuanced as there are some settings around allowing / remapping utf8 characters.
I'll try to do some work on this but am short on time just now.
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
sorry, didn't mean to change status
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Removed config option, and updated tests now we replace special chars.
- πΊπΈUnited States smustgrave
Tests appear to be there so removing that tag.
Looking at the solution were all options taken?
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Looking at the solution were all options taken?
Do you mean are all the special characters included?
- πΊπΈUnited States smustgrave
Mean the proposed solution read like there were several approaches? Unless I read that wrong
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
I think we agreed on stripping special characters. It was more about which characters to strip. I have a feeling stripping whitespace might be contentious.
- πΊπΈUnited States smustgrave
Can whatβs not making it in be scratched from the proposed solutio
- πΊπΈUnited States cmlara
As a site owner, I'm going to miss Parentheses, Ampersand and Space., A little less Brackets, Pound Sign and Exclamation Point though I encounter them.
As a security engineer: This is not a bad hardening, however it is certainly not a fix for wherever the faults actually occur.
As someone who has used this in the past to perform sample RCE's against site setups; I can say yes this would make it much harder to exploit.
I also wish to reiterate that for these characters to cause issue someone has had to make some very serious mistakes down stream, mistakes that this change likely does not actually remove the vulnerabilities , just at most makes it significantly harder to exploit (move from Basic to Complex, saving 1 point on the Drupal Vulnerability Scoring ) and maybe move the Impacted Environment from All to Uncommon (2 points on the Drupal Score scale).
I agree with @smustgrave the 'excluded' ideas would be nice to have listed for historical purposes.
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
TBH I don't have a strong opinion on which characters should be included or excluded. I'll happily defer to those with a stronger security background. I _do_ think removing whitespace will be disruptive to site owners.