- Issue created by @mcdruid
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
All of these are available as optional settings. Are you saying we should default to safer settings, or enforce them?
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
I'd recommend that we enforce the removal (/ substitution) of the most dangerous characters.
For removal of spaces, that'd be ok as a default (would people really want to turn it off?!)
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
@kim.pepper although the existing options would remove all of the characters I highlighted as a concern, they do more than that.
I wouldn't advocate stripping filenames down to only ASCII / alpha-numeric characters (by default or as a mandatory setting) because of how restrictive that is for different languages.
I'm proposing a fairly limited number of forbidden characters - those which have significance for the shell - which would be a subset of what would be stripped by any of the existing options IIUC?
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
Having said that, I went to look for a definitive list of "dangerous characters" and ended up with quite a lot of "it depends".
This is not definitive, but e.g. https://stackoverflow.com/questions/15783701/which-characters-need-to-be...
There are some quite good lists using e.g. the shell escaping option that printf (and apparently ls) offer - e.g.
No, character % does not need to be escaped No, character + does not need to be escaped No, character - does not need to be escaped No, character . does not need to be escaped No, character / does not need to be escaped No, character : does not need to be escaped No, character = does not need to be escaped No, character @ does not need to be escaped No, character _ does not need to be escaped Yes, character needs to be escaped Yes, character ! needs to be escaped Yes, character " needs to be escaped Yes, character # needs to be escaped Yes, character $ needs to be escaped Yes, character & needs to be escaped Yes, character ' needs to be escaped Yes, character ( needs to be escaped Yes, character ) needs to be escaped Yes, character * needs to be escaped Yes, character , needs to be escaped Yes, character ; needs to be escaped Yes, character < needs to be escaped Yes, character > needs to be escaped Yes, character ? needs to be escaped Yes, character [ needs to be escaped Yes, character \ needs to be escaped Yes, character ] needs to be escaped Yes, character ^ needs to be escaped Yes, character ` needs to be escaped Yes, character { needs to be escaped Yes, character | needs to be escaped Yes, character } needs to be escaped
I feel like most of those make sense; and there are a fair number of characters that would be allowed.
Whether this would be more or less confusing / user friendly than just stripping all punctuation though is perhaps a usability question.
Looks like the code mostly uses regex; I wonder whether the
punct
character class would be any use. It'd remove the "safe" characters from the list above which we may not want to do.To be clear what I'm looking for here is something that would remove all the dangerous shell characters but not limit users to strict ASCII alphanum without e.g. accented letters etc..
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
FWIW by default Wordpress removes a load of "special" characters from filenames:
https://github.com/WordPress/wordpress-develop/blob/6.7.2/src/wp-include...
function sanitize_file_name( $filename ) { ... $special_chars = array( '?', '[', ']', '/', '\\', '=', '<', '>', ':', ';', ',', "'", '"', '&', '$', '#', '*', '(', ')', '|', '~', '`', '!', '{', '}', '%', '+', 'β', 'Β«', 'Β»', 'β', 'β', chr( 0 ) ); ... $filename = str_replace( $special_chars, '', $filename );
Joomla seems to have decided to let the underlying OS decide what it will and will not allow in filenames :shrug:
- Merge request !11760[#3516706] Remove special chars from filename uploads β (Open) created by kim.pepper
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Here's a basic implementation. I haven't looked at any tests yet.
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
Thanks, that's a good start. Looks like you've borrowed the list of chars to strip from wordpress. Perhaps we could remove a few of the "safe" ones based on the list in #7; we're only talking about a few e.g.
% + = :
.FWIW here's how typo3 does this:
https://github.com/TYPO3-CMS/core/blob/v13.4.9/Classes/Resource/Driver/L...
The comments suggest that it's pretty strict:
... any character not matching [.a-zA-Z0-9_-] is substituted by '_'
...but in actual fact it's a bit more nuanced as there are some settings around allowing / remapping utf8 characters.
I'll try to do some work on this but am short on time just now.
- π¬π§United Kingdom mcdruid π¬π§πͺπΊ
sorry, didn't mean to change status
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Removed config option, and updated tests now we replace special chars.