Disallow dangerous filenames e.g. command injection characters

Created on 1 April 2025, 8 days ago

Problem/Motivation

Following discussion with the Drupal Security Team, it was agreed that this could be handled in a public "security improvements" issue.

At present Drupal's file API allows filenames to be created which could be dangerous if they're not handled safely. This is not a directly exploitable vulnerability, but improvements could be made that would reduce the likelihood of filenames being used as part of a chained attack.

Command injection is a specific concern here.

https://owasp.org/www-community/attacks/Command_Injection

https://portswigger.net/web-security/os-command-injection

Steps to reproduce

In some cases, browsers will escape/encode certain characters in a normal file upload, but it may be possible to avoid that escaping using a tool like Burp Suite, or perhaps via web services (rest / jsonapi).

An example of a dangerous filename which I believe a normal file field will currently accept is:

foo";echo `whoami`; #.txt

Proposed resolution

One or more of:

* Add a transliteration option to remove specific characters that may be used for command injection e.g. " ; # |` and if possible ' &.
* Disallow spaces by default in filenames (makes it quite a lot harder to achieve meaningful command injection).
* Review defaults for filename transliteration to make command injection as hard as possible.

Remaining tasks

* Implement improvements.
* Add tests (e.g. in \Drupal\Tests\file\Functional\SaveUploadTest ).
* Ensure that improvements also apply to web services and other uses of the API if possible.

User interface changes

Default filename handling may change; most noticeable changes for users may include removal/substitution of commonly used characters such as spaces, apostrophes and ampersands.

Introduced terminology

n/a?

API changes

Changes to filename handling may represent an API change.

Data model changes

n/a?

Release notes snippet

tbc

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component

file system

Created by

πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί

Live updates comments and jobs are added and updated live.
  • Security improvements

    It makes Drupal less vulnerable to abuse or misuse. Note, this is the preferred tag, though the Security tag has a large body of issues tagged to it. Do NOT publicly disclose security vulnerabilities; contact the security team instead. Anyone (whether security team or not) can apply this tag to security improvements that do not directly present a vulnerability e.g. hardening an API to add filtering to reduce a common mistake in contributed modules.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @mcdruid
  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί
  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    All of these are available as optional settings. Are you saying we should default to safer settings, or enforce them?

  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί

    I'd recommend that we enforce the removal (/ substitution) of the most dangerous characters.

    For removal of spaces, that'd be ok as a default (would people really want to turn it off?!)

  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί

    @kim.pepper although the existing options would remove all of the characters I highlighted as a concern, they do more than that.

    I wouldn't advocate stripping filenames down to only ASCII / alpha-numeric characters (by default or as a mandatory setting) because of how restrictive that is for different languages.

    I'm proposing a fairly limited number of forbidden characters - those which have significance for the shell - which would be a subset of what would be stripped by any of the existing options IIUC?

  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί

    Having said that, I went to look for a definitive list of "dangerous characters" and ended up with quite a lot of "it depends".

    This is not definitive, but e.g. https://stackoverflow.com/questions/15783701/which-characters-need-to-be...

    There are some quite good lists using e.g. the shell escaping option that printf (and apparently ls) offer - e.g.

    No, character % does not need to be escaped
    No, character + does not need to be escaped
    No, character - does not need to be escaped
    No, character . does not need to be escaped
    No, character / does not need to be escaped
    No, character : does not need to be escaped
    No, character = does not need to be escaped
    No, character @ does not need to be escaped
    No, character _ does not need to be escaped
    Yes, character   needs to be escaped
    Yes, character ! needs to be escaped
    Yes, character " needs to be escaped
    Yes, character # needs to be escaped
    Yes, character $ needs to be escaped
    Yes, character & needs to be escaped
    Yes, character ' needs to be escaped
    Yes, character ( needs to be escaped
    Yes, character ) needs to be escaped
    Yes, character * needs to be escaped
    Yes, character , needs to be escaped
    Yes, character ; needs to be escaped
    Yes, character < needs to be escaped
    Yes, character > needs to be escaped
    Yes, character ? needs to be escaped
    Yes, character [ needs to be escaped
    Yes, character \ needs to be escaped
    Yes, character ] needs to be escaped
    Yes, character ^ needs to be escaped
    Yes, character ` needs to be escaped
    Yes, character { needs to be escaped
    Yes, character | needs to be escaped
    Yes, character } needs to be escaped
    

    I feel like most of those make sense; and there are a fair number of characters that would be allowed.

    Whether this would be more or less confusing / user friendly than just stripping all punctuation though is perhaps a usability question.

    Looks like the code mostly uses regex; I wonder whether the punct character class would be any use. It'd remove the "safe" characters from the list above which we may not want to do.

    To be clear what I'm looking for here is something that would remove all the dangerous shell characters but not limit users to strict ASCII alphanum without e.g. accented letters etc..

  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί

    FWIW by default Wordpress removes a load of "special" characters from filenames:

    https://github.com/WordPress/wordpress-develop/blob/6.7.2/src/wp-include...

    function sanitize_file_name( $filename ) {
    
    ...
    
    	$special_chars = array( '?', '[', ']', '/', '\\', '=', '<', '>', ':', ';', ',', "'", '"', '&', '$', '#', '*', '(', ')', '|', '~', '`', '!', '{', '}', '%', '+', '’', 'Β«', 'Β»', '”', 'β€œ', chr( 0 ) );
    
    ...
    
    	$filename = str_replace( $special_chars, '', $filename );
    

    Joomla seems to have decided to let the underlying OS decide what it will and will not allow in filenames :shrug:

    https://github.com/joomla/joomla-cms/issues/33214

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Here's a basic implementation. I haven't looked at any tests yet.

  • Pipeline finished with Failed
    2 days ago
    Total: 552s
    #467850
  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί

    Thanks, that's a good start. Looks like you've borrowed the list of chars to strip from wordpress. Perhaps we could remove a few of the "safe" ones based on the list in #7; we're only talking about a few e.g. % + = :.

    FWIW here's how typo3 does this:

    https://github.com/TYPO3-CMS/core/blob/v13.4.9/Classes/Resource/Driver/L...

    The comments suggest that it's pretty strict:

    ... any character not matching [.a-zA-Z0-9_-] is substituted by '_'

    ...but in actual fact it's a bit more nuanced as there are some settings around allowing / remapping utf8 characters.

    I'll try to do some work on this but am short on time just now.

  • πŸ‡¬πŸ‡§United Kingdom mcdruid πŸ‡¬πŸ‡§πŸ‡ͺπŸ‡Ί

    sorry, didn't mean to change status

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Removed config option, and updated tests now we replace special chars.

  • Pipeline finished with Failed
    about 20 hours ago
    Total: 682s
    #468761
  • Pipeline finished with Failed
    about 19 hours ago
    Total: 528s
    #468787
  • Pipeline finished with Success
    about 19 hours ago
    Total: 929s
    #468799
Production build 0.71.5 2024