Incorrect pattern matching?

Created on 23 January 2024, 11 months ago

Problem/Motivation

Banning patterns:

/.*\.asp/
/.*\.jsp/
/\/blog_edit\.php/
/\/blogs\.php/
/\/wp-admin.*/
/\/wp-login.*/
/\/my_blogs/
/\/system\/.*\.php/
/.*systopice.*/
/.*login.json/
/\/episerver.*/

Called URL:
/owa/auth/logon.aspx

results in:

Banned: XXX.XXX,XXX,XXX for requesting /owa/auth/logon.aspx
Source: User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64)...

I'd expect that /.*\.asp/ should *not* block .aspx. That .aspx is part of the default config, but I already removed it weeks before, because the client was blocked.

By removing that entry, I expected that .aspx won't be blocked any more, but it happened!

Can we perhaps add a test for that easily? My guess is that
a) /.*\.asp/ is treated like *.asp*
or
b) The settings from the settings are not used.

@Grevil: I can show you the example project

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Active

Version

3.0

Component

Code

Created by

πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @Anybody
  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    That's how regex works? If you want that to not match aspx, then you should end with a $, which means end of line, so only matches if the string ends with that. Same for ^ to match on start of the path.

  • Issue was unassigned.
  • πŸ‡©πŸ‡ͺGermany Grevil

    Ok so the question for this issue should be, whether we adjust the default config to have $'s at the end of each line with a file extension (e.g. .json, .asp, etc.).

  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    Thanks @Berdir for the wakeup call - sorry I didn't think about that -.-

    I agree with @Grevil that I might not be the only stupid person running into this and we should perhaps at least for the defaults consider to explicitly end the word after .asp (especially because .aspx exists), while I'm not totally sure about the .php cases.

    What would you prefer?

    Additionally, it might make sense to at least add a note to the description about such cases, because not every site maintainer is a developer? Should we ask someone else for opinion?

  • πŸ‡©πŸ‡°Denmark ressa Copenhagen

    In ✨ Provide migration path from Drupal 8 path2ban Active , my second most desired feature from path2ban is to not use Regex syntax in the "List of restricted paths" form :)

    Perhaps we could consider creating an issue, called "Phase out Regex syntax in List of restricted paths form"? Regex is not easy, and it may hinder users from adding their own paths correctly.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    > .asp / .aspx (especially because .aspx exists)

    .asp exists too, that's why requests to it are made. All of the default patterns are things that exist somewhere. Maybe there should be a bigger emphasis on reviewing and adjusting that configuration in the documentation/project page. If you recently migrated from a wordpress site to drupal you will likely get valid wordpress related request as well and might end up blocking users and crawlers if you're not careful.

    Yes, regex is a problem, exposing regex directly basically makes this a developer tool. But regular expressions are also flexible in ways that the default drupal path matching isn't, I'm not sure if that could cover the current rules 1:1 or those that people have set up. One option would be to have two different settings, one for regular expressions, one for standard drupal path matching. I don't think that belongs in this issue though. Another idea would be to have a test UI, where you can enter a path and it lets you know if and which rule matches that path. but belongs even less in this issue.

    FWIW, directly exposing regular expressions in the UI can also be a security issue, but it's already protected with a restricted-access permission.

  • πŸ‡©πŸ‡°Denmark ressa Copenhagen

    Great suggestion, thanks @berdir. I created ✨ Allow standard Drupal path matching Active .

  • πŸ‡ΈπŸ‡ͺSweden alayham

    How about something like this in the admin form

    coupled with something like this in the subscriber

        $banned_patterns = array_merge($banned_expressions, 
                            array_map(function(string $word) {return '/.*\Q'. $word . '\E.*/';}, $banned_words), 
                            array_map(function(string $word) {return '/.*\Q'. $word . '\E$/';}, $banned_extensions),
    
    

    We could also hide the regex field under an Advanced details element.

  • πŸ‡ͺπŸ‡¨Ecuador jwilson3

    #9 is a great proposal. But it needs a lot more work to flush out because you'd want to support forward slashes, hashes, and other non-ascii characters in the "$word" variable. So you'd need a more solid way to encode/escape $word for use inside a regex.

    Example's you'd want to support in banned words:

    • wildcards - oneword*another the asterisk needs escaping for regex.
    • slashes - /path/to/thingy slashes (and backslashes) need escaping for the regex.
    • periods - Eg, I want to block some.xml but not all things that have xml extension. periods need escaping for regex.
    • dashes -
      thing-thing-thing</li>
      <li>brackets - <code>fq[]=thing

      or some{$malicious_code} square and curly brackets need escaping for regex.

    • probably many many more

    Then, there is also the complication of an upgrade path from the current Regex Based solution to a non-regex based solution. I'm not certain one exists, because you can do things like [a-zA-Z0-9-_] in a regex that you'll never be able to do with a simple word match. Therefore you'd want to leave the regex capability there, but collapse it inside a <details> container and mark it as "Advanced RegEx Matching"? And rewrite the default examples for the simpler case. No migration needed. New installs benefit from the simpler UI with default examples, Existing installs can manually migrate and simplify their regexes.

  • I think this issue in part stems from the default matches being misleading. Looking at the examples you'd assume the match is anchored to the start through to the end of the string. But it's using preg_match so, for example /.*\.aspx/ is the same as /\.aspx/.
    I'd suggest one of the following approaches for clarity:

    1. Adding starting ^ and ending $ markers to each of the default matches, where appropriate. For example:
      /\.aspx$/
      /^\/wp-admin/
    2. Wrapping each entry with ^ and $ in the code.
Production build 0.71.5 2024