- Issue created by @Anybody
- π¨πSwitzerland berdir Switzerland
That's how regex works? If you want that to not match aspx, then you should end with a $, which means end of line, so only matches if the string ends with that. Same for ^ to match on start of the path.
- Issue was unassigned.
- π©πͺGermany Grevil
Ok so the question for this issue should be, whether we adjust the default config to have $'s at the end of each line with a file extension (e.g. .json, .asp, etc.).
- π©πͺGermany Anybody Porta Westfalica
Thanks @Berdir for the wakeup call - sorry I didn't think about that -.-
I agree with @Grevil that I might not be the only stupid person running into this and we should perhaps at least for the defaults consider to explicitly end the word after .asp (especially because .aspx exists), while I'm not totally sure about the .php cases.
What would you prefer?
Additionally, it might make sense to at least add a note to the description about such cases, because not every site maintainer is a developer? Should we ask someone else for opinion?
- π©π°Denmark ressa Copenhagen
In β¨ Provide migration path from Drupal 8 path2ban Active , my second most desired feature from path2ban is to not use Regex syntax in the "List of restricted paths" form :)
Perhaps we could consider creating an issue, called "Phase out Regex syntax in List of restricted paths form"? Regex is not easy, and it may hinder users from adding their own paths correctly.
- π¨πSwitzerland berdir Switzerland
> .asp / .aspx (especially because .aspx exists)
.asp exists too, that's why requests to it are made. All of the default patterns are things that exist somewhere. Maybe there should be a bigger emphasis on reviewing and adjusting that configuration in the documentation/project page. If you recently migrated from a wordpress site to drupal you will likely get valid wordpress related request as well and might end up blocking users and crawlers if you're not careful.
Yes, regex is a problem, exposing regex directly basically makes this a developer tool. But regular expressions are also flexible in ways that the default drupal path matching isn't, I'm not sure if that could cover the current rules 1:1 or those that people have set up. One option would be to have two different settings, one for regular expressions, one for standard drupal path matching. I don't think that belongs in this issue though. Another idea would be to have a test UI, where you can enter a path and it lets you know if and which rule matches that path. but belongs even less in this issue.
FWIW, directly exposing regular expressions in the UI can also be a security issue, but it's already protected with a restricted-access permission.
- π©π°Denmark ressa Copenhagen
Great suggestion, thanks @berdir. I created β¨ Allow standard Drupal path matching Active .
- πΈπͺSweden alayham
How about something like this in the admin form
coupled with something like this in the subscriber$banned_patterns = array_merge($banned_expressions, array_map(function(string $word) {return '/.*\Q'. $word . '\E.*/';}, $banned_words), array_map(function(string $word) {return '/.*\Q'. $word . '\E$/';}, $banned_extensions),
We could also hide the regex field under an
Advanced
details element. - πͺπ¨Ecuador jwilson3
#9 is a great proposal. But it needs a lot more work to flush out because you'd want to support forward slashes, hashes, and other non-ascii characters in the "$word" variable. So you'd need a more solid way to encode/escape $word for use inside a regex.
Example's you'd want to support in banned words:
- wildcards -
oneword*another
the asterisk needs escaping for regex. - slashes -
/path/to/thingy
slashes (and backslashes) need escaping for the regex. - periods - Eg, I want to block
some.xml
but not all things that have xml extension. periods need escaping for regex. - dashes -
thing-thing-thing</li> <li>brackets - <code>fq[]=thing
or
some{$malicious_code}
square and curly brackets need escaping for regex. - probably many many more
Then, there is also the complication of an upgrade path from the current Regex Based solution to a non-regex based solution. I'm not certain one exists, because you can do things like [a-zA-Z0-9-_] in a regex that you'll never be able to do with a simple word match. Therefore you'd want to leave the regex capability there, but collapse it inside a
<details>
container and mark it as "Advanced RegEx Matching"? And rewrite the default examples for the simpler case. No migration needed. New installs benefit from the simpler UI with default examples, Existing installs can manually migrate and simplify their regexes. - wildcards -
I think this issue in part stems from the default matches being misleading. Looking at the examples you'd assume the match is anchored to the start through to the end of the string. But it's using preg_match so, for example
/.*\.aspx/
is the same as/\.aspx/
.
I'd suggest one of the following approaches for clarity:- Adding starting
^
and ending$
markers to each of the default matches, where appropriate. For example:
/\.aspx$/
/^\/wp-admin/
- Wrapping each entry with
^
and$
in the code.
- Adding starting