Not always effective, unfortunately

Created on 13 February 2025

This module seemed to work for me in the past, as it improved the situation, but the crawlers appear to be getting harder to target.

I see floods of requests from different IPs, with user agents containing a random string, for very similar existing pages (facet search), and this is bringing my server down.

It seems that this module does not limit these requests, even when I set regular_traffic and regular_traffic_asn to very low limits (interval: 600, requests: 2).

I know it works in principle, because I lock myself out with such a setting. But obviously, the crawlers use a different IP and a different user agent for every request.
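
To illustrate what I mean, such limits in settings.php would look roughly like the sketch below. The top-level key is only a placeholder here, since the exact key name and structure depend on the module; only the regular_traffic/regular_traffic_asn names and the interval and requests values are the ones I set above.

    // Sketch only: 'MODULE.settings' is a placeholder for the module's actual settings key.
    $settings['MODULE.settings'] = [
      'regular_traffic' => [
        'interval' => 600,   // window length in seconds
        'requests' => 2,     // allowed requests per window
      ],
      'regular_traffic_asn' => [
        'interval' => 600,
        'requests' => 2,
      ],
    ];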

Is there a log that shows whether any of this module's limits have actually been triggered?
Is there an option to limit requests further, e.g. to disregard the User-Agent and block by IP only?

💬 Support request

Status: Active
Version: 3.0
Component: Miscellaneous
Created by: 🇦🇹 Austria alexh

Comments & Activities

  • Issue created by @alexh
  • 🇺🇸 United States bburg, Washington D.C.

    I agree with the OP: this module feels like "throwing spaghetti at the problem". I use it because it's there, but I have no metrics on how many requests it has blocked, and I'm not able to adjust the settings on the fly, since everything is hard-coded in settings.php. It would be great to see these things.

    And yes, I'm seeing a trend on my own sites of bots getting caught up in endless combinations of facet links. My solution was to block requests that contain more than a certain number of facet query parameters. For example, a faceted search URL might look like this:

    /search?f[0]=filter0&f[1]=filter1&f[2]=filter2&f[3]=filter3

    If you block requests containing "f[3]" via WAF rules, for example in Cloudflare, you can stop a large amount of this traffic while still allowing normal human visitors to use some of the faceted search feature (see the rule sketch at the end of this comment). I'm working on a list of mitigation approaches to this problem, but this was the one that addressed the last issue I was having.

    Other things I'm using as well: the Perimeter module to block probing for vulnerable URL paths (not that I'm worried about these attempts finding a vulnerability, but I was seeing a lot of this traffic, and it also serves uncached pages), Fast 404 to make rendering 404 pages less resource-intensive, and Antibot and Honeypot to block spam form submissions.
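
    To make the facet-depth rule above concrete, a Cloudflare custom rule expression along these lines is one way to do it. This is a sketch only: it assumes Cloudflare's rules language, the "f[3]" string follows the example URL above, and since the brackets may arrive either literally or percent-encoded depending on the client, it matches both forms:

        (http.request.uri.query contains "f[3]") or (http.request.uri.query contains "f%5B3%5D")

    With the rule action set to Block (or a managed challenge), any request carrying a fourth facet parameter is stopped, while shallower facet combinations keep working for human visitors.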
