Block User Agents

Created on 2 June 2025, 3 days ago

Problem/Motivation

It would be nice to have a list of user agent substrings to block. I just saw a lot of requests from one including "HTTrack", which seems to be a "website copier" tool. It's generating a lot of requests.
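
To make the request concrete, here is a minimal, language-agnostic sketch of the matching logic in Python. It is not this module's code. The names `BLOCKED_UA_SUBSTRINGS` and `is_blocked_user_agent` are hypothetical, and case-insensitive matching is an assumption. "HTTrack" is the only substring taken from this issue.

```python
# Hypothetical blocklist; "HTTrack" is the only entry mentioned in this issue.
BLOCKED_UA_SUBSTRINGS = ["HTTrack"]


def is_blocked_user_agent(user_agent, blocked=BLOCKED_UA_SUBSTRINGS):
    """Return True if the User-Agent value contains any blocked substring.

    Matching is case-insensitive (an assumption, not a stated requirement).
    A missing or empty User-Agent is not blocked by this check.
    """
    if not user_agent:
        return False
    ua = user_agent.lower()
    # Skip empty entries so a blank line in the list cannot block everyone.
    return any(s.lower() in ua for s in blocked if s)
```

A site would call this with the incoming request's User-Agent header and reject the request (for example with a 403 response) when it returns True.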

✨ Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇺🇸 United States bburg Washington D.C.

Comments & Activities

  • Issue created by @bburg
  • 🇩🇰 Denmark ressa Copenhagen

    That would be an interesting feature, but since HTTrack is a scraper, if the feature were added, this project could almost consider expanding its scope and name to https://www.drupal.org/project/bot_blocker? Scrapers can cause a lot of extra traffic, which might be a strain even for web sites without facets.

  • 🇺🇸 United States bburg Washington D.C.

    I do like the idea of using a more general namespace for the module. I do think it's important to keep a separation of concerns. I will keep this issue active until I or someone else creates "bot_blocker".

  • 🇩🇰 Denmark ressa Copenhagen

    Sounds great, and thanks for all your work with facets and agents already here.

    About blocking scrapers, one method could be a rule on the number of hits over a certain period (maybe five minutes?), with the ability to block an IP if a threshold of requested URLs is exceeded. The reason I thought of a more generalized "hits per time period" rule is that I have a web site where five or six facets per human visitor are to be expected. An intense pounding by a bot, however, is problematic mostly because of the rapid requests, not the number of facets.
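
The "hits per time period" rule could be sketched roughly as a sliding-window counter per IP. This is an illustration in Python, not code from this module. The class name `HitRateLimiter`, the default of 100 hits, and the choice to keep state in memory are all assumptions. Only the five-minute window comes from the comment above.

```python
import time
from collections import defaultdict, deque


class HitRateLimiter:
    """Sliding-window hit counter per IP (hypothetical sketch)."""

    def __init__(self, max_hits=100, window_seconds=300):
        # max_hits is an assumed threshold; 300 s matches "maybe five minutes".
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent hits

    def allow(self, ip, now=None):
        """Record a hit for `ip` and return False once the threshold is exceeded."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        # Rejected hits are recorded too, so a bot that keeps pounding stays blocked.
        hits.append(now)
        return len(hits) <= self.max_hits
```

Because the rule counts requests rather than facets, a human clicking through five or six facets stays well under the threshold, while a rapid burst from one IP trips it. A real deployment would need shared storage instead of an in-process dictionary, plus eviction of idle IPs.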
