Add optional non-crawler rate limit

Created on 27 December 2022
Updated 21 March 2023

Problem/Motivation

Occasionally our site gets crawled very quickly by a visitor that does not identify itself as a bot/crawler: its user-agent string may be spoofed to look like a regular human visitor, or it otherwise is not detected as a crawler by jaybizzle/crawler-detect. Such a visitor bypasses the crawler rate limit we've configured and can crawl the site at any rate, which can degrade the site's performance for regular visitors and legitimate crawlers alike. To mitigate this, it would be great if we could also enforce a general, any-visitor rate limit.

Steps to reproduce

Set a user-agent string that does not look like a crawler / bot, then crawl the site as fast as you please.
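
For example, a tiny script that requests pages in a tight loop while sending a browser-like user-agent will do it (a hypothetical sketch; the URL and user-agent string are placeholders):

<?php

// Hammer the site with a browser-like user-agent so that
// jaybizzle/crawler-detect does not flag the requests as a crawler.
$ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36';
$context = stream_context_create([
  'http' => ['header' => "User-Agent: $ua\r\n"],
]);
// No delay between requests: crawl as fast as the server responds.
for ($i = 1; $i <= 1000; $i++) {
  @file_get_contents('https://example.com/node/' . $i, FALSE, $context);
}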

Proposed resolution

Add an optional non-crawler rate limit (interval and operations settings) and apply it to visitors who do not appear to be crawlers based on their user-agent string, probably identifying each visitor uniquely by user-agent string + IP address.

Maybe the logic would be (see the sketch after this list):

Is crawler?
Yes -> enforce the crawler rate limit.
No, and a non-crawler rate limit is set -> enforce the non-crawler rate limit.
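
A minimal sketch of how that could look, assuming the module would lean on Drupal's flood service for counting requests (the event names, the $opts settings keys, and the helper name are hypothetical placeholders, not the module's actual API):

<?php

use Jaybizzle\CrawlerDetect\CrawlerDetect;

/**
 * Decides whether the current request is within its rate limit.
 */
function _rate_limit_request_allowed(string $user_agent, string $ip, array $opts): bool {
  $flood = \Drupal::flood();
  // Identify the visitor uniquely by user-agent string + IP address.
  $identifier = $ip . '|' . $user_agent;

  if ((new CrawlerDetect())->isCrawler($user_agent)) {
    // Is crawler? Yes -> enforce the crawler rate limit.
    $event = 'crawler_rate_limit.crawler';
    $threshold = $opts['crawler_operations'];
    $window = $opts['crawler_interval'];
  }
  elseif (!empty($opts['non_crawler_enabled'])) {
    // Not a crawler, and a non-crawler limit is set -> enforce it.
    $event = 'crawler_rate_limit.non_crawler';
    $threshold = $opts['non_crawler_operations'];
    $window = $opts['non_crawler_interval'];
  }
  else {
    // Not a crawler and no non-crawler limit configured: allow.
    return TRUE;
  }

  if (!$flood->isAllowed($event, $threshold, $window, $identifier)) {
    return FALSE;
  }
  $flood->register($event, $window, $identifier);
  return TRUE;
}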

Remaining tasks

Patch + review.

User interface changes

New, optional settings for non-crawler rate limit.
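
These could surface on the module's existing settings form, e.g. (a hypothetical Form API sketch; the config object name and field keys are placeholders):

// Inside the module's settings form class.
$config = $this->config('crawler_rate_limit.settings');

$form['non_crawler_enabled'] = [
  '#type' => 'checkbox',
  '#title' => $this->t('Enable non-crawler rate limit'),
  '#default_value' => $config->get('non_crawler_enabled'),
];
$form['non_crawler_interval'] = [
  '#type' => 'number',
  '#title' => $this->t('Interval (seconds)'),
  '#min' => 1,
  '#default_value' => $config->get('non_crawler_interval'),
];
$form['non_crawler_operations'] = [
  '#type' => 'number',
  '#title' => $this->t('Operations per interval'),
  '#min' => 1,
  '#default_value' => $config->get('non_crawler_operations'),
];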

API changes

Data model changes

✨ Feature request
Status

Fixed

Version

1.0

Component

Code

Created by

🇺🇸 United States chrisolof

