Protection from Bots

Created on 14 November 2024

Problem/Motivation

Is it possible to have configurable bot protection that prevents bots from sending requests directly to the search URL? I have HoneyPot and Antibot installed, which protect the form but don't prevent queries from being sent directly to the search URL with the "keys" parameter. This is likely an issue with any Drupal search, but since Vertex AI Search queries come with a cost, bots spamming the search results URL can drive up that cost.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

✨ Feature request
Status

Active

Version

1.3

Component

Miscellaneous

Created by

🇺🇸 United States Christian DeLoach


Comments & Activities

  • Issue created by @Christian DeLoach
  • 🇺🇸 United States SamLerner

    I like this idea. I'm actually looking into this problem for some sites I manage, where search bots are causing millions of additional hits each month on search results pages.

    My first attempt to solve this was to add a <meta name="robots" content="noindex,nofollow"/> tag to the search page. This seems like an easy option to add to the Drupal search page, if it turns out to be a working solution.
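    A minimal sketch of where such an option would act, written in Python rather than Drupal's PHP purely for illustration. The add_robots_meta helper, its "/search" default for search_prefix, and the idea of filtering the rendered HTML are all assumptions, not part of Drupal or the Vertex AI Search module. It inserts the tag into the page head only on the search page and its sub-paths.

```python
import re

ROBOTS_META = '<meta name="robots" content="noindex,nofollow"/>'


def add_robots_meta(html: str, path: str, search_prefix: str = "/search") -> str:
    """Return html with a noindex,nofollow robots tag added on search pages.

    Hypothetical helper: "path" is the request path without the query string,
    and "search_prefix" is an assumed location for the search page.
    """
    # Leave every page outside the search path untouched.
    if path != search_prefix and not path.startswith(search_prefix + "/"):
        return html
    # Do not add the tag twice.
    if ROBOTS_META in html:
        return html
    # Insert just before the closing </head> (case-insensitive); skip pages without one.
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return html
    return html[: match.start()] + ROBOTS_META + html[match.start():]
```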

    @christian-deloach did you have any specific configuration options in mind?

  • 🇺🇸 United States Christian DeLoach

    Thank you @samlerner.

    By default, Drupal core's robots.txt already includes "Disallow: /search/", which asks search engines not to crawl paths under /search/. Adding the robots meta tag may be redundant unless a search engine ignores robots.txt or the site does not use Drupal's default robots.txt file.
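    Python's standard urllib.robotparser module can illustrate how a crawler that honours robots.txt applies that rule. The two-line excerpt below contains only the rule quoted above, and example.com is a placeholder host. Disallow rules are prefix matches on the path, so the "/search/" line on its own covers a path such as /search/node?keys=foobar but not a results page served at /search?keys=foobar.

```python
from urllib.robotparser import RobotFileParser

# Excerpt holding only the rule discussed above; a full robots.txt has many more lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host.
for url in (
    "https://example.com/search/node?keys=foobar",
    "https://example.com/search?keys=foobar",
):
    verdict = "allowed" if parser.can_fetch("*", url) else "disallowed"
    print(f"{url}: {verdict}")
```

    Run as-is, it reports the first URL as disallowed and the second as allowed.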

    But neither the robots.txt file nor the robots meta tag will prevent malicious bots from sending queries directly to the search application.

    My thought is to add an option that checks whether a search request came from the search form, by having the form pass a token. This would "break" how Drupal search currently works, since it does not require requests to come from the form, so the option should be disabled by default. Still, I suspect most sites using the Vertex AI Search module would benefit, unless they are already protected from bots.
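    One possible shape for such a token, sketched in Python purely for illustration: the form embeds a timestamp signed with a site secret, and the results page calls Vertex AI only if the signature checks out and the token is recent. HMAC-SHA256 signing, the one-hour lifetime, the secret, and the hypothetical "form_token" parameter name are all assumptions, not anything Drupal or the Vertex AI Search module provides. The enforce flag defaults to off, matching the proposal that the check be opt-in.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical values: a real site would load the secret from its own configuration.
SECRET_KEY = b"replace-with-a-private-site-secret"
MAX_AGE_SECONDS = 3600  # assumed token lifetime


def _sign(timestamp: str) -> str:
    """HMAC-SHA256 signature of the timestamp, hex-encoded."""
    return hmac.new(SECRET_KEY, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Token to embed in the search form as a hidden field."""
    timestamp = str(int(time.time() if now is None else now))
    return f"{timestamp}.{_sign(timestamp)}"


def verify_token(token: Optional[str], now: Optional[float] = None) -> bool:
    """True if the token was signed with SECRET_KEY and has not expired."""
    if not token:
        return False
    timestamp, _, signature = token.partition(".")
    # Constant-time comparison; comparing bytes avoids errors on non-ASCII input.
    if not hmac.compare_digest(signature.encode(), _sign(timestamp).encode()):
        return False
    # The signature matched, so the timestamp is one that issue_token() produced.
    age = (time.time() if now is None else now) - int(timestamp)
    return 0 <= age <= MAX_AGE_SECONDS


def should_query_vertex(params: dict, enforce: bool = False) -> bool:
    """Decide whether a request may reach the paid search backend."""
    if not enforce:
        # Opt-in: with the check disabled, current behavior is unchanged.
        return True
    return verify_token(params.get("form_token"))
```

    A bot can still load the form once and reuse its token until it expires, so a check like this raises the cost of direct queries rather than blocking them outright.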

    As a quick fix, I was considering configuring my server to redirect any request to the /search path that lacks the "searchPage" argument to the default search page. The Vertex AI Search module appears to add the "searchPage" argument to the search redirect URL. However, it only adds that argument when the search form is submitted from the search page, not when it is submitted from the Search Form Block. Of course, this is not a robust way to block bots, but none of the malicious bots hitting my search are going through the form itself; they send queries directly to the search URL (e.g. /search?keys=foobar).
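    The decision rule behind that quick fix, sketched in Python for illustration only (in practice it would live in the web server's rewrite configuration). Treating /search as the results path and the bare /search page with no query string as the "default search page" are assumptions, and the hypothetical redirect_target function checks only whether "searchPage" is present, since its value is not discussed here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SEARCH_PATH = "/search"  # assumed path of the search results page


def redirect_target(url: str) -> Optional[str]:
    """Return where to redirect a request, or None to let it through."""
    parts = urlsplit(url)
    # Only requests to the search page itself are filtered.
    if parts.path.rstrip("/") != SEARCH_PATH:
        return None
    # The bare default search page must pass, or the redirect would loop.
    if not parts.query:
        return None
    # keep_blank_values so that "searchPage=" with an empty value still counts.
    params = parse_qs(parts.query, keep_blank_values=True)
    # Requests submitted from the search page carry "searchPage".
    if "searchPage" in params:
        return None
    return SEARCH_PATH
```

    Because the Search Form Block omits "searchPage", a rule like this would also redirect legitimate searches submitted from that block.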
