Protection from Bots

Created on 14 November 2024, 5 months ago

Problem/Motivation

Is it possible to add configurable bot protection that prevents bots from sending requests directly to the search URL? I have Honeypot and Antibot installed, which protect the form but don't prevent queries from going directly to the search URL with the keys parameter. This is likely an issue with any Drupal search, but since Vertex AI searches come with a cost, bots spamming the search results URL can drive that cost up.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

✨ Feature request
Status

Active

Version

1.3

Component

Miscellaneous

Created by

🇺🇸 United States Christian DeLoach


Merge Requests

Comments & Activities

  • Issue created by @Christian DeLoach
  • 🇺🇸 United States SamLerner

    I like this idea. I'm actually looking into this problem for some sites I'm managing, as search bots are causing millions of additional hits each month on pages of search results.

    My first attempt to solve this was to add a <meta name="robots" content="noindex,nofollow"/> tag to the search page. This seems easy to add as an option for the Drupal search page, if it is in fact a working solution.

    @christian-deloach did you have any specific configuration options in mind?

  • 🇺🇸 United States Christian DeLoach

    Thank you @samlerner.

    By default, Drupal core's robots.txt already includes "Disallow: /search/", which asks search engines not to crawl the /search/ paths. Adding the robots meta tag may be redundant unless a search engine ignores robots.txt or the site does not use Drupal's default robots.txt file.

    But neither the robots.txt file nor the robots meta tag will prevent malicious bots from sending queries directly to the search application.
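
To illustrate how a well-behaved crawler interprets that rule, here is a minimal sketch using Python's standard-library urllib.robotparser. The host example.com, the bot name, and both URLs are placeholders, and only the single rule quoted above is included. Rules match by path prefix, so this parser treats /search/node?keys=... as disallowed but /search?keys=... (no trailing slash) as allowed. Either way the file is advisory: a bot that never reads it is unaffected.

```python
from urllib.robotparser import RobotFileParser

# Only the rule quoted in the thread; a real robots.txt has many more lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on a placeholder host.
core_search = "https://example.com/search/node?keys=foobar"
bare_search = "https://example.com/search?keys=foobar"

# A polite crawler asks before fetching; a malicious bot simply skips this step.
core_allowed = parser.can_fetch("ExampleBot", core_search)
bare_allowed = parser.can_fetch("ExampleBot", bare_search)
```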

    My thought is to add an option that verifies a search request came from the search form by passing a token with it. This would "break" how Drupal search currently works, since Drupal does not require requests to originate from the form, so the option should be an add-on that is disabled by default. I suspect most sites using the Vertex AI Search module would benefit, unless they are already protected from bots.
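
One way the token idea could work, sketched in Python rather than as Drupal code: the form embeds a timestamp signed with a site secret, and the search handler rejects requests whose token is missing, forged, or expired. The secret, the lifetime, and the function names are all hypothetical; this is not how the module implements anything.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-private-site-secret"  # hypothetical secret
TOKEN_LIFETIME = 3600  # seconds; an assumed value


def _sign(timestamp):
    """Return the hex HMAC-SHA256 signature of a timestamp string."""
    return hmac.new(SECRET_KEY, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Return 'timestamp.signature' to embed in the rendered search form."""
    timestamp = str(int(time.time() if now is None else now))
    return f"{timestamp}.{_sign(timestamp)}"


def is_valid_token(token, now=None):
    """Accept only unexpired tokens whose signature matches."""
    timestamp, sep, signature = (token or "").partition(".")
    if not sep or not (timestamp.isascii() and timestamp.isdigit()):
        return False
    # Compare as bytes in constant time to avoid leaking the signature.
    if not hmac.compare_digest(signature.encode(), _sign(timestamp).encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(timestamp) <= TOKEN_LIFETIME
```

A request such as /search?keys=foobar with no token would fail is_valid_token and could be refused before any paid search runs. A bot that first loads the form and copies the token would still get through, so this only stops direct hits on the URL.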

    As a quick fix, I was considering configuring my server to redirect any request to the /search path that lacks the "searchPage" argument to the default search page. The Vertex AI Search module appears to add the "searchPage" argument to the search redirect URL. However, the argument is only added when the form is submitted from the search page, not when it is submitted from the Search Form Block. It's not a robust way to block bots, but none of the malicious bots hitting my search go through the form itself; they send queries directly to the search URL (e.g. /search?keys=foobar).
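
The redirect rule described above can be sketched as a small decision function. This is Python for illustration only, not server configuration; the function name is hypothetical. It also assumes that a request to /search with no query string is the default search page itself and must not be redirected, to avoid a loop.

```python
from urllib.parse import parse_qs, urlsplit

SEARCH_PATH = "/search"           # path used in the example above
REQUIRED_ARGUMENT = "searchPage"  # argument named in the comment above


def should_redirect(url):
    """Return True for /search queries that lack the searchPage argument."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != SEARCH_PATH:
        return False  # not a search request
    if not parts.query:
        return False  # assumed: the bare search page is the redirect target
    query = parse_qs(parts.query, keep_blank_values=True)
    return REQUIRED_ARGUMENT not in query
```

With this rule, /search?keys=foobar is redirected, while the same URL with a searchPage argument passes through. As noted above, submissions from the Search Form Block would be redirected too, since they do not carry the argument.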

  • 🇺🇸 United States SamLerner

    I see what you're saying. Another idea would be to use flood control, similar to what Acquia Search is using:

    https://git.drupalcode.org/project/acquia_search/-/blob/3.1.x/src/EventS...

    That wouldn't block bots from using the search path, but it could keep things from getting out of hand.

  • Merge request !23 Adds flood control to searches. → (Merged) created by SamLerner
  • 🇺🇸 United States tzura

    timozura → made their first commit to this issue’s fork.

  • 🇺🇸 United States tzura

    There is an MR that needs review. It adds flood control to the Vertex search service when flood control is enabled on a search page's configuration page.

    To configure it, edit the Vertex search page, check the box to enable flood control, and set the threshold, window, and message values. Once you perform enough searches to hit the threshold, no more Vertex searches are performed until the window closes.
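
For readers unfamiliar with flood control, the behavior described above (threshold, window, message) can be sketched as a sliding-window counter. This is an illustrative Python sketch, not the code in the merge request. The class name and the per-client identifier are assumptions; the thread does not say whether the module counts searches per visitor or globally.

```python
import time
from collections import defaultdict, deque


class FloodControl:
    """Allow at most `threshold` searches per identifier in any `window` seconds."""

    def __init__(self, threshold, window, message):
        self.threshold = threshold
        self.window = window
        self.message = message
        self._events = defaultdict(deque)  # identifier -> timestamps of searches

    def check(self, identifier, now=None):
        """Return (allowed, message) and record the search when it is allowed."""
        now = time.monotonic() if now is None else now
        events = self._events[identifier]
        # Drop searches that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.threshold:
            return False, self.message
        events.append(now)
        return True, ""
```

When check returns False, the caller would skip the paid search and show the configured message instead.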

  • First commit to issue fork.
  • 🇺🇸 United States tzura

    @christian-deloach we added some flood control functionality. I'll update the docs soon, but the 1.5.0-beta5 release adds flood control options (threshold, time window, message) to the search page configuration. It's disabled by default, but if you get a chance to test it out, let me know how it goes. Closing this issue for now, though; hoping it helps ward off the bots.
