- Issue created by @Christian DeLoach
- 🇺🇸 United States SamLerner
I like this idea. I'm actually looking into this problem for some sites I'm managing, as search bots are causing millions of additional hits each month on pages of search results.
My first attempt to solve this was to add a
<meta name="robots" content="noindex,nofollow"/>
tag to the search page. This seems like something easy to add as an option for the Drupal search page, if it is in fact a working solution. @christian-deloach, did you have any specific configuration options in mind?
- 🇺🇸 United States Christian DeLoach
Thank you @samlerner.
By default, Drupal core's robots.txt already includes "Disallow: /search/", which asks search engines not to crawl paths under /search/. Adding the robots meta tag may be redundant unless a search engine ignores robots.txt or the site does not use Drupal's default robots.txt file.
But neither the robots.txt file nor the robots meta tag will prevent malicious bots from sending queries directly to the search application.
My thought is to add an option that checks whether a search request came from the search form, by having the form pass a token. This would obviously "break" how Drupal search currently works, since Drupal does not require a search request to come from the form, so it would be an add-on that is disabled by default. I suspect most sites using the Vertex AI Search module would benefit, unless the site is already protected from bots.
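To make the token idea concrete, here is a minimal sketch in Python rather than Drupal PHP. It is not how the Vertex AI Search module or Drupal core works. The names `SECRET_KEY`, `TOKEN_MAX_AGE`, `issue_search_token` and `is_valid_search_token` are all hypothetical. The form would embed a signed, time-limited token, and the search handler would reject any request that does not carry a valid one.

```python
import hashlib
import hmac
import time

# Hypothetical values for illustration; a real site would load its own secret.
SECRET_KEY = b"replace-with-a-site-specific-secret"
TOKEN_MAX_AGE = 3600  # seconds a token stays valid after the form is rendered


def _sign(issued_at: str) -> str:
    """Return the hex HMAC-SHA256 signature of the issue timestamp."""
    return hmac.new(SECRET_KEY, issued_at.encode(), hashlib.sha256).hexdigest()


def issue_search_token(now=None) -> str:
    """Build the token that the search form would embed as a hidden field."""
    issued_at = str(int(time.time() if now is None else now))
    return f"{issued_at}.{_sign(issued_at)}"


def is_valid_search_token(token: str, now=None) -> bool:
    """Check that the token was signed by this site and has not expired."""
    now = time.time() if now is None else now
    issued_at, separator, signature = token.partition(".")
    if not separator or not (issued_at.isascii() and issued_at.isdigit()):
        return False
    # Constant-time comparison, so timing does not leak the signature.
    if not hmac.compare_digest(signature.encode(), _sign(issued_at).encode()):
        return False
    return 0 <= now - int(issued_at) <= TOKEN_MAX_AGE
```

A request such as /search?keys=foobar that arrives without a token, or with a forged or expired one, would fail `is_valid_search_token` and could be rejected before the query reaches the search backend. A bot that first loads the form and copies the token would still get through, so this only raises the cost of direct queries.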
As a quick fix, I was considering setting up my server to redirect any request to the /search path that lacks the "searchPage" argument to the default search page. The "searchPage" argument appears to be added by the Vertex AI Search module to the search redirect URL. However, it is only added when the search form is submitted from the search page, not when it is submitted from the Search Form Block. This is not a robust way to block bots, but none of the malicious bots hitting my search are going through the form itself. They send their queries directly to the search URL (e.g. /search?keys=foobar).
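The redirect rule can be sketched as a small decision function. This is an illustration in Python under stated assumptions, not actual server configuration. The `redirect_target` function and the constants are hypothetical, and the only details taken from this thread are the /search path and the "searchPage" argument name.

```python
from urllib.parse import parse_qs, urlsplit

# Taken from the discussion above; everything else here is hypothetical.
SEARCH_PATH = "/search"
REQUIRED_ARGUMENT = "searchPage"


def redirect_target(url: str):
    """Return the path to redirect to, or None to let the request through."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != SEARCH_PATH:
        return None  # not a search request, leave it alone
    if not parts.query:
        return None  # already the default search page; redirecting would loop
    arguments = parse_qs(parts.query, keep_blank_values=True)
    if REQUIRED_ARGUMENT in arguments:
        return None  # looks like a submission from the search page form
    return SEARCH_PATH  # direct query: send it to the default search page
```

With this rule, /search?keys=foobar is redirected to /search, while a request that carries the "searchPage" argument passes through. Submissions from the Search Form Block would be redirected too, since they lack the argument, so the block would need to add it or be exempted some other way.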