Potential risks using "Direct query" parse mode with views?

Created on 5 July 2024, 8 months ago

What are the potential risks?

I would like to use the Boolean operators supported by Solr's Standard Query Parser. As far as I know, the only parse mode that allows this today is "Direct query". Reading through the comments, it looks like it could be risky to use, but how risky is it?
I'm concerned about query sanitization, user input validation, and potentially malicious queries being sent to Solr.

Setup

  • Solr version: 8.11.3
  • Drupal Core version: 10.2.5
  • Search API version: 8.x-1.31
  • Search API Solr version: 4.3.1
  • Configured Solr Connector: Basic Auth

Useful comments

Comments that might be useful regarding potential risks.

Potential risks

  • The "Direct query" parse mode plugin does not call Unicode::validateUtf8($keys). This seems to be mitigated by the fact that Views already does so in SearchAPIFulltext::validateExposed.
💬 Support request
Status

Active

Version

4.3

Component

Code

Created by

🇪🇸Spain tuwebo

Tags

  • Security

Comments & Activities

  • Issue created by @tuwebo
  • 🇩🇪Germany mkalkbrenner 🇩🇪

    I would not use the direct query parser for public sites, just for internal tools.

    The problem is that Search API doesn't know the concept of boolean operators.
    A good implementation would be to add such a parse mode to Search API and to handle it in Search API Solr.
    This way we would not open the entire query language to the user.

    A shortcut might be to add a "boolean operators" query parser plugin to search_api_solr only.

  • 🇪🇸Spain tuwebo

    Hello @mkalkbrenner, thank you very much for your time and the fast response.
    I will start by looking at search_api_solr, which seems the faster and easier approach to implement, and then maybe look at search_api, which is the optimal solution.

  • 🇪🇸Spain tuwebo

    A potential Direct parse mode that handles Boolean operators and grouping could look like this:

    namespace Drupal\search_api_solr\Plugin\search_api\parse_mode;
    
    use Drupal\Component\Utility\Unicode;
    use Drupal\search_api\Plugin\search_api\parse_mode\Direct;
    
    /**
     * Represents a parse mode that handles Boolean operators and grouping.
     *
     * @SearchApiParseMode(
     *   id = "direct_boolean_operators",
     *   label = @Translation("Direct query boolean operators"),
     *   description = @Translation("A direct query allowing boolean operators and grouping. Might fail if the query contains syntax errors in regard to the specific server's query syntax."),
     * )
     */
    class DirectBooleanOperators extends Direct {
    
      /**
       * {@inheritdoc}
       */
      public function parseInput($keys) {
        // Check if input is an array.
        if (is_array($keys)) {
          // Validate each element in the array.
          foreach ($keys as $key) {
            if (!Unicode::validateUtf8($key)) {
              return '';
            }
          }
          // Convert array to string with spaces between elements.
          $keys = implode(' ', $keys);
        }
        else {
          // Validate the single string input.
          if (!Unicode::validateUtf8($keys)) {
            return '';
          }
        }
    
        // Test string
        // "Drupal 10 theming" AND (views OR "content types") NOT "user authentication" + performance~2 OR security^2 && (module || plugin) !deprecated
    
        // Boolean operators and valid symbols.
        // ['AND', 'OR', 'NOT', '&&', '||', '!', '+', '-'];
    
        // Valid group and escape chars.
        // ['(', ')', '\'];
    
        // Normalize whitespace.
        $keys = preg_replace('/\s+/u', ' ', trim($keys));
    
        // Keep whitespace-delimited Boolean operators and symbols single-spaced.
        $keys = preg_replace('/\s(AND|OR|NOT|!|\|\||&&)\s/', ' $1 ', $keys);
    
        // Define special characters to escape.
        $escape_special_chars = ['{', '}', '[', ']', '^', '~', '*', '?', ':'];
    
        // Handle special characters outside of quotes.
        $keys = preg_replace_callback('/("[^"]+")|\S+/', function ($matches) use ($escape_special_chars) {
          if (isset($matches[1])) {
            // This is a quoted phrase: leave its contents untouched.
            return $matches[0];
          }
          // Not a quoted phrase: escape only the listed special characters.
          $term = $matches[0];
          foreach ($escape_special_chars as $char) {
            $term = str_replace($char, '\\' . $char, $term);
          }
          return $term;
        }, $keys);
    
    
        // @todo Handle NegativeQueryProblems: Pure Negative Queries.
        // https://cwiki.apache.org/confluence/display/SOLR/NegativeQueryProblems#NegativeQueryProblems-PureNegativeQueries
    
        return $keys;
      }
    }
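
    To see what the escaping step above lets through, here is a minimal standalone sketch of that step only, runnable outside Drupal. The function name escape_outside_quotes() is hypothetical and chosen for illustration; the regular expression and character list are copied from the proof of concept.

```php
<?php

/**
 * Escapes selected Solr special characters outside double-quoted phrases.
 *
 * Hypothetical standalone helper mirroring the proof of concept above.
 */
function escape_outside_quotes(string $keys): string {
  $escape_special_chars = ['{', '}', '[', ']', '^', '~', '*', '?', ':'];

  return preg_replace_callback('/("[^"]+")|\S+/', function ($matches) use ($escape_special_chars) {
    if (isset($matches[1])) {
      // Quoted phrase: returned unchanged.
      return $matches[0];
    }
    // Unquoted token: prefix each listed character with a backslash.
    $term = $matches[0];
    foreach ($escape_special_chars as $char) {
      $term = str_replace($char, '\\' . $char, $term);
    }
    return $term;
  }, $keys);
}

// Field targeting (title:) and the wildcard are neutralized; the phrase is kept.
echo escape_outside_quotes('title:drupal "views: basics" secur*'), PHP_EOL;

// Boolean operators and grouping pass through; the boost is neutralized.
echo escape_outside_quotes('(views OR plugin) AND security^2'), PHP_EOL;
```

    Note that an unterminated double quote is not treated as a phrase here: the token falls through to the unquoted branch and the quote itself is passed to Solr unescaped, so this step alone does not prevent syntax errors.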
    
  • 🇪🇸Spain tuwebo

    A query parser that could fit is the Simple Query Parser: https://solr.apache.org/guide/8_1/other-parsers.html#simple-query-parser

    It can be registered in solrconfig_extra.xml as follows, or by subscribing to the PostConfigFilesGenerationEvent:
    <queryParser name="simple" class="solr.SimpleQParserPlugin"/>

    I think we should NOT allow the WHITESPACE operator (at least), but there is an easy way to restrict operators: pass a list of the allowed ones in the q.operators parameter.

    The downside is that we won't be able to handle Function Queries: https://solr.apache.org/guide/8_1/function-queries.html
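
    As a rough illustration of the q.operators restriction, the sketch below builds the request parameters such a query might send. It assumes the parser is registered under the name "simple" as above; the helper name build_simple_parser_params() and the field names title and body are hypothetical, and the chosen operator list is only one possible selection.

```php
<?php

/**
 * Builds Solr request parameters for the Simple Query Parser.
 *
 * Hypothetical helper, not part of search_api_solr.
 */
function build_simple_parser_params(string $keys, array $fields): array {
  return [
    // Select the parser by the name it was registered under.
    'defType' => 'simple',
    'q' => $keys,
    // Fields to search (assumed names, adjust to the index).
    'qf' => implode(' ', $fields),
    // Enable only these operators; WHITESPACE, PREFIX, FUZZY, NEAR and
    // ESCAPE are left out on purpose.
    'q.operators' => 'AND,OR,NOT,PHRASE,PRECEDENCE',
  ];
}

// In the Simple Query Parser syntax, "+" is AND, "|" is OR and "-" is NOT.
$params = build_simple_parser_params('drupal + (views | "content types") -deprecated', ['title', 'body']);
echo http_build_query($params, '', '&', PHP_QUERY_RFC3986), PHP_EOL;
```

    How these parameters would be attached to the query inside search_api_solr is left open here, since that depends on where the parse mode or an event subscriber hooks in.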

  • 🇩🇪Germany mkalkbrenner 🇩🇪

    Both options seem to be good.
    Using the Simple Query Parser seems to be straightforward. But we should also think about how Search API processors will work with it.
    But it is worth a try.

  • 🇺🇸United States pramodganore

    Thank you, this solution was very helpful.

  • 🇺🇸United States pramodganore

    However, I did notice that Solr searches are case sensitive with respect to the operators.
    Example: “and”, “AND”, “And”.

    Unlike power users, general users would not be aware of the subtle differences.

    Is there an existing solution already built into search, just a checkbox away?
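
    One possible workaround, sketched here under assumptions rather than taken from any existing module, is to upper-case the operator words before the keys reach Solr. The function name normalize_boolean_operators() is hypothetical; it only touches and, or and not when they stand alone as whitespace-delimited (or parenthesis-delimited) words outside double-quoted phrases.

```php
<?php

/**
 * Upper-cases standalone and/or/not outside double-quoted phrases.
 *
 * Hypothetical helper, not part of Search API or search_api_solr.
 */
function normalize_boolean_operators(string $keys): string {
  $result = preg_replace_callback(
    // First alternative consumes quoted phrases so they are skipped.
    // Second alternative matches an operator word that is preceded by
    // start/whitespace/"(" and followed by end/whitespace/")".
    '/"[^"]*"|(?<![^\s(])(and|or|not)(?![^\s)])/iu',
    static function (array $matches): string {
      // Group 1 is only present when an operator word matched.
      return isset($matches[1]) ? strtoupper($matches[1]) : $matches[0];
    },
    $keys
  );

  // preg_replace_callback() returns NULL on failure (e.g. invalid UTF-8).
  return $result ?? $keys;
}

echo normalize_boolean_operators('cats and dogs'), PHP_EOL;
echo normalize_boolean_operators('"salt and pepper" or sugar'), PHP_EOL;
```

    The trade-off is that users can then no longer search for the literal words "and", "or" or "not" unless they put them inside a quoted phrase.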

  • 🇪🇸Spain tuwebo

    Hello @pramodganore, thanks for taking the time to look at it.
    The code in comment #3459227-7: Potential risks using "Direct query" parse mode with views? was just a proof of concept with very basic code (be aware of that, and read carefully the implications listed in the issue description). There is no further code yet; I am still improving it, but I am not sure when it will be ready to post here.
    Also, a lot of testing will need to be done, and probably, as @mkalkbrenner mentioned, Search API processors will not fully work (I am thinking, for example, of the Highlight processor).
    That being said, maybe the best solution is simply to customize your search form by adding some tips for end users on how to use it, since Solr is very picky about syntax (it is not only case sensitive; users may also forget to close single quotes, double quotes, parentheses...).
    I am also working on the other approach, using the "Simple Query Parser", but my first try yielded some unexpected results. If I come up with something useful, I'll update this issue.
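
    For the unclosed quotes and parentheses mentioned above, a form validation step could reject obviously unbalanced input before it is sent to Solr. The following is a minimal sketch, assuming only double quotes and parentheses need checking; the function name has_balanced_syntax() is hypothetical.

```php
<?php

/**
 * Checks that double quotes and parentheses are balanced.
 *
 * Hypothetical helper. A backslash escapes the following character, and
 * parentheses inside a quoted phrase are ignored.
 */
function has_balanced_syntax(string $keys): bool {
  $depth = 0;
  $in_quote = FALSE;
  $length = strlen($keys);

  // Byte-wise iteration is safe for UTF-8 here because the characters of
  // interest are ASCII and never occur inside multi-byte sequences.
  for ($i = 0; $i < $length; $i++) {
    $char = $keys[$i];
    if ($char === '\\') {
      // Skip the escaped character.
      $i++;
      continue;
    }
    if ($char === '"') {
      $in_quote = !$in_quote;
    }
    elseif (!$in_quote && $char === '(') {
      $depth++;
    }
    elseif (!$in_quote && $char === ')') {
      if (--$depth < 0) {
        // Closing parenthesis without a matching opening one.
        return FALSE;
      }
    }
  }

  return !$in_quote && $depth === 0;
}

var_dump(has_balanced_syntax('(views OR "content types")'));
var_dump(has_balanced_syntax('(views OR "content types"'));
```

    A check like this could back a validation message on the search form; it does not catch every syntax error Solr may reject.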

  • 🇺🇸United States pramodganore

    I also noticed that when I search with “something” vs. something, the highlight does not apply when searching with double quotes.
    I do understand the implications and saw the TODO. For my use case we only need the basic Boolean search options, nothing advanced. I really appreciate you responding.
