Potential risks using "Direct query" parse mode with views?

Created on 5 July 2024, 5 months ago
Updated 11 July 2024, 5 months ago

What are the potential risks?

I would like to use the Boolean operators supported by the Standard Query Parser. As far as I know, the only parse mode that allows this today is the Direct query parser. Reading through the comments, it looks like using it could be risky, but how risky is it?
I'm concerned about query sanitization, user input validation, and potentially malicious queries being sent to Solr.

Setup

  • Solr version: 8.11.3
  • Drupal Core version: 10.2.5
  • Search API version: 8.x-1.31
  • Search API Solr version: 4.3.1
  • Configured Solr Connector: Basic Auth

Useful comments

Comments that might be useful regarding potential risks.

Potential risks

Pinpoint the potential risks here.

UTF-8 validation

  • The Direct parse mode plugin does not call Unicode::validateUtf8($keys). This seems to be mitigated by the fact that Views already does it through SearchApiFulltext::validateExposed.
  • Injection Attacks, Denial of service (DoS), Data exposure

    • might bypass intended search filters.
    • returns all documents.
    • to list all document IDs.

    Solutions

    The best solution, as stated in comment #4.

    The workaround.

    Feature request

    Status: Active
    Version: 4.0
    Component: Code
    Created by: 🇪🇸Spain tuwebo

    • Security

    Comments & Activities

    • Issue created by @tuwebo
    • 🇩🇪Germany mkalkbrenner 🇩🇪

      I would not use the direct query parser for public sites, just for internal tools.

      The problem is that Search API doesn't know the concept of boolean operators.
      A good implementation would be to add such a parse mode to Search API and to handle it in Search API Solr.
      This way we would not open the entire query language to the user.

      A shortcut might be to add a "boolean operators" query parser plugin to search_api_solr only.

    • 🇪🇸Spain tuwebo

      Hello @mkalkbrenner, thank you very much for your time and the fast response.
      I will start by looking at search_api_solr, which seems the faster and easier approach to implement, and then maybe look at search_api, which is the optimal solution.

    • 🇪🇸Spain tuwebo

      A possible Direct parse mode that handles boolean operators and grouping could look like this:

      namespace Drupal\search_api_solr\Plugin\search_api\parse_mode;
      
      use Drupal\Component\Utility\Unicode;
      use Drupal\search_api\Plugin\search_api\parse_mode\Direct;
      
      /**
       * Represents a parse mode that handles Boolean operators and grouping.
       *
       * @SearchApiParseMode(
       *   id = "direct_boolean_operators",
       *   label = @Translation("Direct query boolean operators"),
       *   description = @Translation("A direct query allowing boolean operators and grouping. Might fail if the query contains syntax errors in regard to the specific server's query syntax."),
       * )
       */
      class DirectBooleanOperators extends Direct {
      
        /**
         * {@inheritdoc}
         */
        public function parseInput($keys) {
          // Check if input is an array.
          if (is_array($keys)) {
            // Validate each element in the array.
            foreach ($keys as $key) {
              if (!Unicode::validateUtf8($key)) {
                return '';
              }
            }
            // Convert array to string with spaces between elements.
            $keys = implode(' ', $keys);
          }
          else {
            // Validate the single string input.
            if (!Unicode::validateUtf8($keys)) {
              return '';
            }
          }
      
          // Test string
          // "Drupal 10 theming" AND (views OR "content types") NOT "user authentication" + performance~2 OR security^2 && (module || plugin) !deprecated
      
          // Boolean operators and valid symbols.
          // ['AND', 'OR', 'NOT', '&&', '||', '!', '+', '-'];
      
          // Valid grouping and escape chars.
          // ['(', ')', '\'];
      
          // Normalize whitespace.
          $keys = preg_replace('/\s+/u', ' ', trim($keys));
      
          // Keep exactly one space on each side of Boolean operators and symbols.
          $keys = preg_replace('/\s(AND|OR|NOT|!|\|\||&&)\s/', ' $1 ', $keys);
      
          // Define special characters to escape.
          $escape_special_chars = ['{', '}', '[', ']', '^', '~', '*', '?', ':'];
      
          // Handle special characters outside of quotes.
          $keys = preg_replace_callback('/("[^"]+")|\S+/', function($matches) use ($escape_special_chars) {
            if (isset($matches[1])) {
              // This is a quoted phrase, don't modify anything inside
              return $matches[0];
            } else {
              // This is not a quoted phrase, escape only the specified special characters
              $term = $matches[0];
              foreach ($escape_special_chars as $char) {
                $term = str_replace($char, '\\' . $char, $term);
              }
              return $term;
            }
          }, $keys);
      
      
          // @todo Handle pure negative queries (NegativeQueryProblems):
          // https://cwiki.apache.org/confluence/display/SOLR/NegativeQueryProblems#NegativeQueryProblems-PureNegativeQueries
      
          return $keys;
        }
      }
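
      To make the escaping step above easier to try outside Drupal, here is a minimal standalone sketch. The function name escape_unquoted_terms() is hypothetical and not part of search_api_solr; it only mirrors the callback from the snippet above, assuming the same list of characters. Note that, like the original, it does not escape a backslash typed by the user.

```php
<?php

/**
 * Escapes selected Solr special characters outside of quoted phrases.
 *
 * Hypothetical helper for illustration only: it mirrors the escaping
 * callback of the parse mode sketch and is not an existing API.
 */
function escape_unquoted_terms(string $keys): string {
  // Same character list as in the parse mode sketch (an assumption,
  // not a complete list of Solr special characters).
  $special = ['{', '}', '[', ']', '^', '~', '*', '?', ':'];

  $result = preg_replace_callback('/("[^"]+")|\S+/', function (array $m) use ($special): string {
    // Group 1 is only present when a quoted phrase matched: keep it as is.
    if (isset($m[1])) {
      return $m[0];
    }
    // Otherwise prefix every listed character with a backslash.
    $term = $m[0];
    foreach ($special as $char) {
      $term = str_replace($char, '\\' . $char, $term);
    }
    return $term;
  }, $keys);

  // preg_replace_callback() returns NULL on error; reject the input then.
  return $result ?? '';
}

// Field syntax and wildcards are neutralized, the phrase is untouched.
echo escape_unquoted_terms('title:foo "a:b" bar*'), PHP_EOL;
// Prints: title\:foo "a:b" bar\*
```

      With this escaping, a user can no longer address arbitrary fields (title:foo) or send wildcards, while AND, OR, NOT and parentheses pass through unchanged.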
      
    • 🇪🇸Spain tuwebo

      A query parser that could fit is the Simple Query Parser: https://solr.apache.org/guide/8_1/other-parsers.html#simple-query-parser

      It can be added to solrconfig_extra.xml like this, or by using the PostConfigFilesGenerationEvent:
      <queryParser name="simple" class="solr.SimpleQParserPlugin"/>

      I think we should NOT allow the WHITESPACE operator (at least), but there is an easy way to restrict the operators: pass a list of the allowed ones in the q.operators parameter.
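
      As a rough illustration of the q.operators restriction, the sketch below builds the request parameters for the Simple Query Parser with an allow-list that leaves out WHITESPACE. The helper build_simple_parser_params() is hypothetical; how these parameters would actually be attached to a Search API Solr query is not shown here and would need to be worked out in the module. Keep in mind that the Simple Query Parser uses symbolic operators (+ for AND, | for OR, - for NOT) rather than the AND/OR/NOT keywords.

```php
<?php

/**
 * Builds Solr request parameters for the Simple Query Parser.
 *
 * Hypothetical helper for illustration only; it is not part of
 * search_api_solr or Solarium.
 */
function build_simple_parser_params(string $userInput, array $allowedOperators): array {
  // Operator names documented for the q.operators parameter.
  $known = ['AND', 'OR', 'NOT', 'PREFIX', 'PHRASE', 'PRECEDENCE', 'ESCAPE', 'WHITESPACE', 'FUZZY', 'NEAR'];

  // Silently drop anything that is not a documented operator name.
  $operators = array_values(array_intersect($allowedOperators, $known));

  return [
    'defType' => 'simple',
    'q' => $userInput,
    // Only the listed operators are enabled; an empty string disables all.
    'q.operators' => implode(',', $operators),
  ];
}

// Allow boolean logic, phrases and grouping, but not WHITESPACE.
$params = build_simple_parser_params(
  'drupal + (views | "content types") -deprecated',
  ['AND', 'OR', 'NOT', 'PHRASE', 'PRECEDENCE']
);

// URL-encoded query string that could be appended to a /select request.
echo http_build_query($params), PHP_EOL;
```

      Operators left out of the list are not interpreted as query syntax by the parser, which keeps the accepted query language small.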

      The downside is we won't be able to handle Function Queries https://solr.apache.org/guide/8_1/function-queries.html

    • 🇩🇪Germany mkalkbrenner 🇩🇪

      Both options seem to be good.
      Using the Simple Query Parser seems to be straightforward. But we should also think about how Search API processors will work with it.
      But it is worth a try.
