Invalid characters in filter values returns an error

Created on 3 September 2023, about 1 year ago
Updated 6 September 2023, about 1 year ago

Problem/Motivation

When executing a search with a filter (not a query but an exact filter) that contains specific characters such as ":" an error is returned from Meilisearch. The error returned is Found unexpected characters at the end of the filter: You probably forgot an `OR` or an `AND` rule.

This happens because we wrap filter values in quotes only when a space, "AND", "OR" or "TO" is detected in the filter value.

Steps to reproduce

  1. Create a meilisearch index
  2. Add a title field to the index with a "string" type and reindex all
  3. Create an index view
  4. Add a filter field for title
  5. Type "my:title" (without quotes) to the filter field in the view preview
  6. An error is returned and no filtering occurs
  7. Type "my:title" (with quotes) to the filter field in the view preview
  8. Filtering works properly

Proposed resolution

Always wrap non-numeric filter values in quotes. Even if the user types the filter value in quotes it should be quoted again as the filter should match exact value typed into the field.

Remaining tasks

  • Add a test that demonstrates the bug
  • Fix the bug

User interface changes

None.

API changes

None.

Data model changes

None.

🐛 Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

🇸🇮Slovenia bcizej

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @bcizej
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    100 pass, 2 fail
  • @bcizej opened merge request.
  • Status changed to Needs work about 1 year ago
  • 🇸🇮Slovenia bcizej

    Added a test that demonstrates the bug.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    103 pass
  • Status changed to Needs review about 1 year ago
  • 🇸🇮Slovenia bcizej

    The issue was even bigger than I thought, using quotes in values would also break filtering and using certain combinations of characters in values such as he":llo\' would either break the filtering or the filter would not return expected results with single quotes condition syntax field = 'value'. In the end what worked was changing the syntax to use double quotes field = "value" and escaping any double quote " characters inside the value as well. I have added a few more tests for such conditions.

    Ready for review.

  • Status changed to Needs work about 1 year ago
  • It still breaks if the search contains a backslash (\) on the end of the line.

    Example for foo\:
    Expression `\"foo\\\"` is missing the following closing delimiter: `"`. 9:15 title = "foo\"

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    100 pass, 1 fail
  • So the solution for what I encountered was simpler than I expected it to be. In cases where the value is a backslash or ends with a backslash, we just need to add a space when parsing data. Now I don't know if this is the best solution, I've tried replacing the backslash with double backslash, but filtering sees that as a different value than single backslash.

    While this is an edge case, I can still imagine a situation where someone might choose to add a backslash at the end for stylistic reasons.

    Also, I don't understand why the tests are failing, locally they passed. Could someone please check it out?

  • 🇸🇮Slovenia DeaOm

    Tests are failing because this solution for the double backslash is not working as there is no results, tested manually, but test fails locally also. The query looks like:
    "filter":"title = \"test:entity\\\\ \""
    Which is not correct.

  • 🇸🇮Slovenia bcizej

    I have tried every combination I could think of to try and get the proper result with no avail. I think this is a meilisearch issue so I opened an issue on Github. In the worst case scenario we might have to ignore this single backslash issue for now.

  • 🇸🇮Slovenia bcizej

    So the solution for what I encountered was simpler than I expected it to be. In cases where the value is a backslash or ends with a backslash, we just need to add a space when parsing data.

    The error disappears but this is not a solution as meilisearch treats the filter as "foo\ " (with space) which is not the same as "foo\" (without space) thus returning no results.

  • @DeaOm I think that's not the problem, locally tests pass. Looking at what I've done I used str_ends_with, that function is only PHP 8, which I should not have used. Will push a quick fix so it can be tested again.

    @bcizej but from the testing I've done it works. So I made an article called "cool\" If I add a filter for the title field and search for "cool" I don't get anything, but if I search for "cool\" I do get results. I'll push the fix for PHP 7.4, the only problem I see with this is if I add an article called "cool\ " and search for "cool\" or "cool\ " I get both of them returned, but in my opinion, this is better than getting an error.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    100 pass, 1 fail
  • Ok I'll remove my code but leave the tests with a todo message pointing to the issue on github you posted.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    103 pass
  • So what should then be done? Ignore this error and merge and later on, when meilisearch fixes it on their side we also fix the problem or wait for them to fix it first?

  • Status changed to Needs review about 1 year ago
  • 🇸🇮Slovenia DeaOm

    I would leave it out, as it's not actually a blocker, but would create a new issue with postponed status and reference this issue and also the git issue, as we can't know when the git issue will actually get resolved and if it will get resolved, so setting it back to needs review, but I think it was RTBC before the backslash issue.

  • 🇦🇹Austria tgoeg

    I somehow feel this might be the wrong path to go. This solution is actually explicitly blacklisting/sanitizing/escaping what might cause errors.
    From a security PoV I always fear blacklisting solutions as (evil) people tend to find ways to evade the blacklisting, i.e. encode the characters as unicode code points/HTML entities, double encode them, etc. This really smells like SQL or code injection stuff.

    What about doing it the other way round and only allowing a known-good whitelist of characters that may be supplied?
    I am not enough into PHP to decide whether this can easily be achieved (for different human languages), but couldn't we basically just allow regex classes [:alnum:][:blank:]["-] and catch any violation of that right in Drupal before sending it off to Meilisearch (and let the users know)?
    Or just throw away anything not matching that silently (like web search engines do it, much to my dismay sometimes, but the other heart in my chest tells me it's the right thing to do).

    I don't see why anything else but a normal search phrase, "literal string" or a -negative keyword should be allowed (at the moment, that is; I don't think Meilisearch supports anything more right now, anyway)

    Meilisearch has a really solid typo tolerance, there's nothing preventing anyone to find my fancy\company by just searching without the backslash. Yes, literal searching deactivates the typo tolerance. But I am pretty sure Meilisearch will at some stage throw away special characters anyway and rely on fuzzy matches, so don't send them in the first place and be secure by default.

    Does that sound reasonable?

  • 🇸🇮Slovenia bcizej

    I somehow feel this might be the wrong path to go. This solution is actually explicitly blacklisting/sanitizing/escaping what might cause errors.
    From a security PoV I always fear blacklisting solutions as (evil) people tend to find ways to evade the blacklisting, i.e. encode the characters as unicode code points/HTML entities, double encode them, etc. This really smells like SQL or code injection stuff.

    I am not sure what that means but Meilisearch is capable of storing any character into their attributes - apart from primary key - thus should be filterable by these characters as well. The security is on the Meilisearch side to handle such cases but that shouldn't prevent us from filtering with such characters.

    What about doing it the other way round and only allowing a known-good whitelist of characters that may be supplied?
    I am not enough into PHP to decide whether this can easily be achieved (for different human languages), but couldn't we basically just allow regex classes [:alnum:][:blank:]["-] and catch any violation of that right in Drupal before sending it off to Meilisearch (and let the users know)?

    We could do that, but what if there are other apps that connect to the same index for read-only operations and the data is not 100% accurate with stripped characters. Unless you are talking about filtering only but such a filter will not produce accurate results.

    I don't see why anything else but a normal search phrase, "literal string" or a -negative keyword should be allowed (at the moment, that is; I don't think Meilisearch supports anything more right now, anyway)

    Meilisearch has a really solid typo tolerance, there's nothing preventing anyone to find my fancy\company by just searching without the backslash. Yes, literal searching deactivates the typo tolerance. But I am pretty sure Meilisearch will at some stage throw away special characters anyway and rely on fuzzy matches, so don't send them in the first place and be secure by default.

    This is fine when it comes to searching aka the "q" parameter in request body but this issue is for the "filter" parameter which does only exact matching ie. the condition title = "foo" will return documents with titles that exactly matches "foo"

    Anyway I did get a response in the Github issue and they do confirm it is a bug.

  • Status changed to Needs work about 1 year ago
  • 🇸🇮Slovenia bcizej

    I think for now the best we can do is to catch the error and return 0 results but log the error. When meilisearch does fix the bug the filtering should work for the backslash. I believe the issue is only with one \ character at the end, having 2 or more works fine and the other characters should not present any issues.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    106 pass
  • 🇸🇮Slovenia DeaOm

    Based on failing tests and also my manual testing the issue is present also with 2 backslashes, not sure if we are testing one backslash in tests.

    EntityTest::create([
        'name' => 'test:entity\\',
  • 🇦🇹Austria tgoeg

    Sorry for the noise, then.
    I'm still confused when it comes to search_api vs Meilisearch terms, i.e. queries, filters, facets, attributes, ..

  • 🇸🇮Slovenia DeaOm

    I manually tested again without the empty space correction and it works with 2 backslashes. Like @bcizej discovered in the git issue, it seems it has an issue with odd numbers of backslashes.
    The error is logged and also displayed and no results are showing up. The test for two backslashes can be added, to confirm it works for 2.

  • 🇸🇮Slovenia bcizej

    I believe the issue is only with one \ character at the end, having 2 or more works fine and the other characters should not present any issues.

    This is not entirely correct, using any odd number of backslashes will produce an error (1, 3, 5 etc. backslashes)

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    106 pass
  • Status changed to Needs review about 1 year ago
  • 🇸🇮Slovenia DeaOm

    Based on this new information regarding odd number of backslashes not working, added test for the working ones, entity title that ends with 2 backslashes and also entity that only has 2 backslash as title. I left the one backslash commented out as it was on todo, so once meili search bug is fixed, can be easily re-tested.
    The logging of the error and no results displayed is already working, so nothing to do there. Marking it as needs review, but because only tests have been adjusted, in my POV, can be merged.

    • bcizej committed 2f90e0a7 on 1.0.x
      Issue #3385017 by bcizej, admirlju, DeaOm: Invalid characters in filter...
  • Status changed to Fixed about 1 year ago
  • 🇸🇮Slovenia bcizej

    Merged, thanks.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024