- Issue created by @bcizej
- last update
about 1 year ago 100 pass, 2 fail - @bcizej opened merge request.
- Status changed to Needs work
about 1 year ago 4:04pm 3 September 2023 - last update
about 1 year ago 103 pass - Status changed to Needs review
about 1 year ago 10:49pm 3 September 2023 - 🇸🇮Slovenia bcizej
The issue was even bigger than I thought, using quotes in values would also break filtering and using certain combinations of characters in values such as
he":llo\'
would either break the filtering or the filter would not return expected results with single quotes condition syntaxfield = 'value'
. In the end what worked was changing the syntax to use double quotesfield = "value"
and escaping any double quote"
characters inside the value as well. I have added a few more tests for such conditions.Ready for review.
- Status changed to Needs work
about 1 year ago 8:29am 4 September 2023 It still breaks if the search contains a backslash (\) on the end of the line.
Example for
foo\
:
Expression `\"foo\\\"` is missing the following closing delimiter: `"`. 9:15 title = "foo\"
- last update
about 1 year ago 100 pass, 1 fail So the solution for what I encountered was simpler than I expected it to be. In cases where the value is a backslash or ends with a backslash, we just need to add a space when parsing data. Now I don't know if this is the best solution, I've tried replacing the backslash with double backslash, but filtering sees that as a different value than single backslash.
While this is an edge case, I can still imagine a situation where someone might choose to add a backslash at the end for stylistic reasons.
Also, I don't understand why the tests are failing, locally they passed. Could someone please check it out?
- 🇸🇮Slovenia DeaOm
Tests are failing because this solution for the double backslash is not working as there is no results, tested manually, but test fails locally also. The query looks like:
"filter":"title = \"test:entity\\\\ \""
Which is not correct. - 🇸🇮Slovenia bcizej
I have tried every combination I could think of to try and get the proper result with no avail. I think this is a meilisearch issue so I opened an issue on Github. In the worst case scenario we might have to ignore this single backslash issue for now.
- 🇸🇮Slovenia bcizej
So the solution for what I encountered was simpler than I expected it to be. In cases where the value is a backslash or ends with a backslash, we just need to add a space when parsing data.
The error disappears but this is not a solution as meilisearch treats the filter as "foo\ " (with space) which is not the same as "foo\" (without space) thus returning no results.
@DeaOm I think that's not the problem, locally tests pass. Looking at what I've done I used
str_ends_with
, that function is only PHP 8, which I should not have used. Will push a quick fix so it can be tested again.@bcizej but from the testing I've done it works. So I made an article called "cool\" If I add a filter for the title field and search for "cool" I don't get anything, but if I search for "cool\" I do get results. I'll push the fix for PHP 7.4, the only problem I see with this is if I add an article called "cool\ " and search for "cool\" or "cool\ " I get both of them returned, but in my opinion, this is better than getting an error.
- last update
about 1 year ago 100 pass, 1 fail Ok I'll remove my code but leave the tests with a todo message pointing to the issue on github you posted.
- last update
about 1 year ago 103 pass So what should then be done? Ignore this error and merge and later on, when meilisearch fixes it on their side we also fix the problem or wait for them to fix it first?
- Status changed to Needs review
about 1 year ago 11:31am 5 September 2023 - 🇸🇮Slovenia DeaOm
I would leave it out, as it's not actually a blocker, but would create a new issue with postponed status and reference this issue and also the git issue, as we can't know when the git issue will actually get resolved and if it will get resolved, so setting it back to needs review, but I think it was RTBC before the backslash issue.
- 🇦🇹Austria tgoeg
I somehow feel this might be the wrong path to go. This solution is actually explicitly blacklisting/sanitizing/escaping what might cause errors.
From a security PoV I always fear blacklisting solutions as (evil) people tend to find ways to evade the blacklisting, i.e. encode the characters as unicode code points/HTML entities, double encode them, etc. This really smells like SQL or code injection stuff.What about doing it the other way round and only allowing a known-good whitelist of characters that may be supplied?
I am not enough into PHP to decide whether this can easily be achieved (for different human languages), but couldn't we basically just allow regex classes[:alnum:][:blank:]["-]
and catch any violation of that right in Drupal before sending it off to Meilisearch (and let the users know)?
Or just throw away anything not matching that silently (like web search engines do it, much to my dismay sometimes, but the other heart in my chest tells me it's the right thing to do).I don't see why anything else but a normal
search phrase
,"literal string"
or a-negative
keyword should be allowed (at the moment, that is; I don't think Meilisearch supports anything more right now, anyway)Meilisearch has a really solid typo tolerance, there's nothing preventing anyone to find
my fancy\company
by just searching without the backslash. Yes, literal searching deactivates the typo tolerance. But I am pretty sure Meilisearch will at some stage throw away special characters anyway and rely on fuzzy matches, so don't send them in the first place and be secure by default.Does that sound reasonable?
- 🇸🇮Slovenia bcizej
I somehow feel this might be the wrong path to go. This solution is actually explicitly blacklisting/sanitizing/escaping what might cause errors.
From a security PoV I always fear blacklisting solutions as (evil) people tend to find ways to evade the blacklisting, i.e. encode the characters as unicode code points/HTML entities, double encode them, etc. This really smells like SQL or code injection stuff.I am not sure what that means but Meilisearch is capable of storing any character into their attributes - apart from primary key - thus should be filterable by these characters as well. The security is on the Meilisearch side to handle such cases but that shouldn't prevent us from filtering with such characters.
What about doing it the other way round and only allowing a known-good whitelist of characters that may be supplied?
I am not enough into PHP to decide whether this can easily be achieved (for different human languages), but couldn't we basically just allow regex classes [:alnum:][:blank:]["-] and catch any violation of that right in Drupal before sending it off to Meilisearch (and let the users know)?We could do that, but what if there are other apps that connect to the same index for read-only operations and the data is not 100% accurate with stripped characters. Unless you are talking about filtering only but such a filter will not produce accurate results.
I don't see why anything else but a normal search phrase, "literal string" or a -negative keyword should be allowed (at the moment, that is; I don't think Meilisearch supports anything more right now, anyway)
Meilisearch has a really solid typo tolerance, there's nothing preventing anyone to find my fancy\company by just searching without the backslash. Yes, literal searching deactivates the typo tolerance. But I am pretty sure Meilisearch will at some stage throw away special characters anyway and rely on fuzzy matches, so don't send them in the first place and be secure by default.
This is fine when it comes to searching aka the "q" parameter in request body but this issue is for the "filter" parameter which does only exact matching ie. the condition
title = "foo"
will return documents with titles that exactly matches "foo"Anyway I did get a response in the Github issue and they do confirm it is a bug.
- Status changed to Needs work
about 1 year ago 5:45pm 5 September 2023 - 🇸🇮Slovenia bcizej
I think for now the best we can do is to catch the error and return 0 results but log the error. When meilisearch does fix the bug the filtering should work for the backslash. I believe the issue is only with one
\
character at the end, having 2 or more works fine and the other characters should not present any issues. - last update
about 1 year ago 106 pass - 🇸🇮Slovenia DeaOm
Based on failing tests and also my manual testing the issue is present also with 2 backslashes, not sure if we are testing one backslash in tests.
EntityTest::create([ 'name' => 'test:entity\\',
- 🇦🇹Austria tgoeg
Sorry for the noise, then.
I'm still confused when it comes to search_api vs Meilisearch terms, i.e. queries, filters, facets, attributes, .. - 🇸🇮Slovenia DeaOm
I manually tested again without the empty space correction and it works with 2 backslashes. Like @bcizej discovered in the git issue, it seems it has an issue with odd numbers of backslashes.
The error is logged and also displayed and no results are showing up. The test for two backslashes can be added, to confirm it works for 2. - 🇸🇮Slovenia bcizej
I believe the issue is only with one \ character at the end, having 2 or more works fine and the other characters should not present any issues.
This is not entirely correct, using any odd number of backslashes will produce an error (1, 3, 5 etc. backslashes)
- last update
about 1 year ago 106 pass - Status changed to Needs review
about 1 year ago 10:24am 6 September 2023 - 🇸🇮Slovenia DeaOm
Based on this new information regarding odd number of backslashes not working, added test for the working ones, entity title that ends with 2 backslashes and also entity that only has 2 backslash as title. I left the one backslash commented out as it was on todo, so once meili search bug is fixed, can be easily re-tested.
The logging of the error and no results displayed is already working, so nothing to do there. Marking it as needs review, but because only tests have been adjusted, in my POV, can be merged. - Status changed to Fixed
about 1 year ago 2:32pm 6 September 2023 Automatically closed - issue fixed for 2 weeks with no activity.