LIKE and EXACT filter for searching fulltext

Created on 2 May 2024, over 1 year ago

Added LIKE and EXACT elastic operator like in parent issue but for 8.0.x-dev

Feature request
Status

Active

Version

8.0

Component

Elasticsearch Connector

Created by

🇵🇱Poland artemboiko

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @artemboiko
  • First commit to issue fork.
  • Open on Drupal.org →
    Core: 10.2.1 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    Waiting for branch to pass
  • Status changed to Needs review over 1 year ago
  • Open on Drupal.org →
    Core: 10.2.1 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    Waiting for branch to pass
  • 🇨🇦Canada mparker17 UTC-4

    @artemboiko, thank you! (and apologies: I only have expertise in the 8.0.x branch of the module, so I'm unable to help move the 8.x-7.x branch forward!)

    I've created a merge request: lets see if tests pass.

    Offhand, the code looks fine thus far. May I request that you update the test at tests/src/Unit/SearchAPI/Query/FilterBuilderTest.php to test the new code? (this will ensure that future changes don't accidentally break the LIKE and EXACT functionality you need!)

    Thank you very much!

  • 🇨🇦Canada mparker17 UTC-4

    If possible, it would be nice to have a NOT LIKE filter too, as put forward in #3092486-10: LIKE filter missing . That issue (which is a lot older and uses the old, switch-based syntax) suggests the following code...

    +        case 'LIKE':
    +          $filter = [
    +            'bool' => [
    +              'should' => [
    +                'wildcard' => [$condition->getField() => $condition->getValue()],
    +              ],
    +            ]
    +          ];
    +          break;
    +
    +        case 'NOT LIKE':
    +            $filter = [
    +              'bool' => [
    +                'must_not' => [
    +                  'wildcard' => [$condition->getField() => $condition->getValue()],
    +                ],
    +              ]
    +            ];
    +            break;
    +
    

    ... I'd like to close that other issue in favor of this one, so I'm pasting it here so that the code isn't forgotten.

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @abrar_arshad because they proposed a patch in LIKE filter missing Closed: duplicate

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @artemboiko because they were not credited automatically

    Also crediting @sokru because they provided direction in LIKE filter missing Closed: duplicate

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @kevinn because they proposed a patch in LIKE filter missing Closed: duplicate

  • Assigned to mparker17
  • 🇨🇦Canada mparker17 UTC-4

    Assigning to myself to see if I can integrate the NOT LIKE filter, and add some tests.

  • 🇨🇦Canada mparker17 UTC-4

    That seemed to work!

    Although that being said, our tests (currently) only test that the queries get constructed in the way we expect, i.e.: we don't (yet) test that Elasticsearch 8 can parse them or that they return results that make sense for that kind of query.

    I noticed that both the LIKE and NOT LIKE filters in LIKE filter missing Closed: duplicate both specify wildcard queries, but the LIKE filter from this issue does not. However, I copied the NOT LIKE query construction from LIKE filter missing Closed: duplicate for now.

    We don't seem to use wildcard queries anywhere else in the 8.0.x version of this module. Our sibling project, Search API OpenSearch doesn't either (or at least, not in their 3.x branch).

    I'm not exactly certain why we aren't using wildcard queries, but I daresay for now we should try to be internally consistent, i.e.: change the NOT LIKE query I just added from wildcard to something else; but I'm not particularly familiar with raw Elasticsearch query construction, so I'm not certain if — for example — changing wildcard to term would work without manual testing (which I will try shortly).

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @karma86 for the patch in LIKE and EXACT filter for searching fulltext Active (for this module's 8.x-7.x branch)

  • 🇨🇦Canada mparker17 UTC-4

    Should probably add LIKE filter missing Closed: duplicate as a related issue too.

  • Issue was unassigned.
  • 🇨🇦Canada mparker17 UTC-4

    I grepped the code in Search API, and a number of modules providing search backends for it to see if I could find any other examples of LIKE and/or NOT LIKE filters... the only match was in search_api's search_api_db submodule (i.e.: an SQL-based search backend), which implemented LIKE and NOT LIKE by escaping any % and _ wildcard characters in the search term, then performing an SQL LIKE '%term%' query.

    That says to me that end-users shouldn't be able to enter raw wildcards, meaning that we shouldn't be creating a wildcard query in our implementation. (but maybe I misread, please correct me if I am wrong). From a security perspective, I can imagine that allowing end-users to enter wildcards directly means that malicious end-users could create wildcard-filled queries to slow down the back-end, potentially causing problems for other users. But, I don't know if Elasticsearch has some built-in guardrails to prevent that.

    It is also worth noting that all of our other filter term operators create term queries (which look for exact matches). However, @artemboiko's implementation for LIKE in #2 LIKE and EXACT filter for searching fulltext Needs review creates a match query (which fuzzy-matches the search term instead of looking for an exact match). Fuzzy-matching is not the same thing as an SQL LIKE '%term%', but I daresay that @artemboiko's implementation feels like the right solution here... it seems to me that LIKE should be more lenient than =; and NOT LIKE seems as if it should be more lenient than <>... and while we could implement LIKE and NOT LIKE exactly like search_api_db does, it seems to me like one of the reasons why a someone might want to use ElasicSearch instead of SQL is because ElasticSearch has fuzzy-matching and SQL does not.

    However, if you have a good counter-point, I'd be interested to hear it!

    Acting on the above thoughts, I have left the LIKE implementation the way that @artemboiko wrote it, and changed the NOT LIKE implementation to avoid creating a wildcard query...

    diff --git a/src/SearchAPI/Query/FilterBuilder.php b/src/SearchAPI/Query/FilterBuilder.php
    index 96438b2..ed17010 100644
    --- a/src/SearchAPI/Query/FilterBuilder.php
    +++ b/src/SearchAPI/Query/FilterBuilder.php
    @@ -190,7 +190,12 @@ class FilterBuilder {
           'NOT LIKE' => [
             'bool' => [
               'must_not' => [
    -            'wildcard' => [$condition->getField() => $condition->getValue()],
    +            'match' => [
    +              $condition->getField() => [
    +                'query' => $condition->getValue(),
    +                'fuzziness' => 'AUTO',
    +              ]
    +            ],
               ],
             ],
           ],
    

    I think this is ready for review now, but since I worked on the patch, I cannot RTBC it anymore. So I'll unassign myself, and leave this as "Needs review" for others to provide feedback.

    Thank you in advance! (and sorry for the long comment)

  • 🇫🇮Finland sokru

    Looks good to me, only thing I was wondering that we hardcode the fuzziness value here to "auto" and someone might expect to get the setting from here: https://git.drupalcode.org/project/elasticsearch_connector/-/blob/8.0.x/...

  • Status changed to Needs work 3 months ago
  • 🇨🇦Canada mparker17 UTC-4

    @sokru, good idea, I'll update the merge request!

  • 🇨🇦Canada mparker17 UTC-4

    Ready for re-review!

  • Status changed to Needs review 9 days ago
Production build 0.71.5 2024