LIKE and EXACT filter for searching fulltext

Created on 2 May 2024, 7 months ago
Updated 30 August 2024, 3 months ago

Added LIKE and EXACT elastic operator like in parent issue but for 8.0.x-dev

If possible, it would be nice to have a NOT LIKE filter too

✨ Feature request
Status

Needs review

Version

8.0

Component

Elasticsearch Connector

Created by

🇺🇦Ukraine artemboiko Lviv

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @artemboiko
  • 🇺🇦Ukraine artemboiko Lviv
  • First commit to issue fork.
  • Open on Drupal.org →
    Core: 10.2.1 + Environment: PHP 8.1 & MySQL 5.7
    last update 7 months ago
    Waiting for branch to pass
  • Status changed to Needs review 7 months ago
  • Open on Drupal.org →
    Core: 10.2.1 + Environment: PHP 8.1 & MySQL 5.7
    last update 7 months ago
    Waiting for branch to pass
  • 🇨🇦Canada mparker17 UTC-4

    @artemboiko, thank you! (and apologies: I only have expertise in the 8.0.x branch of the module, so I'm unable to help move the 8.x-7.x branch forward!)

    I've created a merge request: lets see if tests pass.

    Offhand, the code looks fine thus far. May I request that you update the test at tests/src/Unit/SearchAPI/Query/FilterBuilderTest.php to test the new code? (this will ensure that future changes don't accidentally break the LIKE and EXACT functionality you need!)

    Thank you very much!

  • 🇨🇦Canada mparker17 UTC-4

    If possible, it would be nice to have a NOT LIKE filter too, as put forward in #3092486-10: LIKE filter missing → . That issue (which is a lot older and uses the old, switch-based syntax) suggests the following code...

    +        case 'LIKE':
    +          $filter = [
    +            'bool' => [
    +              'should' => [
    +                'wildcard' => [$condition->getField() => $condition->getValue()],
    +              ],
    +            ]
    +          ];
    +          break;
    +
    +        case 'NOT LIKE':
    +            $filter = [
    +              'bool' => [
    +                'must_not' => [
    +                  'wildcard' => [$condition->getField() => $condition->getValue()],
    +                ],
    +              ]
    +            ];
    +            break;
    +
    

    ... I'd like to close that other issue in favor of this one, so I'm pasting it here so that the code isn't forgotten.

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @abrar_arshad because they proposed a patch in ✨ LIKE filter missing Closed: duplicate

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @artemboiko because they were not credited automatically

    Also crediting @sokru because they provided direction in ✨ LIKE filter missing Closed: duplicate

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @kevinn because they proposed a patch in ✨ LIKE filter missing Closed: duplicate

  • Assigned to mparker17
  • 🇨🇦Canada mparker17 UTC-4

    Assigning to myself to see if I can integrate the NOT LIKE filter, and add some tests.

  • 🇨🇦Canada mparker17 UTC-4

    That seemed to work!

    Although that being said, our tests (currently) only test that the queries get constructed in the way we expect, i.e.: we don't (yet) test that Elasticsearch 8 can parse them or that they return results that make sense for that kind of query.

    I noticed that both the LIKE and NOT LIKE filters in ✨ LIKE filter missing Closed: duplicate both specify wildcard queries, but the LIKE filter from this issue does not. However, I copied the NOT LIKE query construction from ✨ LIKE filter missing Closed: duplicate for now.

    We don't seem to use wildcard queries anywhere else in the 8.0.x version of this module. Our sibling project, Search API OpenSearch → doesn't either (or at least, not in their 3.x branch).

    I'm not exactly certain why we aren't using wildcard queries, but I daresay for now we should try to be internally consistent, i.e.: change the NOT LIKE query I just added from wildcard to something else; but I'm not particularly familiar with raw Elasticsearch query construction, so I'm not certain if — for example — changing wildcard to term would work without manual testing (which I will try shortly).

  • 🇨🇦Canada mparker17 UTC-4

    Crediting @karma86 for the patch in ✨ LIKE and EXACT filter for searching fulltext Active (for this module's 8.x-7.x branch)

  • 🇨🇦Canada mparker17 UTC-4

    Should probably add ✨ LIKE filter missing Closed: duplicate as a related issue too.

  • Issue was unassigned.
  • 🇨🇦Canada mparker17 UTC-4

    I grepped the code in Search API, and a number of modules providing search backends for it → to see if I could find any other examples of LIKE and/or NOT LIKE filters... the only match was in search_api's search_api_db submodule (i.e.: an SQL-based search backend), which implemented LIKE and NOT LIKE by escaping any % and _ wildcard characters in the search term, then performing an SQL LIKE '%term%' query.

    That says to me that end-users shouldn't be able to enter raw wildcards, meaning that we shouldn't be creating a wildcard query in our implementation. (but maybe I misread, please correct me if I am wrong). From a security perspective, I can imagine that allowing end-users to enter wildcards directly means that malicious end-users could create wildcard-filled queries to slow down the back-end, potentially causing problems for other users. But, I don't know if Elasticsearch has some built-in guardrails to prevent that.

    It is also worth noting that all of our other filter term operators create term queries (which look for exact matches). However, @artemboiko's implementation for LIKE in #2 ✨ LIKE and EXACT filter for searching fulltext Needs review creates a match query (which fuzzy-matches the search term instead of looking for an exact match). Fuzzy-matching is not the same thing as an SQL LIKE '%term%', but I daresay that @artemboiko's implementation feels like the right solution here... it seems to me that LIKE should be more lenient than =; and NOT LIKE seems as if it should be more lenient than <>... and while we could implement LIKE and NOT LIKE exactly like search_api_db does, it seems to me like one of the reasons why a someone might want to use ElasicSearch instead of SQL is because ElasticSearch has fuzzy-matching and SQL does not.

    However, if you have a good counter-point, I'd be interested to hear it!

    Acting on the above thoughts, I have left the LIKE implementation the way that @artemboiko wrote it, and changed the NOT LIKE implementation to avoid creating a wildcard query...

    diff --git a/src/SearchAPI/Query/FilterBuilder.php b/src/SearchAPI/Query/FilterBuilder.php
    index 96438b2..ed17010 100644
    --- a/src/SearchAPI/Query/FilterBuilder.php
    +++ b/src/SearchAPI/Query/FilterBuilder.php
    @@ -190,7 +190,12 @@ class FilterBuilder {
           'NOT LIKE' => [
             'bool' => [
               'must_not' => [
    -            'wildcard' => [$condition->getField() => $condition->getValue()],
    +            'match' => [
    +              $condition->getField() => [
    +                'query' => $condition->getValue(),
    +                'fuzziness' => 'AUTO',
    +              ]
    +            ],
               ],
             ],
           ],
    

    I think this is ready for review now, but since I worked on the patch, I cannot RTBC it anymore. So I'll unassign myself, and leave this as "Needs review" for others to provide feedback.

    Thank you in advance! (and sorry for the long comment)

  • 🇫🇮Finland sokru

    Looks good to me, only thing I was wondering that we hardcode the fuzziness value here to "auto" and someone might expect to get the setting from here: https://git.drupalcode.org/project/elasticsearch_connector/-/blob/8.0.x/...

Production build 0.71.5 2024