Support partial matches

Created on 5 December 2024, 6 months ago

Problem/Motivation

I'm trying to build autocomplete functionality powered by OpenSearch.
It would be useful to be able to configure an index to match words that contain or begin with the search string.

Steps to reproduce

N/A

Proposed resolution

Add a value to the index options to specify a "match mode": partial, starts with, or no partial matches.
When the search param is being built, we check this option and add a wildcard * before and/or after the search string.

Remaining tasks

Implementation.

User interface changes

A new "Opensearch options" fieldset/option is added when editing a search index. It assumes no partial matches by default.

API changes

N/A

Data model changes

N/A

Feature request
Status

Active

Version

2.0

Component

Code

Created by

🇬🇧United Kingdom khaled.zaidan


Comments & Activities

  • Issue created by @khaled.zaidan
  • 🇬🇧United Kingdom khaled.zaidan

    Attached is a patch, against 2.x with the solution described above.

  • 🇬🇧United Kingdom khaled.zaidan

    Oops, faulty patch in #2.

    Here's a fresh one.

  • 🇬🇧United Kingdom khaled.zaidan

    Third time is the charm!

  • 🇫🇷France striknin Lyon

    Hello,
    Thanks for this work, it's useful, but:

    The patch applies cleanly, but it does not work with the "Contains" option.

    In search_api_opensearch/src/SearchAPI/Query/SearchParamBuilder.php, it should be:

          $params['query_string']['query'] = match($matching_mode) {
            'contains' => '*' . (string) $luceneQuery . '*',
            'starts_with' => (string) $luceneQuery . '*',
            default => (string) $luceneQuery,
          };
    

    Instead of:

          $params['query_string']['query'] = match($matching_mode) {
            'partial' => '*' . (string) $luceneQuery . '*',
            'starts_with' => (string) $luceneQuery . '*',
            default => (string) $luceneQuery,
          };
    

    Alternatively, change the option value in the select list "$form['options']['opensearch']['partial_match_mode']" in the .module file.

  • 🇨🇭Switzerland yobottehg Basel

    I tested this locally with '*' . $luceneQuery . '*' for the search. It works, but the results returned from the search index are sometimes very strange.

    When the query is longer than 10 characters, a lot of entries appear in the results.
    I debugged that with explain and saw that all the results had a `field_exist` constant score. This only happens with the above changes, not without them. I'm currently unsure why this happens and how it could be avoided.

    I'm also unsure how this influences the Fuzzy search logic which is set to AUTO. Perhaps this needs to be disabled for a real "contains" to work?

  • 🇯🇵Japan zealialize Japan 🇯🇵

    I was able to confirm that it works with khaled.zaidan's #5 patch.
    However, I think it would be better if the notation in the select box and the value matched.

  • Status changed to Needs review about 1 month ago
  • 🇬🇧United Kingdom jhan1112 Birmingham

    In our case, the issue was caused by unnecessary double quotes wrapping the type-cast $luceneQuery on line 66 of /src/SearchAPI/Query/SearchParamBuilder.php.

    When there’s only one keyword, the query returns "keyword1" and behaves as expected. However, when multiple keywords are present, it becomes ""keyword1 keyword2"".
    The patches in comments #1 and #8 introduce wildcards but apply them incorrectly, producing ""keyword1 keyword2"*" in the Single phrase and Direct query parse modes, and "(keyword1 OPERATOR keyword2)*" in the Multiple word mode.

    This patch addresses the issue by trimming the unnecessary wrapping quotes and correctly placing the wildcards:

    • Producing "keyword1 keyword2*" for the single phrase mode.
    • Producing "(keyword1* OPERATOR keyword2*)" for the multiple word mode.
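
To illustrate the string transformation described above for the single phrase case, here is a minimal sketch. It is not the code from the patch: the function name `apply_phrase_wildcard` and its parameters are hypothetical, and it only shows the shape of the resulting query string, not whether OpenSearch evaluates that string as intended.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of search_api_opensearch.
 *
 * Strips one pair of wrapping double quotes from an already built query
 * string and then places the wildcard(s) according to the matching mode.
 */
function apply_phrase_wildcard(string $query, string $mode): string {
  // Remove a single pair of wrapping quotes, if both are present.
  if (strlen($query) >= 2 && $query[0] === '"' && substr($query, -1) === '"') {
    $query = substr($query, 1, -1);
  }

  return match ($mode) {
    'contains' => '*' . $query . '*',
    'starts_with' => $query . '*',
    default => $query,
  };
}

echo apply_phrase_wildcard('"keyword1 keyword2"', 'starts_with'), PHP_EOL;
// prints: keyword1 keyword2*
```

Stripping the quotes before appending the wildcard is what avoids the `"keyword1 keyword2"*` form, where the `*` ends up outside the quoted phrase.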
  • 🇦🇺Australia kim.pepper 🏄‍♂️🇦🇺Sydney, Australia

    Changes should be against 3.x and need to be in an MR to trigger the CI pipeline.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹
  • 🇬🇧United Kingdom jhan1112 Birmingham

    This patch was created against 2.x-dev of the module.

    It includes some fixes to patch #9.

    Lucene does not support wildcards within phrase queries. Therefore, all phrase queries are now passed as quoted queries.

    For multi-word parse mode, this patch places wildcards in more appropriate positions. For example, for the keyword "global econo", it generates the query (global* OPERATOR econo*) for “starts-with”, and (*global* OPERATOR *econo*) for “contains” matching.
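
The per-term wildcard placement described for multi-word parse mode could look roughly like the sketch below. This is an assumption-laden illustration rather than the patch itself: `wildcard_terms`, its parameters, and the default `AND` operator are hypothetical, and the sketch does not escape Lucene special characters in the keywords.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of search_api_opensearch.
 *
 * Splits the keywords on whitespace, wraps each term with wildcards
 * according to the matching mode, and joins the terms with the operator.
 */
function wildcard_terms(string $keywords, string $mode, string $operator = 'AND'): string {
  // Split on runs of whitespace and drop empty pieces.
  $terms = preg_split('/\s+/', trim($keywords), -1, PREG_SPLIT_NO_EMPTY);

  $wrapped = array_map(
    static fn (string $term): string => match ($mode) {
      'contains' => '*' . $term . '*',
      'starts_with' => $term . '*',
      default => $term,
    },
    $terms
  );

  return '(' . implode(' ' . $operator . ' ', $wrapped) . ')';
}

echo wildcard_terms('global econo', 'starts_with'), PHP_EOL;
// prints: (global* AND econo*)
echo wildcard_terms('global econo', 'contains', 'OR'), PHP_EOL;
// prints: (*global* OR *econo*)
```

Applying the wildcard to each term before joining is what distinguishes this from appending a single `*` after the closing parenthesis of the whole group.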
