Synonyms don't work because the analyzer isn't used

Created on 28 March 2023, about 2 years ago

Problem/Motivation

Synonyms don't seem to work. I can add the list of synonyms to the server settings but they don't take effect when running a query.

Steps to reproduce

  1. Add some synonyms using the form on the 'edit server' page
  2. Force a recreation of the index - for example, by visiting the 'edit fields' page on our search index and saving the form with no changes (see extra note below)
  3. Confirm the synonyms are shown in an analyzer in the index settings - for example, using the OpenSearch dashboard: GET /my_index_name/_settings
  4. Perform a search that references synonyms in our list
  5. The search does not seem to use the synonyms. I expect it to.

Proposed resolution

I've done a bit of poking and can see the synonym config is added by SynonymsSubscriber::onAlterSettings.

The config is added to an analyzer called querytime_synonyms. I can't find any other reference to querytime_synonyms in the code or in OpenSearch/Elasticsearch docs. It looks like a custom name. I think that means it could only be invoked by setting it as the analyzer for a field in the field mapping, and/or at query time. I can't see that the module currently does this, which means I think it's just not used?

I think a reasonable choice would be to set it as the default analyzer (and default_search too?), or have an option alongside the synonym form for whether or not to use it as the defaults. For comparison, the default config that ships with the Solr module has synonyms enabled for fulltext fields.

Alternatives could be to create a processor that can be enabled per-field, or a new field type, but I think either of those are quite a bit more complex.

---

As an aside, I think there's also a minor problem shown by steps #1/#2 in my steps to reproduce above. Changing synonyms on the 'edit server' form, and then saving the form, doesn't trigger an update or recreation of the index.

This means that, separately from the fact that the analyzer config isn't being used, the analyzer config isn't actually set on the Opensearch index until you make a different change to the index that triggers BackendClient::updateIndex (like re-saving the field configuration). I think I'd expect it to, or, if not, show a message that notes the index hasn't been updated.

πŸ› Bug report
Status

Active

Version

2.0

Component

Code

Created by

πŸ‡¦πŸ‡ΊAustralia tallytarik

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @tallytarik
  • πŸ‡³πŸ‡ΏNew Zealand magunz

    I confirming that: Renaming "querytime_synonyms" to "default" fix the problem

    diff --git a/src/Event/SynonymsSubscriber.php b/src/Event/SynonymsSubscriber.php
    index 829bc4e..56e4713 100644
    --- a/src/Event/SynonymsSubscriber.php
    +++ b/src/Event/SynonymsSubscriber.php
    @@ -26,7 +26,7 @@ class SynonymsSubscriber implements EventSubscriberInterface {
             'lenient' => TRUE,
             'synonyms' => array_map('trim', $synonyms),
           ];
    -      $settings['analysis']['analyzer']['querytime_synonyms'] = [
    +      $settings['analysis']['analyzer']['default'] = [
             'type' => 'custom',
             'tokenizer' => 'standard',
             'filter' => ['lowercase', 'asciifolding', 'synonyms'],
    
  • πŸ‡¦πŸ‡ΊAustralia mstrelan

    The solution in #2 did not work for me, but what did work was to disable fuzziness on the server configuration. We may need to find a way to combine fuzziness with synonyms.

    The other issue I found is that the AlterSettingsEvent is dispatched when the index is saved, not when the server is saved, yet the synonyms configuration is on the server. It might make more sense to move the synonyms to the index, but failing that we should probably update all indexes for a server when the server is updated.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    kim.pepper β†’ made their first commit to this issue’s fork.

  • Pipeline finished with Failed
    8 months ago
    Total: 196s
    #313113
  • Pipeline finished with Failed
    8 months ago
    Total: 226s
    #313112
  • Pipeline finished with Failed
    8 months ago
    Total: 317s
    #313111
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    kim.pepper β†’ changed the visibility of the branch 3.x to hidden.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Yes, the AlterSettingsEvent should probably be called AlterIndexSettingsEvent.

    We don't currently have anything to specifically update the server settings.

  • πŸ‡¬πŸ‡§United Kingdom rupertj Bristol, UK

    The solution from #2 is working for me. I have to rebuild the index after changing the synonyms, but after that it works as expected.

  • Status changed to Needs review 16 days ago
  • πŸ‡ΊπŸ‡ΈUnited States damienmckenna NH, USA

    #2 as a patch.

  • πŸ‡ΊπŸ‡ΈUnited States damienmckenna NH, USA
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • Pipeline finished with Skipped
    14 days ago
    #503439
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Manually tested and confirmed adding the synonyms filter to the default analyzer fixes this issue.

    Committed to 2.x and 3.x. Thanks!

  • πŸ‡¦πŸ‡ΊAustralia acbramley

    This is a pretty major breaking change for people using querytime_synonyms in custom code, it broke our integration as we're referring to that analyzer in frontend code.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Oops! Not sure how to best manage multiple analyzers.

  • πŸ‡ΊπŸ‡ΈUnited States damienmckenna NH, USA

    It might be worth reverting this change to avoid breaking backwards compatibility.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    I created a follow-up issue to restore the orginal behavior. πŸ› Restore querytime_synonyms analyzer Active

Production build 0.71.5 2024