Support for Configuring Synonyms

Created on 24 May 2024, 8 months ago

Problem/Motivation

We have a number of synonyms we need to create for our search (e.g. ldpe for low density polyethylene and vice versa). So far we have not been able to make things work as we would expect, nor have we found any specific documentation. At this point we consider this a support request, though we expect it might convert to a documentation task. We would be glad to contribute our findings back as we work out our solution.

It's also possible it might have something to do with this issue: https://www.drupal.org/project/search_api_pantheon/issues/3424724#commen... (🐛 Search API Schema reverts back to 4.2 after deploys or randomly, Active).

The Solr schema on Pantheon (all environments) is 4.2.10, with search_api_pantheon 8.1.8, search_api 8.x-1.34, and search_api_solr 4.3.3.

Our configuration includes the following.

  • Our new content is being indexed on creation.
  • We have edited the English Text Field listed on SolrFieldType Configuration (is this the right place?) to include the following lines:

        ldpe, "low density polyethylene"
        "low density polyethylene", ldpe
        lowdensitypolyethylene, ldpe
        ldpe, lowdensitypolyethylene

  • Our changes are saved in synonyms_en.txt and included with the config.zip.
  • Search does not find any terms using "ldpe", and the field analysis only displays "ldpe" in the query analysis, even though the other values exist.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

💬 Support request
Status

Active

Version

8.1

Component

Documentation

Created by

🇺🇸 United States bsnodgrass


Comments & Activities

  • Issue created by @bsnodgrass
  • 🇺🇸 United States dorficus

    I've been digging into this and I've found some very interesting things:

    I am using Lando for local dev with the Pantheon recipe and I've tried a couple of things.

    1. Creating a new Solr server with custom config automatically added
    2. "Posting" custom config to default Pantheon server

    Here are some strange findings:

    • When developing locally, using a custom server and index, I'm able to get synonyms to work.
    • When developing locally, using the Pantheon server and index, synonyms work with custom config.
    • When testing on Pantheon with default config, synonyms do not work, nor should they.
    • When testing on Pantheon with custom "posted" config, synonyms do not work, but they should.

    Here's where it gets strange:

    When testing on local, both custom and Pantheon servers, the testing at

    admin/config/search/search-api/server/pantheon_solr8/solr-admin/field-analysis
    

    revealed the following:

    However, testing the same way on Pantheon with a Pantheon server revealed this:

    The most interesting part is that the following happens, which I believe is related:

    I checked all of my schema files, and the tokenizer should definitely be the StandardTokenizer:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100" storeOffsetsWithPositions="true">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="accents_en.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords_en.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="accents_en.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms_en.txt" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="0" generateNumberParts="1" protected="protwords_en.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="0"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    

    The part of note there is:


    <tokenizer class="solr.StandardTokenizerFactory"/>

    What this tells me, in addition to the issues mentioned on https://www.drupal.org/project/search_api_pantheon/issues/3424724#commen... (🐛 Search API Schema reverts back to 4.2 after deploys or randomly, Active), is that there is another config file hiding somewhere, one we do not have access to edit, that is overriding some of the customizations. This is also evidenced by seeing Schema 4.3.3 on local and 4.2.0 on Pantheon.

    Without knowing if this is indeed the case, it's difficult to determine what the next steps to correct this are.
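
    As a rough cross-check of the schema file itself, the analyzer chain can be read out of the fieldType definition with a few lines of Python. This is only a sketch: the function name summarize_analyzers and the shortened SAMPLE snippet are made up for illustration, and it inspects the file you intend to post, not what the live Solr core has actually loaded.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of a text_en field type (not the full schema).
SAMPLE = """<fieldType name="text_en" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true"
            synonyms="synonyms_en.txt" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>"""


def summarize_analyzers(xml_text):
    """Map each analyzer type to its tokenizer, filters, and synonym files."""
    root = ET.fromstring(xml_text)
    summary = {}
    for analyzer in root.iter("analyzer"):
        tokenizer = analyzer.find("tokenizer")
        filters = analyzer.findall("filter")
        summary[analyzer.get("type")] = {
            "tokenizer": tokenizer.get("class") if tokenizer is not None else None,
            "filters": [f.get("class") for f in filters],
            # Only filters that carry a synonyms="..." attribute end up here.
            "synonym_files": [f.get("synonyms") for f in filters if f.get("synonyms")],
        }
    return summary


if __name__ == "__main__":
    for kind, info in summarize_analyzers(SAMPLE).items():
        print(kind, info["tokenizer"], info["synonym_files"])
```

    If the file looks right but Field Analysis on the server shows a different tokenizer, that points at the server running a different config than the one in the repository.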

  • 🇵🇭 Philippines danreb

    @dorficus You are correct. The default Solr schema on the Pantheon platform is set to 4.2.10, and reposting the Solr schema on Pantheon is currently broken (the "Sticky Solr Schema" bug).

    If you want your custom config to take effect, what you need to do right now is open a ticket and have the CSE or the platform engineers repost the config for you in the affected environments.

  • 🇺🇸 United States bsnodgrass

    @danreb I've created a support ticket asking for assistance with making this happen, or for instructions on how we can post the config.zip ourselves.

    Initially we would like to post the schema changes on transmfg.build multidev to confirm our issue is fixed.

    Following that, we will be making a number of changes on transmfg.build and having them applied to all our environments.

  • 🇺🇸 United States dorficus

    @bsnodgrass and @danreb I have verified that after the ticket with Pantheon, we are now using the correct schema in the multidev. After verifying this, I also verified that the correct tokenizer and synonym filters were working correctly.

    It still seems that for the "core" config which defines the schema, Pantheon will need to intervene on all environments to get it up to date. However, we are able to post our own config using drush sapps, assuming that our custom config is available to the Drupal site.

    I have included the config in a folder in the docroot of the project, so the command to post config ends up being drush sapps pantheon_solr8 /code/solr/custom_config/.

    Key steps in this process are:

    1. Pushing the config to the platform via git
    2. Posting the config using the above command
    3. Reloading the Solr server core: admin/config/search/search-api/server/pantheon_solr8/solr-admin/reload-core
    4. Reindexing the content after the core has reloaded
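
    The steps above can be written down as a small dry-run checklist. This is a sketch under assumptions, not a tested deployment script: the helper names and the index ID my_index are hypothetical, the bare git push stands in for whatever remote and branch your environment uses, and the reindex step assumes the Search API Drush commands search-api:reset-tracker and search-api:index are available. Nothing is executed; the commands are only printed.

```python
import shlex


def posting_steps(server_id, config_path, index_id):
    """Return the config-posting steps in order; nothing is run here."""
    reload_path = (
        f"admin/config/search/search-api/server/{server_id}/solr-admin/reload-core"
    )
    return [
        ("run", ["git", "push"]),
        ("run", ["drush", "sapps", server_id, config_path]),
        # The thread reloads the core through the admin UI, so this stays manual.
        ("manual", f"Reload the core at {reload_path}"),
        # Assumption: reindexing via the Search API Drush commands.
        ("run", ["drush", "search-api:reset-tracker", index_id]),
        ("run", ["drush", "search-api:index", index_id]),
    ]


def render(steps):
    """Format each step as a numbered line, shell-quoting command arguments."""
    lines = []
    for number, (kind, payload) in enumerate(steps, start=1):
        text = shlex.join(payload) if kind == "run" else payload
        lines.append(f"{number}. [{kind}] {text}")
    return lines


if __name__ == "__main__":
    steps = posting_steps("pantheon_solr8", "/code/solr/custom_config/", "my_index")
    for line in render(steps):
        print(line)
```

    On Pantheon the drush commands would normally be run against a specific environment rather than locally, so treat the printed lines as a checklist to adapt.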

    Once that was done, I was able to verify that the files were correct at admin/config/search/search-api/server/pantheon_solr8/files.
    I was also able to test the queries vs. index values using the Field Analysis tool: admin/config/search/search-api/server/pantheon_solr8/solr-admin/field-analysis

    Of note, synonyms containing whitespace do not work as written. Underscores did not seem to correct this either. In our use case of "ldpe" being returned for "low density polyethylene", Solr read the latter as three separate tokens rather than as one combined term. Within our synonyms_en.txt file I had set the terms to be interchangeable (ldpe, low density polyethylene), but the query "low density polyethylene" against a field value of "ldpe" did not return results.

    To fix this, I escaped the whitespaces with low\ density\ polyethylene, redid the above steps to update the config, and the results appeared as expected.
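
    For anyone generating synonyms_en.txt from a longer list, the escaping can be scripted. A minimal sketch, assuming the backslash-before-space form described above is what your setup needs; the helper names escape_term and synonym_line are hypothetical.

```python
def escape_term(term):
    """Backslash-escape the whitespace inside one synonym term."""
    # split() drops leading/trailing whitespace and collapses runs,
    # so each gap becomes exactly one escaped space.
    return "\\ ".join(term.split())


def synonym_line(terms):
    """Build one comma-separated line of interchangeable terms."""
    return ", ".join(escape_term(term) for term in terms)


if __name__ == "__main__":
    print(synonym_line(["ldpe", "low density polyethylene"]))
```

    For the example in this thread, that prints ldpe, low\ density\ polyethylene, which is the line that ended up working here.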

    Thank you for your help on this, @danreb.

  • Assigned to danreb
  • Status changed to Fixed 6 days ago
  • 🇵🇭 Philippines danreb

    Resolving this. Pantheon has added a new page on custom configuration here: https://docs.pantheon.io/guides/solr-drupal/custom-config

    Pantheon users also no longer need to create a support ticket when posting a Solr schema, as this has been fixed in version 8.2.1.
