- Issue created by @bsnodgrass
- 🇺🇸 United States dorficus
I've been digging into this and I've found some very interesting things:
I am using Lando for local dev with the Pantheon recipe and I've tried a couple of things.
- Creating a new Solr server with custom config automatically added
- "Posting" custom config to default Pantheon server
Here are some strange findings:
- When developing locally, using a custom server and index, I'm able to get synonyms to work.
- When developing locally, using the Pantheon server and index, synonyms work with custom config.
- When testing on Pantheon with default config, synonyms do not work, nor should they.
- When testing on Pantheon with custom "posted" config, synonyms do not work, but they should.
Here's where it gets strange:
When testing on local, both custom and Pantheon servers, the testing at
admin/config/search/search-api/server/pantheon_solr8/solr-admin/field-analysis
revealed the following:
However, testing the same way on Pantheon with a Pantheon server revealed this:
The most interesting part is that the following happens, which I believe is related:
I checked all of my schema files and the tokenizer should definitely be the StandardTokenizer
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100" storeOffsetsWithPositions="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="accents_en.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords_en.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="accents_en.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms_en.txt" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="0" generateNumberParts="1" protected="protwords_en.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="0"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
The part of note there is:
<tokenizer class="solr.StandardTokenizerFactory"/>
What this tells me, in addition to the issues mentioned in "Search API Schema reverts back to 4.2 after deploys or randomly" (https://www.drupal.org/project/search_api_pantheon/issues/3424724#commen...), is that there is another config file hiding somewhere, one we do not have access to edit, that is overriding some of the customizations. This is also evidenced by seeing Schema 4.3.3 on local and 4.2.0 on Pantheon.
Without knowing if this is indeed the case, it's difficult to determine what the next steps to correct this are.
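One way to double-check which tokenizer and filters a given schema file declares, independent of what the Field Analysis screen reports, is to parse the field type and list its analyzer chain. This is only a minimal sketch: the XML below is a trimmed copy of the query analyzer quoted above, and the function name analyzer_chain is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the text_en field type quoted above (query analyzer only).
FIELD_TYPE_XML = """
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="accents_en.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms_en.txt" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""


def analyzer_chain(field_type_xml, analyzer_type):
    """Return the class names of char filters, tokenizer and filters, in file order.

    Hypothetical helper: returns an empty list if no analyzer of that type exists.
    """
    root = ET.fromstring(field_type_xml)
    for analyzer in root.findall("analyzer"):
        if analyzer.get("type") == analyzer_type:
            return [child.get("class") for child in analyzer]
    return []


chain = analyzer_chain(FIELD_TYPE_XML, "query")
for class_name in chain:
    print(class_name)
```

If the chain printed from the file on disk differs from what the Field Analysis tool shows on Pantheon, that would point to the server running a different config than the one in the repository.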
- 🇵🇭 Philippines danreb
@dorficus You are correct. The default Solr schema on the Pantheon platform is set to 4.2.10, and reposting the Solr schema on Pantheon is currently broken (the "Sticky Solr Schema" bug).
If you want your custom config to take effect, what you need to do right now is open a ticket and have the CSE or the platform engineers repost the config for you in the affected environments.
- 🇺🇸 United States bsnodgrass
@danreb I've created a support ticket requesting assistance with making this happen, or instructions as to how we can post the config.zip ourselves.
Initially we would like to post the schema changes on the transmfg.build multidev to confirm our issue is fixed.
After that, we will be making a number of changes on transmfg.build and having them applied to all our environments.
- 🇺🇸 United States dorficus
@bsnodgrass and @danreb I have verified that after the ticket with Pantheon, we are now using the correct schema in the multidev. After verifying this, I also verified that the correct tokenizer and synonym filters were working correctly.
It still seems that the "core" config, which defines the schema, will need Pantheon intervention on all environments to bring it up to date. However, we are able to post our own config using drush sapps, assuming that our custom config is available to the Drupal site. I have included the config in a folder in the docroot of the project, so the command to post the config ends up being:
drush sapps pantheon_solr8 /code/solr/custom_config/
Key steps in this process are:
- Pushing the config to the platform via git
- Posting the config using the above command
- Reloading the Solr server core:
admin/config/search/search-api/server/pantheon_solr8/solr-admin/reload-core
- Reindexing the content after the core has reloaded
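The steps above could be sketched as a small dry-run script that only prints the commands in order. Several things here are assumptions rather than anything confirmed in this thread: the site.env value is a placeholder, the commands are run remotely through terminus drush, the git remote is named origin, and the reindex is done with Search API's search-api:reset-tracker and search-api:index commands. The core reload is left as a manual step because the thread only describes it as an admin page.

```python
import shlex

SITE_ENV = "my-site.my-multidev"   # hypothetical <site>.<env> placeholder
SERVER_ID = "pantheon_solr8"       # server ID used in this thread
CONFIG_DIR = "/code/solr/custom_config/"  # config path used in this thread


def deployment_steps(site_env, server_id, config_dir):
    """Return (description, argv-or-None) pairs; None marks a manual step."""
    drush = ["terminus", "drush", site_env, "--"]
    return [
        ("Push the config to the platform via git (remote name assumed)",
         ["git", "push", "origin", "HEAD"]),
        ("Post the config to the Solr server",
         drush + ["sapps", server_id, config_dir]),
        ("Reload the core at admin/config/search/search-api/server/"
         + server_id + "/solr-admin/reload-core",
         None),
        ("Mark all items for reindexing (assumed command)",
         drush + ["search-api:reset-tracker"]),
        ("Reindex the content (assumed command)",
         drush + ["search-api:index"]),
    ]


steps = deployment_steps(SITE_ENV, SERVER_ID, CONFIG_DIR)
for description, argv in steps:
    print("# " + description)
    # Dry run only: print the command instead of executing it.
    print(shlex.join(argv) if argv else "(manual step in the admin UI)")
```

Printing rather than executing keeps the sketch safe to run anywhere; swapping the print for subprocess.run would be the obvious next step once the commands are confirmed for a given site.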
Once that was done, I was able to verify that the files were correct at
admin/config/search/search-api/server/pantheon_solr8/files
I was also able to test the queries against the index values using the Field Analysis tool:
admin/config/search/search-api/server/pantheon_solr8/solr-admin/field-analysis
Of note, synonyms containing whitespace do not work as written. Underscores did not seem to correct this either. In our use case of "ldpe" being returned for "low density polyethylene", Solr read the latter as three separate, uncombined tokens. Within our synonyms_en.txt file I had set the terms to be interchangeable:
ldpe, low density polyethylene
However, the query "low density polyethylene" against a field value of "ldpe" did not return results. To fix this, I escaped the whitespace:
low\ density\ polyethylene
I then redid the above steps to update the config, and the results appeared as expected.
Thank you for your help on this, @danreb.
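For anyone maintaining a longer synonyms_en.txt, the whitespace escaping described above could be generated rather than typed by hand. This is a minimal sketch of that idea only; the helper name synonym_line is hypothetical, and it assumes the backslash-escaped form that worked in this thread is what your Solr setup needs.

```python
def synonym_line(terms):
    """Join interchangeable terms into one synonyms-file line.

    Whitespace inside a multi-word term is replaced with a backslash-escaped
    space, matching the workaround described in this thread.
    """
    escaped = ["\\ ".join(term.split()) for term in terms]
    return ", ".join(escaped)


line = synonym_line(["ldpe", "low density polyethylene"])
print(line)  # prints: ldpe, low\ density\ polyethylene
```

The generated line still has to be posted and the core reloaded and reindexed, as in the steps earlier in this comment.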