- Issue created by @kaszarobert
- mkalkbrenner (Germany)
You could "easily" modify the field configuration and add a search_api_solr.solr_field_type.text_ngram_hu_7_0_0.yml. It will be used automatically.
And even better, contribute it.
I'll try to describe our developer experience when building a search page for multiple languages:
I looked into what happens and found that when I use the "Fulltext" type, the Solr field becomes "tm_X3b_hu_title" with the type "text_hu", which is defined in search_api_solr/config/optional/search_api_solr.solr_field_type.text_hu_7_0_0.yml:
analyzers:
  -
    type: index
    charFilters:
      -
        class: solr.MappingCharFilterFactory
        mapping: accents_hu.txt
That means that for "Fulltext", the accents listed in accents_hu.txt are removed at index time, so when someone searches for "mezo", "mező" is a valid result.
But when I change the type to "Fulltext (ngram)", the field becomes "tcngramm_X3b_hu_title" with the type "text_ngram". According to search_api_solr/config/install/search_api_solr.solr_field_type.text_ngram_und_7_0_0.yml, that type uses:
analyzers:
  -
    type: index
    charFilters:
      -
        class: solr.MappingCharFilterFactory
        mapping: accents_und.txt
That means that for "Fulltext (ngram)", only the accents defined in accents_und.txt are removed. When someone searches for "mezo", "mező" is not a valid result, because "ő -> o" is missing from accents_und.txt. The previous "hungaria" search worked only because "á -> a" is currently in accents_und.txt.
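To make the difference concrete, here is a minimal Python sketch of the behavior described above. It is not how Solr implements the char filter. The two dictionaries are tiny stand-ins for accents_hu.txt and accents_und.txt and contain only the rules mentioned in this issue, and `apply_mapping` and `matches` are hypothetical helper names.

```python
# Tiny stand-ins for the real mapping files (assumption: the real files
# contain many more rules; only the ones discussed in this issue are listed).
ACCENTS_HU = {"á": "a", "ő": "o"}
ACCENTS_UND = {"á": "a"}  # "ő" is missing, as reported above


def apply_mapping(text, mapping):
    """Replace every character that has a rule; leave all others untouched."""
    return "".join(mapping.get(ch, ch) for ch in text)


def matches(query, indexed_value, mapping):
    """Compare query and indexed value after lowercasing and accent mapping.

    Assumption of this sketch: the same mapping is applied to both sides.
    """
    return apply_mapping(query.lower(), mapping) == apply_mapping(
        indexed_value.lower(), mapping
    )


print(matches("mezo", "mező", ACCENTS_HU))
print(matches("mezo", "mező", ACCENTS_UND))
print(matches("hungaria", "Hungária", ACCENTS_UND))
```

With the Hungarian stand-in the "mezo" query matches "mező". With the language-undefined stand-in it does not, while "hungaria" still matches "Hungária" because the "á" rule is present in both.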
I saw that most languages define language-specific versions mostly for these field types only:
- text_LANGCODE
- text_unstemmed_LANGCODE
- text_phonetic_LANGCODE
- collated_LANGCODE
That means that anyone who uses "Fulltext (ngram)", "Fulltext (ngramstring)", "Fulltext (edge)", "Fulltext (edgestring)" etc. will run into the same problem: because a language-specific type is missing, accents or stopwords are handled in a way that produces unexpected search results for native speakers of that language.
The question is: do we have to add the missing types for each language manually? Or should we add more accents to accents_und.txt?
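Whichever option is chosen, it helps to know which rules a language-specific mapping has that the generic one lacks. The following is a rough Python sketch under two assumptions: that the mapping files use the `"source" => "target"` line syntax of Solr mapping files, and that `\uXXXX` escapes can be ignored. The sample contents are made up for illustration and are not copied from accents_hu.txt or accents_und.txt. `parse_mapping` and `missing_rules` are hypothetical helper names.

```python
import re

# One rule per line: "source" => "target". Comment lines start with "#"
# and do not match because the pattern requires a leading quote.
RULE = re.compile(r'^\s*"(.+?)"\s*=>\s*"(.*?)"')


def parse_mapping(text):
    """Return a dict of source -> target for every rule line in the text."""
    rules = {}
    for line in text.splitlines():
        found = RULE.match(line)
        if found:
            rules[found.group(1)] = found.group(2)
    return rules


def missing_rules(specific, generic):
    """Rules present in the language-specific mapping but not in the generic one."""
    return {src: dst for src, dst in specific.items() if src not in generic}


# Made-up sample contents (assumption, not the real files).
HU_SAMPLE = '# Hungarian\n"á" => "a"\n"ő" => "o"\n"ű" => "u"\n'
UND_SAMPLE = '"á" => "a"\n'

print(missing_rules(parse_mapping(HU_SAMPLE), parse_mapping(UND_SAMPLE)))
```

Running the same comparison against the real files would list the candidates for extending accents_und.txt, or show how much a dedicated text_ngram_hu type would add.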
Active
4.0
Code