Edge-nGram fields do not preserve the complete word

Created on 13 April 2023, about 2 years ago

Updated 23 April 2023, about 2 years ago

If a field is indexed with edge n-grams, the whole word or phrase is not kept as a gram.

For example, if a field contains "disestablishmentarianism", then searching for "dis" or "disestablishment" finds the result, but searching for "disestablishmentarianism" does not.
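One likely cause, as a guess (the issue does not state the analyzer settings): an edge n-gram filter only emits prefixes up to its max_gram length, so a word longer than that never appears in full in the index. The sketch below imitates that behaviour in plain Python. The helper `edge_ngrams` and the values min_gram=3 and max_gram=20 are hypothetical, picked only so that the example word behaves as described above.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the leading-edge n-grams of `token`, shortest first."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


word = "disestablishmentarianism"  # 24 characters

# Assumed settings, not taken from the module: min_gram=3, max_gram=20.
grams = edge_ngrams(word, min_gram=3, max_gram=20)

print("dis" in grams)               # short prefix is indexed
print("disestablishment" in grams)  # 16-character prefix is indexed
print(word in grams)                # full word exceeds max_gram
```

If the search text is not n-grammed itself, the full word has no matching gram in such an index, which would explain the missing result.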

Workaround:

Add the field a second time, this time with the text type.

I don't know whether this is expected or intended, as described here:
https://discuss.elastic.co/t/edgengram-filter-not-keeping-the-whole-word...

Although preserve_original is set to true on the word_delimiter filter, the
edgeNGram filter is applied afterwards, meaning that the analyzer will
compute n-grams for both the original word and the sub-words created by the
word_delimiter filter.

If you want to keep the original token, it might make sense to use a multi
field[1] and analyze your field once with word_delimiter and once with
n-grams. Then you can use any of those fields depending on what you are
trying to achieve: prefix search with the field analyzed with edgeNGram
and standard search with the field analyzed with word_delimiter.
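A rough sketch of what such a multi-field setup could look like, written as Python dictionaries that would be sent as JSON request bodies. The field name `title`, the sub-field name `whole`, the analyzer and filter names, and the gram sizes are all made up for illustration. For simplicity the whole-word sub-field uses the `standard` analyzer rather than a word_delimiter chain.

```python
import json

# Index settings and mappings (hypothetical names and sizes throughout).
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "prefix_grams": {"type": "edge_ngram", "min_gram": 3, "max_gram": 20}
            },
            "analyzer": {
                "prefix_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "prefix_grams"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "prefix_analyzer",   # n-grams at index time
                "search_analyzer": "standard",   # plain terms at search time
                # Second copy of the same value, indexed as whole words.
                "fields": {"whole": {"type": "text", "analyzer": "standard"}},
            }
        }
    },
}

# Query both the n-gram field and the whole-word sub-field.
query = {
    "query": {
        "multi_match": {
            "query": "disestablishmentarianism",
            "fields": ["title", "title.whole"],
        }
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(query, indent=2))
```

With this layout, short prefixes match through `title`, while a complete word longer than max_gram can still match through `title.whole`.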

🐛 Bug report

Status: Postponed: needs info

Version: 2.0

Component: Code

Created by: yobottehg (Basel, Switzerland 🇨🇭)


Comments & Activities

Issue created by @yobottehg
Comment by kim.pepper (Sydney, Australia 🇦🇺), about 2 years ago:
Have you tried using the search_as_you_type data type? See the related feature request "Add a search_as_you_type data type" (status: Fixed).

OpenSearch has a dedicated search_as_you_type field type that is optimized for search-as-you-type functionality and can match terms using both prefix and infix completion. The search_as_you_type field does not require you to set up a custom analyzer or index suggestions beforehand.

https://opensearch.org/docs/latest/search-plugins/searching-data/autocom...

I feel this is more appropriate for most situations than custom workarounds for the edge-n-gram analyser.
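For reference, a minimal sketch of how a search_as_you_type field is typically mapped and queried, again written as Python dictionaries for the JSON request bodies. The field name `title` and the query text are hypothetical. The `_2gram` and `_3gram` sub-fields are the ones the field type generates automatically.

```python
import json

# Mapping: no custom analyzer is needed for this field type.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "search_as_you_type"}
        }
    }
}

# A bool_prefix multi_match treats the last term of the input as a prefix
# and searches the root field together with its generated sub-fields.
query = {
    "query": {
        "multi_match": {
            "query": "disest",
            "type": "bool_prefix",
            "fields": ["title", "title._2gram", "title._3gram"],
        }
    }
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(query, indent=2))
```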
