Edge-nGram fields do not preserve the complete word

Created on 13 April 2023, about 1 year ago
Updated 23 April 2023, about 1 year ago

If a field is indexed with the edge nGram type, the complete word or phrase is not kept as a gram.

For example, if a field contains "disestablishmentarianism", then searching for "dis" or "disestablishment" finds the result, but searching for "disestablishmentarianism" does not.
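One possible explanation, not confirmed for this module, is an upper limit on gram length: if only prefixes up to a maximum length are indexed, a longer word never appears in the index as a whole. The sketch below simulates that in plain Python. The function name and the limits (min_gram=2, max_gram=20) are assumptions chosen for illustration, not values taken from the module's configuration.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with length between min_gram and max_gram."""
    # Assumed limits: prefixes longer than max_gram are never emitted.
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


word = "disestablishmentarianism"  # 24 characters, longer than max_gram
indexed = set(edge_ngrams(word))

# A query matches only if it is exactly one of the indexed grams.
for query in ("dis", "disestablishment", word):
    print(query, query in indexed)
```

With these assumed limits, the two shorter queries are among the indexed grams while the full 24-character word is not, which matches the behaviour described above.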

Workaround:

Add the field to the index a second time with a text type, so that complete words can still be matched.

I don't know whether this is expected or intended, as described here:
https://discuss.elastic.co/t/edgengram-filter-not-keeping-the-whole-word...

Although preserve_original is set to true on the word_delimiter filter, the
edgeNGram filter is applied afterwards, meaning that the analyzer will
compute n-grams for both the original word and the sub-words created by the
word_delimiter filter.

If you want to keep the original token, it might make sense to use a multi
field[1] and analyze your field once with word_delimiter and once with
n-grams. Then you can use any of those fields depending on what you are
trying to achieve: prefix search with the field analyzed with edgeNGram
and standard search with the field analyzed with word_delimiter.
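The multi-field suggestion quoted above could look roughly like the following Elasticsearch index definition, written here as a Python dictionary. This is only a sketch: the field name "title", the sub-field name "prefix", the analyzer and filter names, and the gram limits are all hypothetical and not taken from this module.

```python
import json

# Hypothetical index body: one field analyzed two ways via a multi-field.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Splits on word delimiters but also keeps the original token.
                "delimiter_keep_original": {
                    "type": "word_delimiter",
                    "preserve_original": True,
                },
                # Emits prefixes of each token (assumed limits).
                "prefix_grams": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 20,
                },
            },
            "analyzer": {
                "delimited_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["delimiter_keep_original", "lowercase"],
                },
                "prefix_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "prefix_grams"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                # Main field: standard search on whole words.
                "type": "text",
                "analyzer": "delimited_analyzer",
                "fields": {
                    # Sub-field: prefix search on edge n-grams.
                    "prefix": {
                        "type": "text",
                        "analyzer": "prefix_analyzer",
                        "search_analyzer": "standard",
                    }
                },
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

Under this layout, a whole-word query would target "title" and a prefix query would target "title.prefix", so the complete word stays searchable regardless of the gram limits.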

πŸ› Bug report
Status

Postponed: needs info

Version

2.0

Component

Code

Created by

🇩🇪 Germany yobottehg
