Does not work on mono-lingual sites (when content is language neutral)

Created on 6 October 2023, about 1 year ago

Scenario:
I want to use Porter-Stemmer so that "defend" and "defender" will return Node A, whose title contains the word "defenders", in the results.
I have a view with an exposed "keys" filter, based on a node search page. I have a node (e.g. Node A with nid 1) whose title contains the word "defenders".
I enabled the module, cleared the search index, and fully rebuilt search.

Prior to enabling porterstemmer:
* The search index contains the keyword "defenders".
* When I search for "defenders" I get Node A in the results.
* When I search for the word "defend" or "defender", I get no results.

After enabling porterstemmer:
* The search index (i.e. the rows returned by SELECT * FROM search_index WHERE sid = 1) is identical to what it was before installing porterstemmer.
* When I search for "defend" or "defender", Node A is not in the results.
* When I search for "defenders", Node A is also not in the results.

Things I tried:
* confirmed that module is enabled
* confirmed that search index was rebuilt from scratch
* re-saved the node

Additional detail:
When I use the Views preview to examine the search query, I see that porterstemmer successfully transforms the search key "defenders" into "defend". But since search_index contains only the unstemmed "defenders", there is no match. It seems the module fails to write the stemmed keyword into the index, but why? What am I doing wrong here?
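
The mismatch described above can be reproduced in miniature. This is only a sketch, not Drupal or porterstemmer code: `toy_stem` is a hypothetical suffix-stripper standing in for the real Porter algorithm, and a plain Python set stands in for the search_index table.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ers", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(index, key):
    """Stem the query key (as seen in the Views preview), then look it up."""
    return toy_stem(key) in index


# Index built WITHOUT stemming, matching what the report sees in search_index.
unstemmed_index = {"defenders"}

# Index as it would look if stemming had also run at index time.
stemmed_index = {toy_stem("defenders")}

for key in ("defend", "defender", "defenders"):
    print(key, "unstemmed index:", search(unstemmed_index, key),
          "| stemmed index:", search(stemmed_index, key))
```

With the unstemmed index every lookup fails, including the exact word "defenders", because the query side is reduced to "defend" while the index side is not. With the stemmed index all three keys match.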

🐛 Bug report
Status: Needs review
Version: 2.0
Component: Code
Created by: 🇺🇸 United States AaronBauman Philadelphia


Comments & Activities

  • Issue created by @AaronBauman
  • Assigned to AaronBauman
  • 🇺🇸 United States AaronBauman Philadelphia

    Doing some debugging: it looks like porterstemmer is not running during search indexing, because the langcode is set to 'und' instead of 'en'.
    However, when the preprocess method is called during search, the langcode is NULL, so porterstemmer does get invoked.
    The index therefore holds unstemmed words while the search keys are stemmed, which breaks the search results.

    Not sure if this is a Drupal core bug or a porterstemmer bug, but it's definitely a bug.

    I'm not using multilingual, i18n, or anything like that. The site is English only, which I imagine is a pretty common setup.

    Opening an MR to add 'und' handling to the processor, but not sure if that's universally appropriate.
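
The behaviour described in this comment, and the idea behind the proposed 'und' handling, can be modelled roughly as follows. This is a sketch under assumptions, not the module's code or the MR: the gating rule (stem when the langcode is missing or 'en') is inferred from the comment above, and `toy_stem` and the `neutral_is_english` flag are hypothetical names used only for illustration.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ers", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, langcode=None, neutral_is_english=False):
    """Stem text only when it is treated as English.

    Assumed rule: a missing langcode (None, as seen at search time) and 'en'
    are stemmed; 'und' (as seen at index time) is skipped unless the
    hypothetical neutral_is_english flag is set.
    """
    treat_as_english = langcode is None or langcode == "en"
    if neutral_is_english and langcode == "und":
        treat_as_english = True
    if not treat_as_english:
        return text
    return " ".join(toy_stem(word) for word in text.split())


indexed_now = preprocess("defenders", langcode="und")    # index time, current behaviour
query_key = preprocess("defenders", langcode=None)       # search time
indexed_fixed = preprocess("defenders", langcode="und", neutral_is_english=True)

print(indexed_now, query_key, indexed_fixed)
```

Under these assumptions the indexed word and the query key only agree once 'und' content is stemmed the same way as English. Whether treating 'und' as English is safe on sites with genuinely language-neutral or mixed-language content is the open question raised above.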

  • @aaronbauman opened merge request.
  • Status changed to Needs review about 1 year ago
  • 🇺🇸 United States AaronBauman Philadelphia

    Here's an updated test that demonstrates the current bugginess around search for language-unspecified and non-English results.
    The existing test seems quite insufficient, or at least I don't understand how it's supposed to work.

    This problem appears to be more pervasive than the current title implies, but I'm not going to keep changing the title.
