Inconsistencies in database server and search keys processing

Created on 11 February 2025, 3 months ago

Problem/Motivation

There is inconsistency between the way the data is stored in the database server and the way search keys work. For example, if you have a full-text field for a product with an SKU of "FOO-012345", with stemming and tokenization index preprocessors activated, it will be saved in the database as follows:

+--------------------------------+--------------+------------+---+
| entity:commerce_product/1:en   | sku_fulltext | 12345      | 0 |
| entity:commerce_product/1:en   | sku_fulltext | foo        | 0 |
| entity:commerce_product/1:en   | sku_fulltext | foo 12345  | 0 |
+--------------------------------+--------------+------------+---+

But when you try to search for the keys "FOO-012345", it is parsed like this:

Index: search\n
Keys: 'FOO-012345'\n
Parsed keys: 'foo 012345'\n
Searched fields: [ALL]\n

The query doesn't know about any special processing done by the database server. This is done in the Drupal\search_api_db\Plugin\search_api\backend\Database::cleanNumericString() method. Basically, this method alters the data when adding new data to the database, breaking consistency with the original data. As a result, the query will not be able to find the product.

Steps to reproduce

  • Create a product with an SKU such as "FOO-012345".
  • Enable Stemmer and Tokenizer index preprocessors.
  • Attempt to search for the product using the key "FOO-012345".
  • Observe that the product is not found in the search results.

Proposed Resolution

I’m not sure how to resolve this issue because the database server and key parsers appear to be completely disconnected. Removing the trimming could lead to issues, especially considering its intended purpose as mentioned in #2451745: Why trimming leading zeros in text and token types? β†’ .

Maybe possible solution is to introduce a hook or event for key parsing and implement it for the database server. This would modify the parsed keys from Parsed keys: 'foo 012345'\n to Parsed keys: 'foo 12345'\n, aligning them with the data in the database and ensuring accurate search results.

πŸ› Bug report
Status

Active

Version

1.37

Component

Database backend

Created by

πŸ‡·πŸ‡ΊRussia niklan Russia, Perm

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Production build 0.71.5 2024