SQLSTATE Duplicate entry warning when indexing a word of 49 characters

Created on 20 November 2023, about 1 year ago
Updated 23 December 2023, 11 months ago

Problem/Motivation

If you use the Database Search module and try to index entities that contain a word of exactly 49 characters, indexing fails with this warning:

SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'entity:node/9983:en-body-Loremipsumdolorsitametconsecteturadi...' for key 'PRIMARY': INSERT INTO "search_api_db_article_text" ("item_id", "field_name", "word", "score") 

\Drupal\search_api_db\Plugin\search_api\backend\Database::convertValuesToScoredTokens can return an array of tokens like this, where the second token differs from the first only by a trailing space:

[
  'Loremipsumdolorsitametconsecteturadipiscingelitfe' => [
    'value' => 'Loremipsumdolorsitametconsecteturadipiscingelitfe',
    'score' => 1.0,
  ],
  'Loremipsumdolorsitametconsecteturadipiscingelitfe ' => [
    'value' => 'Loremipsumdolorsitametconsecteturadipiscingelitfe ',
    'score' => 1.0,
  ],
]

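This happens because of the following lines, which join two adjacent words into a bigram and then truncate it to TOKEN_LENGTH_MAX. When the first word is 49 characters long, the truncated bigram is that word plus a trailing space: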

        $bigram = "$prev_word $word";
        $bigram = mb_substr($bigram, 0, static::TOKEN_LENGTH_MAX);
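
The truncation can be reproduced outside Drupal. The following is a minimal sketch, not the module's code: `build_bigram()` is a hypothetical helper that mirrors the two lines above, and the limit of 50 is an assumption that matches the token shown earlier (49 characters plus one space). The duplicate-key error presumably follows because the database compares the two tokens as equal when trailing spaces are ignored, which is how many MySQL collations behave.

```php
<?php

// Hypothetical helper mirroring the two lines quoted above.
// Assumption: $max stands in for Database::TOKEN_LENGTH_MAX and is 50.
function build_bigram(string $prev_word, string $word, int $max = 50): string {
  $bigram = "$prev_word $word";
  return mb_substr($bigram, 0, $max);
}

// The 49-character word from the report.
$prev_word = 'Loremipsumdolorsitametconsecteturadipiscingelitfe';
// 'next' is an arbitrary following word chosen for illustration.
$bigram = build_bigram($prev_word, 'next');

// The truncated bigram is the 49-character word plus one trailing space.
var_dump($bigram === $prev_word . ' ');

// PHP treats the two strings as different array keys...
var_dump($bigram !== $prev_word);

// ...but they are equal once trailing whitespace is ignored.
var_dump(rtrim($bigram) === $prev_word);
```

All three checks hold for any following word, since everything after the space is cut off by the truncation.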

Steps to reproduce

  1. Install the search_api_db module.
  2. Create an index for the article node type.
  3. Add the body field to the index.
  4. Create an article node whose body contains a word of 49 characters.
  5. Run indexing.
  6. You will see the "Couldn't index items. Check the logs for details." error message, and the watchdog log will contain a warning like the one quoted above.

πŸ› Bug report
Status

Closed: duplicate

Version

1.30

Component

Database backend

Created by

🇺🇦 Ukraine Chizh273
