Highlight Search API processor is incompatible with chunked results (RagTool)

Created on 8 October 2025, 27 days ago

Problem/Motivation

AI version 1.1.4 includes the bugfix from [3546744].

However, after updating to 1.1.4 or later on both of our sites, which use an AI Chatbot with RAG integration (Milvus provider with Zilliz), the RAG functionality broke.

The log shows errors like:

Could not load the following items on index AI Search: "entity:node/11:de:460601774751947103".

After a helpful conversation with @artem in Slack, I've finally found the problem. Contrary to my first assumption, the embedding strategy has no impact at all. I was using the "contextual_chunks" strategy, which extends the Search API item IDs with the chunk IDs from the Milvus database (e.g. "entity:node/11:de" is the normal Search API item ID and ":460601774751947103" is the extension containing the Milvus ID). Using the "average_pool" strategy would simply have extended the item ID with ":0", so in either case it makes no difference.
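To make the ID structure concrete, here is a minimal sketch in Python (purely illustrative, not code from ai_search or search_api). The name split_chunked_id is hypothetical, and the sketch assumes the chunk suffix is always present, as described above for both strategies. That assumption matters: a plain ID such as "entity:node/11:de" also contains colons, so the string alone does not reveal whether its last segment is a chunk ID or a language code.

```python
from typing import NamedTuple


class ChunkedId(NamedTuple):
    """Hypothetical container for the two parts of an extended item ID."""
    item_id: str   # the normal Search API item ID, e.g. "entity:node/11:de"
    chunk_id: str  # the suffix added for the vector database, e.g. "0"


def split_chunked_id(chunked_id: str) -> ChunkedId:
    """Split off the last ':'-separated segment as the chunk ID.

    Assumption: the caller knows a chunk suffix is present. Without that
    knowledge the split is ambiguous, because plain item IDs contain ':' too.
    """
    item_id, sep, chunk_id = chunked_id.rpartition(":")
    if not sep or not item_id or not chunk_id:
        raise ValueError(f"Not a chunked item ID: {chunked_id!r}")
    return ChunkedId(item_id, chunk_id)


parts = split_chunked_id("entity:node/11:de:460601774751947103")
print(parts.item_id)   # entity:node/11:de
print(parts.chunk_id)  # 460601774751947103
```

Anything that tries to load the item using the full extended string instead of the item_id part will not find it, which matches the log message above.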

I'm still not experienced in configuring AI search indexes, so I had to learn that most of the processors that are useful for a Search API database search backend do not make sense for an AI vector database index. However, in both of our setups I had left the "highlight" processor active, because I thought that, while it might not have any effect, it should at least do no harm...
Well, it did harm: the Highlight processor calls getFieldsHelper()->extractItemValues(), and a few calls later \Drupal\search_api\Item\Item::getOriginalObject() is called, where $this->index->loadItem($this->itemId) causes the problem. No matter which embedding strategy is used, the item ID is not expected to carry an additional suffix at this point.

Steps to reproduce (required for bugs, but not feature requests)

* ai_search >= 1.1.4
* ai_vdb_provider_milvus (but it's not specifically related to Milvus at all)
* create a search index for your vector database and enable the "highlight" processor
* configure an AI chatbot with RAG tool connected to that search index
* ask the bot anything and you won't get specific RAG-powered results
* instead, look at the watchdog log and you'll find warnings like the one mentioned above

Proposed resolution

I'm not sure whether we should simply disallow the "highlight" processor, because it might never make sense to use it here(?), or find a real solution. IMHO the enrichment of the item IDs could bite us again in other places.

So I guess we might have to think about extending search_api's Item object in some way, so that it keeps its original item ID, as stored in the Drupal database, and offers the possibility to add a chunk ID. Any place that refers to the full item ID including the chunk ID would then need to call a new getter, and any other plugin in the search_api universe that expects the original item ID (and loads the item) would continue to work in any scenario. Even if its usage does not make sense (e.g. highlighting for RAG), at least it should not break.
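As a rough illustration of that idea, here is a language-neutral sketch in Python, not Drupal code. The class ChunkAwareItem, its methods, and the in-memory STORED lookup are all hypothetical stand-ins; nothing here reflects the actual search_api Item API. It only shows the separation being proposed: the default getter keeps returning the loadable ID, and the chunk-extended ID is available through a separate getter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkAwareItem:
    """Hypothetical item that keeps the original ID and the chunk ID apart."""
    original_id: str                # ID as known to the site, e.g. "entity:node/11:de"
    chunk_id: Optional[str] = None  # extra ID from the vector database, if any

    def get_id(self) -> str:
        # What existing plugins (e.g. a highlighter) would keep using.
        return self.original_id

    def get_chunked_id(self) -> str:
        # New getter for code that needs the full ID including the chunk.
        if self.chunk_id is None:
            return self.original_id
        return f"{self.original_id}:{self.chunk_id}"


# Stand-in for item loading; only original IDs are known here.
STORED = {"entity:node/11:de": "node 11 (German translation)"}


def load_item(item_id: str) -> Optional[str]:
    return STORED.get(item_id)


item = ChunkAwareItem("entity:node/11:de", "460601774751947103")
print(load_item(item.get_id()) is not None)  # True: the original ID loads
print(load_item(item.get_chunked_id()))      # None: the extended ID does not
```

The design choice sketched here is that the default getter stays backwards compatible, so only the code that is aware of chunks has to change.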

🐛 Bug report
Status

Active

Version

1.1

Component

AI Search

Created by

🇦🇹Austria agoradesign Grieskirchen
