Title is duplicated in the chunk if it is present in the index as contextual content

Created on 17 September 2025

Problem/Motivation

To give each chunk more context before it is converted to embeddings and stored in the vector database, the entity title is prepended to every chunk as a level-1 heading. This happens only when the title (the entity label field) is among the index fields and is not set to "Ignored". However, it also duplicates the title. When the title field is indexed as contextual content, it is appended again at the end of the chunk as a line containing the field label and its value.

Steps to reproduce

Install the ai_search module.
Install any VDB (vector database) provider and configure it.
Create a Search API server with the AI Search backend, using the VDB provider installed in the previous step.
Create an index for the "Content" entity type.
Add the "Rendered HTML output" field as "Main content" and the "Title" field as "Contextual content".
Index at least one item.
Check the chunk. It contains the content title as an h1 ('#') at the beginning and also `Title: here is the title of the content` at the end.

Proposed resolution

Do not append the title at the end of the chunk, since it is already added at the beginning. This saves tokens and leaves more of each chunk's token budget for actual content.
Alternatively, do not add the title as an h1, and keep it only in the chunk metadata.
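Both options can be illustrated with a minimal Python sketch. It is not the ai_search module's actual (PHP) code. The function name `build_chunk`, its `title_label` and `title_as_headline` parameters, and the "# title" / "Label: value" layout are hypothetical assumptions based on the behaviour described above.

- With `title_as_headline=True`, the title becomes the h1 and is dropped from the trailing contextual lines (first option).
- With `False`, no headline is added and the title stays only among the trailing contextual lines (approximating the second option).

```python
from typing import Dict, List


def build_chunk(
    body: str,
    contextual: Dict[str, str],
    title_label: str = "Title",
    title_as_headline: bool = True,
) -> str:
    """Assemble a chunk so the entity title appears only once.

    Hypothetical sketch: names and layout are assumptions, not the
    ai_search module's real implementation.
    """
    parts: List[str] = []
    title = contextual.get(title_label)
    if title_as_headline and title is not None:
        # Option 1: keep the title as a level-1 headline...
        parts.append(f"# {title}")
        # ...and drop it from the trailing contextual content.
        contextual = {k: v for k, v in contextual.items() if k != title_label}
    parts.append(body)
    # Option 2 (title_as_headline=False): no headline is added, so the
    # title survives only in these trailing "Label: value" lines.
    parts.extend(f"{label}: {value}" for label, value in contextual.items())
    return "\n\n".join(parts)


print(build_chunk("Body text.", {"Title": "My page", "Author": "Jane"}))
```

Either way, the title appears exactly once per chunk. The only difference is whether it sits in the headline or among the contextual lines.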

Remaining tasks

Decide on the approach and create an MR.

🐛 Bug report

Status

Needs review

Version

1.2

Component

AI Search

Created by

🇩🇪Germany a.dmitriiev
