When chunking content for AI Search (chunking being fundamental for vector databases), an item is sometimes so long that it cannot be indexed in a single run and needs multiple runs in the batch to complete. The typical workarounds are using an LLM to shorten the content (high token usage and cost) or using a much larger chunk size (less representative vectors, so lower accuracy).
We attempted this with sub-batches in 📌 Improve AI Search Module Indexing to Handle Long-Running Chunk Embedding Processes (currently Needs work), but the UX for sub-batches is very confusing: the end user sees progress bar after progress bar and has no real sense of when indexing will finish on a big site.
If we could override the ::create() method of the service, we could extend it with our own implementation that modifies the run so that, e.g., when indexing 10 items, if item number 4 needs 2 runs, the batch hits that item again instead of moving on.
Described above; a sketch of the intended batch behaviour follows.
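To make the control flow concrete, here is a minimal sketch of the desired batch run outside of Drupal. This is illustrative only: $embedItem stands in for the real "chunk + embed + index one item" step and returns TRUE only when the item is fully indexed; the $sandbox array, its 'pointer' key, and the function name are all assumptions for the sake of the example, not existing Search API code.

```php
<?php

declare(strict_types=1);

/**
 * One batch run: index up to $batch_size items, but do NOT advance past
 * an item whose chunk embedding did not finish in this run.
 */
function run_indexing_batch(array $item_ids, callable $embedItem, array &$sandbox): void {
  $pointer = $sandbox['pointer'] ?? 0;
  $batch_size = 10;

  for ($done = 0; $done < $batch_size && $pointer < count($item_ids); $done++) {
    $finished = $embedItem($item_ids[$pointer]);
    if (!$finished) {
      // E.g. item 4 of 10 needs a second run: keep the pointer on it so
      // the next batch run hits the same item again instead of skipping it.
      break;
    }
    $pointer++;
  }

  $sandbox['pointer'] = $pointer;
  $sandbox['finished'] = $pointer >= count($item_ids);
}
```

The key difference from the current static helper is that an unfinished item keeps the pointer in place, so the overall batch reports honest progress through a single progress bar rather than the nested sub-batch bars described above.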
Change IndexBatchHelper to a service so it can be overridden by the AI Search module (submodule of the AI module); a sketch of the override is shown below.
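Assuming this issue lands and the helper gets a service id, the AI Search module could swap in its own class with a standard service provider alter. Note the service id 'search_api.index_batch_helper' does not exist yet (it is what this issue proposes to add), and the substituted class name is hypothetical.

```php
<?php

namespace Drupal\ai_search;

use Drupal\Core\DependencyInjection\ContainerBuilder;
use Drupal\Core\DependencyInjection\ServiceProviderBase;

/**
 * Swaps the (proposed) Search API batch helper service for the
 * AI Search variant sketched above.
 */
class AiSearchServiceProvider extends ServiceProviderBase {

  public function alter(ContainerBuilder $container): void {
    // 'search_api.index_batch_helper' is the hypothetical id this issue
    // would introduce; guard so nothing breaks if it is absent.
    if ($container->hasDefinition('search_api.index_batch_helper')) {
      $container->getDefinition('search_api.index_batch_helper')
        ->setClass('Drupal\ai_search\AiSearchIndexBatchHelper');
    }
  }

}
```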
MR
Needs review
1.0
General code