When chunking content for AI Search (chunking being fundamental for vector databases), an item is sometimes so long that it cannot be indexed in a single run and needs multiple runs in the batch to complete. The typical workarounds are using an LLM to shorten the content (high token usage and cost) or using a much larger chunk size (less representative vectors, so lower accuracy).
We attempted this with sub-batches in 📌 Improve AI Search Module Indexing to Handle Long-Running Chunk Embedding Processes (currently Needs work), but the UX for sub-batches is very confusing: the end user sees progress bar after progress bar and has no real sense of when indexing will finish on a big site.
If we could override the ::create() method of the service, we could extend it with our own implementation that modifies the run so that, e.g., when indexing 10 items, if item number 4 needs 2 runs, the batch hits that item again instead of moving on.
Described above; a sketch of the intended batch behaviour follows.
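To make the control flow concrete, here is a minimal sketch of the desired batch run outside of Drupal. This is illustrative only: $embedItem stands in for the real "chunk + embed + index one item" step and returns TRUE only when the item is fully indexed; the $sandbox array, its 'pointer' key, and the function name are all assumptions for the sake of the example, not existing Search API code.

```php
<?php

declare(strict_types=1);

/**
 * One batch run: index up to $batch_size items, but do NOT advance past
 * an item whose chunk embedding did not finish in this run.
 */
function run_indexing_batch(array $item_ids, callable $embedItem, array &$sandbox): void {
  $pointer = $sandbox['pointer'] ?? 0;
  $batch_size = 10;

  for ($done = 0; $done < $batch_size && $pointer < count($item_ids); $done++) {
    $finished = $embedItem($item_ids[$pointer]);
    if (!$finished) {
      // E.g. item 4 of 10 needs a second run: keep the pointer on it so
      // the next batch run hits the same item again instead of skipping it.
      break;
    }
    $pointer++;
  }

  $sandbox['pointer'] = $pointer;
  $sandbox['finished'] = $pointer >= count($item_ids);
}
```

The key difference from the current static helper is that an unfinished item keeps the pointer in place, so the overall batch reports honest progress through a single progress bar rather than the nested sub-batch bars described above.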
Change IndexBatchHelper to a service so it can be overridden by the AI Search module (submodule of the AI module); a sketch of the override is shown below.
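Assuming this issue lands and the helper gets a service id, the AI Search module could swap in its own class with a standard service provider alter. Note the service id 'search_api.index_batch_helper' does not exist yet (it is what this issue proposes to add), and the substituted class name is hypothetical.

```php
<?php

namespace Drupal\ai_search;

use Drupal\Core\DependencyInjection\ContainerBuilder;
use Drupal\Core\DependencyInjection\ServiceProviderBase;

/**
 * Swaps the (proposed) Search API batch helper service for the
 * AI Search variant sketched above.
 */
class AiSearchServiceProvider extends ServiceProviderBase {

  public function alter(ContainerBuilder $container): void {
    // 'search_api.index_batch_helper' is the hypothetical id this issue
    // would introduce; guard so nothing breaks if it is absent.
    if ($container->hasDefinition('search_api.index_batch_helper')) {
      $container->getDefinition('search_api.index_batch_helper')
        ->setClass('Drupal\ai_search\AiSearchIndexBatchHelper');
    }
  }

}
```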
MR
Needs review
1.0
General code