Long node titles can trigger exception 'The minimum overlap cannot be equal to or exceed the maximum chunk size.' when indexing content

Created on 22 April 2025

Problem/Motivation

Nodes with long titles (roughly 160 characters or more) can crash the indexing process with the error: 'The minimum overlap cannot be equal to or exceed the maximum chunk size.'
This appears to happen because EmbeddingBase chunks both the contextual content and the main fields. When the contextual content is chunked, the chunk size is calculated from the title length. With a long title, the calculated chunk size can end up equal to or smaller than the configured minimum overlap, which triggers the exception.
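
The failure mode can be illustrated with a small Python sketch. This is not the module's code: the formula in contextual_chunk_size() is a hypothetical stand-in (a share of the maximum chunk size, minus the room the title takes), the function and class names are invented for illustration, and all sizes are in whatever unit the server uses for chunk size. Only the error message and the settings (500 / 100 / 30%) come from this issue.

```python
class ChunkSettingsError(ValueError):
    """Stand-in for the exception reported in this issue."""


def contextual_chunk_size(max_chunk_size, contextual_max_percent, title_length):
    # ASSUMPTION: the real calculation in EmbeddingBase is not shown in this
    # issue. This only models "a longer title leaves a smaller chunk size".
    budget = max_chunk_size * contextual_max_percent // 100
    return budget - title_length


def check_chunk_settings(chunk_size, min_overlap):
    # Mirrors the condition described by the error message.
    if min_overlap >= chunk_size:
        raise ChunkSettingsError(
            "The minimum overlap cannot be equal to or exceed "
            "the maximum chunk size."
        )


# Short title: 150 - 20 = 130, which is still larger than the overlap of 100.
short_size = contextual_chunk_size(500, 30, 20)
check_chunk_settings(short_size, 100)

# Long title: 150 - 60 = 90, which is below the overlap of 100.
long_size = contextual_chunk_size(500, 30, 60)
try:
    check_chunk_settings(long_size, 100)
    long_title_failed = False
except ChunkSettingsError:
    long_title_failed = True
```

Under these assumed numbers, the short title passes the check and the long title raises the error, which matches the behaviour described above.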

Steps to reproduce

  • Set up a search index with embeddings as described in (YouTube) Drupal AI Module Alpha 6 Update - AI Search (pt 3/3)
  • Configure the index fields with Rendered HTML output for the Main content and Title for the Contextual content
  • Set the search server config:
    • Maximum chunk size: 500
    • Minimum chunk overlap: 100
    • Contextual content maximum percentage: 30%
  • Create a node with a long title (160+ characters)
  • Run the index process

Proposed resolution

Reduce the minimum overlap for contextual chunks using the Contextual content maximum percentage. This has the added benefit of significantly reducing the number of chunks generated per node. Merge request incoming.
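
A rough Python sketch of the idea, to show why a smaller overlap also means fewer chunks. The scaling rule (overlap multiplied by the percentage) is one possible reading of the proposal, not the content of the merge request, and both function names are hypothetical. chunk_count() assumes a plain sliding window where each new chunk starts (chunk size - overlap) units after the previous one.

```python
def contextual_min_overlap(min_overlap, contextual_max_percent):
    # ASSUMPTION: scale the configured overlap by the contextual percentage.
    return min_overlap * contextual_max_percent // 100


def chunk_count(total_length, chunk_size, overlap):
    # Number of sliding-window chunks needed to cover total_length.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    if total_length <= chunk_size:
        return 1
    step = chunk_size - overlap
    remaining = total_length - chunk_size
    return 1 + -(-remaining // step)  # integer ceiling division


scaled = contextual_min_overlap(100, 30)    # 30 instead of 100
before = chunk_count(400, 110, 100)         # window advances 10 units at a time
after = chunk_count(400, 110, scaled)       # window advances 80 units at a time
```

With these illustrative numbers, an overlap of 100 on a chunk size of 110 advances the window by only 10 units per chunk, while the scaled overlap of 30 advances it by 80, so far fewer chunks are needed to cover the same content.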

πŸ› Bug report
Status

Active

Version

1.0

Component

AI Search

Created by

🇺🇸 United States CoffeyMachine
