Improve the AI Search recursive retrieval of a specific quantity of results

Created on 24 May 2025

Problem/Motivation

At the moment, if you follow the code in Drupal\ai_search\Plugin\search_api\backend\SearchApiAiSearchBackend::$maxAccessRetries, we re-attempt the search up to 10 times to reach a specific count (limit) of results.

For some scenarios, like AI Related Content →, where large nodes have been broken into many smaller chunks, even this iteration may not be sufficient, especially if no access filter is applied to the query itself and the subsequent access checks exclude many nodes (e.g. a site driven largely by member-only content).
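
Roughly, the behaviour being described is a retry loop along these lines (a simplified, illustrative sketch, not the module's actual code):

  // Simplified sketch (not the actual implementation): each pass over-fetches
  // a batch of chunks, drops those the current user cannot access, and
  // recurses until the requested limit is reached or the retry cap is hit.
  protected function doSearchSketch(array $params, int $limit, array &$results, int $iteration = 0): array {
    foreach ($this->getClient()->vectorSearch(...$params) as $match) {
      // Chunks whose parent entity fails the access check are discarded,
      // which is why a single pass can come up short of $limit.
      if (!$this->checkEntityAccess($match['drupal_entity_id'])) {
        continue;
      }
      $results[] = $match;
      if (count($results) >= $limit) {
        return $results;
      }
    }
    // Give up after $maxAccessRetries passes, even if still short of $limit.
    if ($iteration >= $this->maxAccessRetries) {
      return $results;
    }
    // Move to the next batch and try again.
    $params['offset'] += $params['limit'];
    return $this->doSearchSketch($params, $limit, $results, $iteration + 1);
  }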

Steps to reproduce

  1. Have a site with many access-controlled nodes
  2. Have large content lengths with a small chunk size
  3. Attempt to search and retrieve a specific number of results

Proposed resolution

Improve the iteration by allowing Vector Database (VDB) providers to declare that they are either:

  1. A vector database that supports grouping or aggregation of some form, like https://milvus.io/docs/grouping-search.md, in which case we can group by drupal_long_id. This seems to be just Milvus at the moment (big win for Milvus!).
  2. A vector database that supports filtering by a NOT IN array of already-found drupal_long_id values. Most (if not all) VDB providers should be able to support this.

So I think this probably means some more changes to the VDB Provider interfaces. For (1) it's a pre-query change; for (2) it's a post-query condition set by the VDB provider on the recursive ::doSearch() call.
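
As a starting point for discussion, a rough sketch of what those capability flags might look like on the VDB provider side (method names here are assumptions for illustration, not a final API):

  /**
   * Illustrative sketch of possible VDB provider capability methods.
   *
   * Method names are placeholders for discussion, not the final API.
   */
  interface AiVdbProviderCapabilitiesInterface {

    /**
     * Whether the VDB supports grouping/aggregation (e.g. Milvus grouping
     * search), allowing the backend to group by drupal_long_id in a single
     * query instead of iterating.
     */
    public function supportsGrouping(): bool;

    /**
     * Whether the VDB supports a NOT IN filter on drupal_long_id, allowing
     * the backend to exclude already-found entities on each recursive
     * ::doSearch() call.
     */
    public function supportsNotInFiltering(): bool;

  }

The backend could then skip the retry loop entirely when grouping is supported, or pass the already-found drupal_long_id values as an exclusion filter on each recursive call.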

Remaining tasks

  1. Merge request to build in this functionality
  2. Decide when to implement, as it will probably be a breaking change and require a coordinated release of VDB providers. I suggest 2.0.x.

User interface changes

N/A

API changes

TBD

Data model changes

N/A

✨ Feature request
Status

Active

Version

2.0

Component

AI Search

Created by

🇬🇧United Kingdom scott_euser



Comments & Activities

  • Issue created by @scott_euser
  • 🇬🇧United Kingdom scott_euser
  • 🇬🇧United Kingdom scott_euser
  • 🇺🇸United States keiserjb

    I altered the doSearch method to track distinct results. It seems to work for me, but I don't understand the implications of what I've done.

  • 🇬🇧United Kingdom scott_euser

    Can you put it in a merge request so it can be tested/reviewed?

  • 🇪🇸Spain gxleano Cáceres
  • 🇪🇸Spain gxleano Cáceres

    The changes include logic to handle the limitations of recursive vector search in scenarios involving:

    • Large content split into many small chunks
    • Numerous access-controlled nodes
    • Insufficient retrieval due to the 10-iteration (maxAccessRetries) cap

    In the meantime, I've also added related changes in https://www.drupal.org/project/ai_vdb_provider_milvus/issues/3526393 ✨ Make use of Milvus' Grouping functionality (Active).

  • 🇪🇸Spain gxleano Cáceres
  • 🇬🇧United Kingdom scott_euser

    Thanks for all the work on this, and apologies for the delay. Keen to hear other opinions as I have been struggling to focus on this lately, but my general feeling is that we are repeating a lot, both in AiVdbProviderClientBase/Interface and in SearchApiAiSearchBackend.php.

    Looking here and at ✨ Make use of Milvus' Grouping functionality (Active), it seems that exclusion and grouping are both small modifications, but we are making separate methods and repeating ourselves, I guess to avoid a breaking change. Maybe we are adding a lot of complexity to avoid BC while we are still able to break it, and a coordinated release is the lesser of two evils.

    Then, knowing that a VDB supports grouping only means that SearchApiAiSearchBackend needs to skip the iteration attempts when chunks are not wanted && grouping is supported. And exclude_entity_ids is just added as a filter param if supported. I'm not actually sure we need supportsNotInFiltering(), because there are plenty of things not supported by filtering in VDBs, and ultimately, if it is supported, the VDB will simply get through the iteration quicker to reach the desired number of results.

    Then we should be able to avoid 3 separate doSearch() methods with a lot of repetition; instead, supportsGrouping() can just be what stops iteration, and we can always attempt to apply exclusions regardless of whether iteration happens or not (roughly sketched at the end of this comment). That should hopefully result in a lot less code change in SearchApiAiSearchBackend.

    I say this all without actually properly trying it, as perhaps it has been tried and it's not doable. But in any case, the first step, I think, is for us to agree on how much breaking change we are okay with.
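
    A rough sketch of that flow, purely for discussion (variable and method names such as $want_chunks, supportsGrouping() and the exclude_entity_ids param are illustrative, not the module's actual API):

    // Sketch only: grouping support is what stops the retry loop, and
    // exclusions are always attempted regardless of whether we iterate.
    $stop_after_first_pass = !$want_chunks && $this->getClient()->supportsGrouping();
    $found = [];
    $iteration = 0;

    do {
      // Always pass the entities we already have as an exclusion filter;
      // providers that cannot filter this way can ignore it and will simply
      // take a few more iterations to reach the limit.
      $params['exclude_entity_ids'] = array_keys($found);
      foreach ($this->getClient()->vectorSearch(...$params) as $match) {
        if ($this->checkEntityAccess($match['drupal_entity_id'])) {
          $found[$match['drupal_entity_id']][] = $match;
        }
      }
      $iteration++;
    } while (!$stop_after_first_pass
      && count($found) < $limit
      && $iteration <= $this->maxAccessRetries);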

  • 🇩🇪Germany marcus_johansson

    @scott_euser - regarding BC, AI Search is the one module where it is OK to break things, since it's still Experimental. The other option is to do it in 2.0.0.

  • Status changed to Needs review 3 days ago
  • 🇺🇸United States keiserjb

    Here is my altered doSearch that does the trick for me.

    protected function doSearch(QueryInterface $query, $params, $bypass_access, &$results, $start_limit, $start_offset, $iteration = 0, &$unique_entity_ids = []) {
      $params['database'] = $this->configuration['database_settings']['database_name'];
      $params['collection_name'] = $this->configuration['database_settings']['collection'];
    
      // Conduct the search.
      if (!$bypass_access) {
        // Over-fetch (5x the requested limit) to account for chunks dropped
        // by access checks and entity-level deduplication below.
        $params['limit'] = $start_limit * 5;
        $params['offset'] = $start_offset + ($iteration * $start_limit * 5);
      }
    
      $search_words = $query->getKeys();
      if (!empty($search_words)) {
        [$provider_id, $model_id] = explode('__', $this->configuration['embeddings_engine']);
        $embedding_llm = $this->aiProviderManager->createInstance($provider_id);
    
        if (!isset($params['vector_input'])) {
          if (is_array($search_words)) {
            unset($search_words['#conjunction']);
            $search_words = implode(' ', $search_words);
          }
          $input = new EmbeddingsInput($search_words);
          $params['vector_input'] = $embedding_llm->embeddings($input, $model_id)->getNormalized();
        }
    
        $params['query'] = $query;
        $response = $this->getClient()->vectorSearch(...$params);
      }
      else {
        $response = $this->getClient()->querySearch(...$params);
      }
    
      // Count chunks consumed from this batch; initialise $match so the
      // summary returns below remain safe if the response is empty.
      $i = 0;
      $match = NULL;
      foreach ($response as $match) {
        if (is_object($match)) {
          $match = (array) $match;
        }
        $i++;
    
        $entity_id = $match['drupal_entity_id'];
    
        // Access check.
        if (!$bypass_access && !$this->checkEntityAccess($entity_id)) {
          continue;
        }
    
        // Only count distinct entities towards the requested limit.
        if (!isset($unique_entity_ids[$entity_id])) {
          $unique_entity_ids[$entity_id] = TRUE;
        }

        // Keep every accessible chunk; the limit check below is on the
        // number of distinct entities found so far.
        $results[] = $match;
    
        if (count($unique_entity_ids) >= $start_limit) {
          return [
            'real_offset' => $start_offset + ($iteration * $start_limit * 5) + $i,
            'reason' => 'distinct_entity_limit',
            'vector_score' => $match['distance'] ?? 0,
          ];
        }
      }
    
      if ($iteration == $this->maxAccessRetries) {
        return [
          'real_offset' => $iteration * $start_limit * 5 + $i,
          'reason' => 'max_retries',
          'vector_score' => $match['distance'] ?? 0,
        ];
      }
    
      if (count($response) < $params['limit']) {
        return [
          'real_offset' => $iteration * $start_limit * 5 + $i,
          'reason' => 'reached_end',
          'vector_score' => $match['distance'] ?? 0,
        ];
      }
    
      // Recurse to next batch.
      return $this->doSearch($query, $params, $bypass_access, $results, $start_limit, $start_offset, $iteration + 1, $unique_entity_ids);
    }