Problem/Motivation
Attempting to index content longer than 65536 (counting the vector size and all metadata fields) results in API response code 1100.
Proposed resolution
The issue is complicated by the fact that the limit is not checked by the Milvus driver, only by the API call, and we do not know exactly how the length is calculated there.
We could trim the content field proactively, but I suggest catching code 1100 in vdb_provider_milvus/src/Plugin/VdbProvider/MilvusProvider.php, method insertIntoCollection(), trimming the content based on the lengths of all the fields and the vector, and trying again. Something like this:
    public function insertIntoCollection(
      string $collection_name,
      array $data,
      string $database = 'default',
    ): void {
      $processed = FALSE;
      while (!$processed) {
        $response = json_decode($this->getClient()->vector()->insert(
          collectionName: $collection_name,
          data: $data,
          dbName: $database,
        ), TRUE);
        // The response was not valid JSON or carried no status code.
        if (!isset($response['code'])) {
          throw new \Exception("Failed to record vector.");
        }
        switch ($response['code']) {
          // 1100: the payload is too long. Trim the content and retry.
          // sanitizeMaxLength() throws once there is nothing left to trim,
          // which ends the loop.
          case 1100:
            $this->sanitizeMaxLength($data);
            break;

          // 200: the insert succeeded.
          case 200:
            $processed = TRUE;
            break;

          default:
            throw new \Exception("Failed to record vector.");
        }
      }
    }
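As written, the loop ends only when the request succeeds or sanitizeMaxLength() runs out of content, and the 5% fallback can take many API calls to get there. A minimal sketch of the same loop with a hard cap on attempts; insertWithRetry() and its callable parameters are hypothetical stand-ins for the Milvus client call and sanitizeMaxLength(), and the default cap of 5 is arbitrary:

```php
<?php

/**
 * Hypothetical helper: insert an entity, trimming and retrying on code 1100,
 * but giving up after $max_attempts API calls.
 *
 * @param callable $insert
 *   Stand-in for the Milvus insert call: takes the entity array and returns
 *   the decoded response (an array with a 'code' key).
 * @param callable $shrink
 *   Stand-in for sanitizeMaxLength(): takes the entity by reference and
 *   trims it in place.
 *
 * @return int
 *   The number of attempts the insert took.
 */
function insertWithRetry(callable $insert, callable $shrink, array $data, int $max_attempts = 5): int {
  for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
    $response = $insert($data);
    $code = $response['code'] ?? NULL;
    if ($code === 200) {
      return $attempt;
    }
    if ($code !== 1100) {
      // Missing code or any other error: do not retry.
      throw new \RuntimeException('Failed to record vector.');
    }
    // Code 1100: the payload is too long, so trim it and try again.
    $shrink($data);
  }
  throw new \RuntimeException("Failed to record vector after $max_attempts attempts.");
}
```

Injecting the insert and trim steps as callables keeps the sketch testable without a Milvus server; inside MilvusProvider the same loop could call $this->getClient() and $this->sanitizeMaxLength() directly.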
The two helper methods used by insertIntoCollection():
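    /**
     * Trims the content field so that the entity fits the length limit.
     *
     * @param array $data
     *   The entity being inserted; its 'content' value is shortened in place.
     *
     * @throws \Exception
     *   When there is no content left to trim.
     */
    private function sanitizeMaxLength(array &$data): void {
      // Nothing can be trimmed if there is no content field or it is empty,
      // so give up.
      if (!isset($data['content']) || strlen($data['content']) === 0) {
        throw new \Exception("Failed to record vector.");
      }
      $total_length = $this->countLength($data);
      // If the estimated length is over the limit, cut the content by the
      // excess. $difference is negative (an estimate of 70000 gives -4464),
      // so substr() drops that many bytes from the end.
      if ($total_length > 65536) {
        $difference = 65536 - $total_length;
      }
      // If the estimate is within the limit but the API still reports the
      // error, shorten the content by a further 5% (at least one byte).
      else {
        $difference = -max(1, (int) (strlen($data['content']) * 0.05));
      }
      $data['content'] = substr($data['content'], 0, $difference);
    }

    /**
     * Estimates the size of an entity as the API might count it.
     *
     * @param array $data
     *   The entity being inserted.
     *
     * @return int
     *   Estimated length: each field's value and key length plus 22, and the
     *   vector's dimension count plus 28.
     */
    private function countLength(array $data): int {
      $total_length = 0;
      foreach ($data as $key => $value) {
        if ($key !== 'vector') {
          // Encode non-scalar values (e.g. metadata arrays) as JSON rather
          // than casting them, which would yield "Array" and a warning.
          $text = (is_scalar($value) || $value === NULL) ? (string) $value : (string) json_encode($value);
          $total_length += strlen($text) + strlen((string) $key) + 22;
        }
        else {
          $total_length += count($value) + 28;
        }
      }
      return $total_length;
    }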
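The proactive trimming mentioned above could run before the first API call, so that the 1100 retry becomes a fallback. A minimal sketch: estimateLength() mirrors countLength() (with non-scalar values JSON-encoded), and the 22/28 overheads are this proposal's guesses, since how Milvus counts the length is unknown. The cut also avoids a pitfall of substr(): it works on bytes, so it can split a multi-byte UTF-8 character, and json_encode() returns false for the resulting invalid UTF-8 by default, which matters if the request is JSON-encoded. All function names here are hypothetical; mbstring's mb_strcut() does a comparable boundary-safe byte cut where that extension is available.

```php
<?php

/**
 * Hypothetical: estimated entity size, same formula as countLength().
 */
function estimateLength(array $data): int {
  $total = 0;
  foreach ($data as $key => $value) {
    if ($key === 'vector') {
      // Vector: number of dimensions plus a guessed 28 bytes of overhead.
      $total += count($value) + 28;
      continue;
    }
    // Other fields: value + key length plus a guessed 22 bytes of overhead.
    $text = (is_scalar($value) || $value === NULL) ? (string) $value : (string) json_encode($value);
    $total += strlen($text) + strlen((string) $key) + 22;
  }
  return $total;
}

/**
 * Hypothetical: cut $text to at most $max_bytes bytes without splitting a
 * UTF-8 character.
 */
function utf8SafeTruncate(string $text, int $max_bytes): string {
  if ($max_bytes <= 0) {
    return '';
  }
  if (strlen($text) <= $max_bytes) {
    return $text;
  }
  $cut = $max_bytes;
  // Bytes of the form 10xxxxxx are UTF-8 continuation bytes. While the first
  // byte left out is one of them, the cut falls inside a character, so move
  // the cut back.
  while ($cut > 0 && (ord($text[$cut]) & 0xC0) === 0x80) {
    $cut--;
  }
  return substr($text, 0, $cut);
}

/**
 * Hypothetical pre-flight step: trim 'content' so the estimate fits $limit.
 */
function preTrimContent(array $data, int $limit = 65536): array {
  $excess = estimateLength($data) - $limit;
  if ($excess > 0 && isset($data['content'])) {
    $keep = strlen($data['content']) - $excess;
    $data['content'] = utf8SafeTruncate($data['content'], $keep);
  }
  return $data;
}
```

With the default limit, an entity estimated at 70000 would lose the last 4464 bytes of its content, plus at most three more to land on a character boundary.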