Vector dimension mismatch when using Zilliz

Created on 30 August 2024, 4 months ago
Updated 12 September 2024, 3 months ago

Problem/Motivation

I’m trying to follow along with https://www.youtube.com/watch?v=ZpLxpp9BJ1c to set up my first AI Search test, and am running into an issue…

Just like the video, I’m using Zilliz (Milvus 2.4+) as well as the current -dev version of the AI module.

Regardless of the “Embeddings Engine” I choose in my Search API server config, I cannot modify the value of “Dimensions” in “Embeddings Engine Configuration”. If I change it and submit the form, it just reverts to the original value.

When using “OpenAI | text-embedding-3-small” as the “Embeddings Engine”, “Dimensions” is set to 1536 and the error message is:

Exception: Failed to insert into collection: fail to deal the insert data, error: []float32 size 1536 doesn't equal to vector dimension 768 of FloatVector in Drupal\vdb_provider_milvus\Plugin\VdbProvider\MilvusProvider->insertIntoCollection() (line 187 of /var/www/html/web/modules/contrib/ai/modules/vdb_providers/vdb_provider_milvus/src/Plugin/VdbProvider/MilvusProvider.php).
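The error says the Milvus collection was created with a 768-dimensional FloatVector field, while text-embedding-3-small returns 1536-dimensional embeddings. A vector field's dimension is declared when the collection is created, and every insert must match it. The following is a minimal, library-free Python sketch of that invariant; the `Collection` class, `DimensionMismatchError` and the collection name are hypothetical stand-ins, not Milvus's or the AI module's actual code.

```python
from dataclasses import dataclass, field


class DimensionMismatchError(ValueError):
    """Raised when an embedding's length differs from the collection's declared dimension."""


@dataclass
class Collection:
    # Hypothetical stand-in for a vector collection: the dimension is fixed
    # when the collection is created, like a Milvus FloatVector field.
    name: str
    dim: int
    rows: list = field(default_factory=list)

    def insert(self, vector: list[float]) -> None:
        # Reject vectors whose size differs from the declared dimension,
        # mirroring the check behind the error message above.
        if len(vector) != self.dim:
            raise DimensionMismatchError(
                f"[]float32 size {len(vector)} doesn't equal to vector "
                f"dimension {self.dim}"
            )
        self.rows.append(vector)


# A collection created with the stale default of 768...
stale = Collection(name="content_index", dim=768)
# ...cannot accept a 1536-dimensional text-embedding-3-small vector.
try:
    stale.insert([0.0] * 1536)
except DimensionMismatchError as err:
    print(err)

# Creating the collection with the embedding model's actual size avoids the mismatch.
fixed = Collection(name="content_index", dim=1536)
fixed.insert([0.0] * 1536)
print(len(fixed.rows))
```

In this sketch the insert-time check only reports the problem; the fix has to happen when the collection is created, by passing the dimension of the chosen embeddings model instead of a default.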

Steps to reproduce

Drupal 11, latest -dev version of AI module, following instructions from video: https://www.youtube.com/watch?v=ZpLxpp9BJ1c

🐛 Bug report
Status

Fixed

Version

1.0

Component

AI Search

Created by

🇺🇸United States ultimike Florida, USA


Comments & Activities

  • Issue created by @ultimike
  • This seems to be an issue in the implementation of the Drupal Embeddings Engine or plugin, unrelated to Milvus. The Drupal code fails to create the Milvus collection with a vector dimension that matches the user input. For example, the input is

    “OpenAI | text-embedding-3-small” as the “Embeddings Engine”, “Dimensions” is set to 1536

    However, the collection created in Milvus specifies a vector field of dimension 768. It sounds like the dimension is hard-coded, or there is a bug in how it is set.

    I couldn't find the code for Drupal\vdb_provider_milvus\Plugin\VdbProvider\MilvusProvider->insertIntoCollection() on GitHub. The code of the Embeddings Engine or plugin should reveal the root cause of the problem.

  • Assigned to marcus_johansson
  • 🇩🇪Germany marcus_johansson

    @ultimike - there was one bug and one usability issue here that could have caused this; the bug is most likely the culprit for you.

    Bug - Dimension size ajax loading

    If you use a standard embeddings model from OpenAI, Mistral or Fireworks AI, the dimension size is known, so when you choose it in the select list on the Search API backend form, Ajax should update that value for you automatically.

    This broke in dev because we added the possibility to set manual values for providers like LM Studio or Ollama, where the dimension size is unknown to us. Once we allowed that, the state system treated any change to this field as a manual change and kept the initial value of 768.

    This has now been solved: the field is disabled by default, and a "Set Dimensions Manually" checkbox lets you change the value yourself. As long as you leave it unchecked and use a common embeddings engine, the correct value is filled in for you automatically.

    I'm 99% certain this is what you ran into.

    Usability - Allowing changes to embeddings engine

    Until now, the embeddings engine could be changed after it was set. Since we do not rebuild the whole index when it changes, the backend would end up in a broken state. We decided that if you actually want to use a new embeddings engine, you should create a new backend and remove the old one, so the field is now disabled when editing an existing backend.

    I don't think this was the issue, but it can cause similar problems, so it has been fixed as part of this ticket as well.

    I have pushed these changes to the 1.0.x-dev release; feel free to test whether they work better. My suggestion is to remove any backend you have and create a new one.

    I will keep this ticket open until I have written a functional regression test for this, since it worked in alpha6 but broke here.

    Thanks for reporting!

  • Issue was unassigned.
  • Status changed to Fixed 4 months ago
  • 🇬🇧United Kingdom scott_euser

    Thanks for sorting it Marcus!

    • marcus_johansson committed d360a6c6 on 3456770-discuss-interface-suggestion
      Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
    • marcus_johansson committed 2695f93a on 3456770-discuss-interface-suggestion
      Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
  • Automatically closed - issue fixed for 2 weeks with no activity.
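The two fixes described in the thread (a dimension that is filled in from the chosen model unless "Set Dimensions Manually" is ticked, and an embeddings engine that cannot be changed once the backend exists) can be sketched as follows. This is a hedged Python illustration, not the AI module's PHP code: `KNOWN_DIMENSIONS`, `resolve_dimensions`, `SearchBackend` and the `provider/model` engine IDs are hypothetical names. The listed sizes are the default output dimensions of those OpenAI and Mistral models.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup table of default embedding sizes for standard hosted models.
KNOWN_DIMENSIONS = {
    ("openai", "text-embedding-3-small"): 1536,
    ("openai", "text-embedding-3-large"): 3072,
    ("openai", "text-embedding-ada-002"): 1536,
    ("mistral", "mistral-embed"): 1024,
}


def resolve_dimensions(provider: str, model: str, set_manually: bool,
                       manual_value: Optional[int] = None) -> int:
    """Pick the vector dimension for a new backend.

    Unless the manual checkbox is ticked, a known model's size always wins,
    so a stale form value such as 768 is ignored.
    """
    if set_manually:
        # Local providers (e.g. LM Studio, Ollama): the user must supply the size.
        if manual_value is None or manual_value <= 0:
            raise ValueError("a positive manual dimension is required")
        return manual_value
    try:
        return KNOWN_DIMENSIONS[(provider, model)]
    except KeyError:
        raise ValueError(
            f"unknown dimension for {provider}/{model}; set it manually"
        ) from None


@dataclass
class SearchBackend:
    # Hypothetical backend config. The engine is frozen once saved, because
    # changing it would not rebuild the existing index.
    embeddings_engine: str
    dimensions: int
    saved: bool = False

    def save(self) -> None:
        self.saved = True

    def change_engine(self, engine: str) -> None:
        if self.saved and engine != self.embeddings_engine:
            raise PermissionError(
                "create a new backend to use a different embeddings engine"
            )
        self.embeddings_engine = engine


dim = resolve_dimensions("openai", "text-embedding-3-small", set_manually=False)
backend = SearchBackend("openai/text-embedding-3-small", dim)
backend.save()
print(backend)
```

Raising on an unknown model, rather than falling back to a default such as 768, forces the size to be stated explicitly, which is the situation the manual checkbox covers for local providers.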
