- Issue created by @ultimike
This seems to be an issue in the implementation of the Drupal Embeddings Engine or plugin, unrelated to Milvus. The Drupal code does not create the Milvus collection with a vector dimension matching the user input. For example, the input is:
"OpenAI | text-embedding-3-small" as the "Embeddings Engine", with "Dimensions" set to 1536,
but the collection created in Milvus has a vector field with 768 dimensions. It looks like the dimension is hard-coded, or there is a bug in how it is set.
I didn't find the code of Drupal\vdb_provider_milvus\Plugin\VdbProvider\MilvusProvider->insertIntoCollection() on GitHub. The code of the Embeddings Engine or plugin should reveal the root cause of the problem.
- Assigned to marcus_johansson
- marcus_johansson committed 2695f93a on 1.0.x
Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
- marcus_johansson committed d360a6c6 on 1.0.x
Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
- marcus_johansson (Germany)
@ultimike - there was one bug and one usability issue here that could have caused this; the bug is most likely the culprit in your case.
Bug - Dimension size ajax loading
If you use a standard embeddings model from OpenAI, Mistral or Fireworks AI, the dimension size is known, so when you choose the model in the select list on the Search API Backend form, the dimensions value should be updated automatically via Ajax.
This broke in dev because we added the ability to set the dimensions manually for providers like LM Studio or Ollama, where we cannot know the dimension size in advance. Once that was possible, the form state treated any change to the field as a manual change and kept the initial default of 768.
This has now been fixed: the field is disabled by default, and there is a checkbox labeled "Set Dimensions Manually" that you have to check before you can edit the value yourself. As long as you leave it unchecked and use a common embeddings engine, the correct value is filled in for you automatically.
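The dimension handling described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the module's PHP code: the `KNOWN_DIMENSIONS` table, the `engine|model` key format and `resolve_dimensions()` are assumed names. The dimension values are the default output sizes of those public models.

```python
from typing import Optional

# Assumed lookup table of default output sizes for well-known models.
# The real module presumably gets these from each provider plugin.
KNOWN_DIMENSIONS = {
    "openai|text-embedding-3-small": 1536,
    "openai|text-embedding-3-large": 3072,
    "mistral|mistral-embed": 1024,
}


def resolve_dimensions(engine: str, set_manually: bool,
                       manual_value: Optional[int]) -> int:
    """Return the vector dimension to use when creating the collection.

    Hypothetical helper mirroring the "Set Dimensions Manually" checkbox.
    """
    if set_manually:
        # Checkbox checked: trust the user's value. This is the path for
        # providers like LM Studio or Ollama, where the size is unknown.
        if manual_value is None or manual_value <= 0:
            raise ValueError("a positive dimension is required when set manually")
        return manual_value
    # Checkbox unchecked: ignore whatever value is left in the form
    # (e.g. the initial 768) and use the model's known size instead.
    if engine in KNOWN_DIMENSIONS:
        return KNOWN_DIMENSIONS[engine]
    raise ValueError(f"unknown dimension for {engine!r}; set it manually")
```

With the checkbox unchecked, a stale 768 left in the form is ignored, so `resolve_dimensions("openai|text-embedding-3-small", False, 768)` gives 1536. Raising an error for unknown engines, instead of silently falling back to a default, avoids recreating the original mismatch.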
I'm 99% certain this is what you ran into.
Usability - Allowing changes to embeddings engine
Until now, the embeddings engine could be changed after it was set. Since we do not rebuild the whole index when it changes, the index would end up in a broken state. We decided that if you actually want to use a new embeddings engine, you should create a new backend and remove the old one, so the field is now disabled when editing an existing backend.
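The locking rule can be sketched in the same way. Again, this is a hypothetical Python illustration rather than the module's code; `BackendConfig`, its `saved` flag, `engine_field_disabled()` and `change_engine()` are assumed names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BackendConfig:
    embeddings_engine: str
    dimensions: int
    saved: bool = False  # True once the backend has been created


def engine_field_disabled(config: BackendConfig) -> bool:
    """The engine select is only editable while the backend is new."""
    return config.saved


def change_engine(config: BackendConfig, new_engine: str,
                  new_dimensions: int) -> BackendConfig:
    """Return an updated config, refusing engine changes once saved."""
    if new_engine == config.embeddings_engine:
        return config  # nothing to change
    if engine_field_disabled(config):
        # The existing collection and stored vectors were built for the
        # old engine, and the index is not rebuilt, so block the change.
        raise ValueError(
            "Embeddings engine is locked; create a new backend and remove the old one."
        )
    return replace(config, embeddings_engine=new_engine, dimensions=new_dimensions)
```

Refusing the change outright is simpler than rebuilding the index on the fly, which matches the suggestion to create a new backend instead.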
I don't think this was the issue, but this can also cause similar problems, so this has been fixed as part of this ticket as well.
I have pushed these changes to the 1.0.x-dev release; feel free to test whether they work better for you. My suggestion is to remove any existing backend and create a new one.
I will keep this ticket open until I have written a functional regression test for this, since it worked in alpha6 but broke afterwards.
Thanks for reporting!
- Issue was unassigned.
- Status changed to Fixed (3 September 2024, 4:04am)
- marcus_johansson committed d360a6c6 on 3456770-discuss-interface-suggestion
Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
- marcus_johansson committed 2695f93a on 3456770-discuss-interface-suggestion
Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
- marcus_johansson committed d360a6c6 on aws-bedrock
Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
- marcus_johansson committed 2695f93a on aws-bedrock
Issue #3471259 by marcus_johansson: Vector dimension mismatch when using...
- Automatically closed - issue fixed for 2 weeks with no activity.