- Issue created by @yautja_cetanu
- 🇺🇸United States kevinquillen
MySQL 9 is shipping with a Vector type as well:
- 🇧🇪Belgium wouters_f Leuven
Here's the "magic quadrant" of vector databases.
Personally I think good options are (and why):
- Postgres (because it's popular)
- Elasticsearch (or OpenSearch, which should be very similar; also popular)
- MySQL (I did not know it supported vectors)
- Pinecone (already existing)
- Qdrant (Docker container, easy to set up)
- Azure AI Search (some modules already exist)
Things often used:
- Vespa (Danswer)
I'm not feeling it for:
- Cassandra
- Redis
- others
But that's just a gut feeling.
- 🇧🇪Belgium wouters_f Leuven
It would make sense to support:
- a dedicated vector DB
- a DB that also supports vectors
(so something from each column).
- 🇺🇸United States kevinquillen
MySQL 9 is a brand new release, though I think it is worthwhile seeing what the LOE (level of effort) is to get that plugin going. If it's not too much, that would be a big draw for Drupal without much need for third-party services. Then advanced users could use Qdrant, Pinecone, etc. if they have that need. But MySQL support (even though new) would be huge, even this early on. Let's not forget that Solr has vector support as well. That said, you'd need Solr.
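To illustrate the idea of one default backend plus optional dedicated ones, here is a minimal sketch of a backend-agnostic interface. Everything in it is hypothetical: `VectorBackend`, `InMemoryBackend` and `index_and_search` are illustrative names and not the AI module's actual plugin API, and the in-memory class only stands in for a real MySQL, Qdrant or Pinecone client.

```python
from __future__ import annotations

import math
from abc import ABC, abstractmethod


class VectorBackend(ABC):
    """Hypothetical contract that each storage plugin would implement."""

    @abstractmethod
    def upsert(self, item_id: str, vector: list[float]) -> None:
        """Store or overwrite one embedding."""

    @abstractmethod
    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        """Return up to top_k (item_id, score) pairs, best match first."""


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryBackend(VectorBackend):
    """Stand-in for a real backend; keeps vectors in a dict and scans them all."""

    def __init__(self) -> None:
        self._items: dict[str, list[float]] = {}

    def upsert(self, item_id: str, vector: list[float]) -> None:
        self._items[item_id] = list(vector)

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        scored = [(item_id, cosine(query, vec)) for item_id, vec in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def index_and_search(backend: VectorBackend, docs: dict[str, list[float]],
                     query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
    """Calling code depends only on the interface, not on a specific database."""
    for item_id, vector in docs.items():
        backend.upsert(item_id, vector)
    return backend.search(query, top_k)
```

With a contract like this, swapping the default backend for a dedicated one would only mean passing a different `VectorBackend` implementation to the calling code.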
- 🇧🇪Belgium wouters_f Leuven
I also did not know Solr supported it.
I agree with Kevin.
So in terms of current usage and adoption:
1. Obviously most are using MySQL, so that would make the most sense (in the long term).
2. Many Drupal sites are already connected to Solr for search, so that would certainly make it cheaper for them to onboard onto AI.
3. A lot (but fewer, I think) are connected to Elasticsearch for search.
4. A dedicated vector DB also obviously makes sense.
- 🇺🇸United States kevinquillen
- 🇧🇪Belgium wouters_f Leuven
I've tested saving vectors in Azure AI in this branch, and that seems to work:
This code does not use the AI module for credentials (yet).
https://www.drupal.org/project/search_api_aais/issues/3461513#comment-15...
✨ Module can not index vectors (Needs work)
- 🇧🇪Belgium wouters_f Leuven
I've tested https://www.drupal.org/project/elasticsearch_connector and seen that it should not be too hard to add vector support.
Just linking it here so people who want to work on it can easily find it:
https://www.drupal.org/project/elasticsearch_connector/issues/3461724
✨ Storing and using vectors (Embeddings) in ES is not supported yet. (Active)
- 🇧🇪Belgium wouters_f Leuven
Here's a Chroma implementation.
* Drupal module: https://lnkd.in/dtWcWcze
* FastAPI RAG backend: https://lnkd.in/dF3MK8uv
- 🇳🇱Netherlands jurriaanroelofs
How about Weaviate? It has a lot of advanced features, some paid, but it is also a fully open source vector search engine with a lot of community backing:
https://github.com/weaviate/weaviate
I also see one of the AI module maintainers has created a module for Weaviate: https://www.drupal.org/project/search_api_weaviate
- 🇵🇪Peru DevElCuy
Postgres' pgvector doesn't seem suitable for production: according to one benchmark, it is 150 times slower than a dedicated vector database like Qdrant [1].
Weaviate is great when dealing with complex data, which is applicable to many common Drupal use cases. However, Qdrant is significantly faster than Weaviate [2]. Both have somewhat unusual open-source licenses but are still permissive.
I have experience with Qdrant already, and it works great so far, but Weaviate looks better in the long run because I need to handle larger and more complex data sets.
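Benchmark figures like these depend heavily on dataset size, index settings and hardware, so it may be worth timing candidate backends against your own data. The following is a minimal sketch of such a harness. `measure_latency` and `brute_force_search` are hypothetical helpers: the brute-force function is only a stand-in for a real pgvector, Qdrant or Weaviate client call, which you would pass in as the `search` callable instead.

```python
import math
import time
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def brute_force_search(query: Vector, corpus: Dict[str, Vector], top_k: int = 3) -> List[str]:
    """Stand-in backend: exact cosine search over an in-memory dict."""
    def cosine(a: Vector, b: Vector) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(corpus, key=lambda key: cosine(query, corpus[key]), reverse=True)
    return ranked[:top_k]


def measure_latency(search: Callable[[Vector], object],
                    queries: Sequence[Vector],
                    repeats: int = 3) -> Dict[str, float]:
    """Time every query `repeats` times and summarise the per-call latency."""
    timings: List[float] = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)  # swap in a real client call here
            timings.append(time.perf_counter() - start)
    timings.sort()
    return {
        "calls": len(timings),
        "median_s": timings[len(timings) // 2],
        "max_s": timings[-1],
    }
```

Running the same `queries` through each backend's search callable gives numbers that are comparable for your own content rather than for a published dataset.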
[1] https://nirantk.com/writing/pgvector-vs-qdrant
[2] https://myscale.com/blog/weaviate-vs-qdrant-feature-comparison-database-...
- 🇺🇸United States adanielyan
Please add support for Weaviate. It's an open source vector database that supports multitenancy and hybrid search (keyword/vector).
https://weaviate.io/
- 🇬🇧United Kingdom scott_euser
Just to chip in here: for managed hosting platforms (e.g. Pantheon), options are more limited, so external vector databases are required (while MySQL 9 is not available). Opensolr and similar managed Solr platforms seem like good options, but in the realm of managed dedicated vector database services:
- Pinecone's free tier does seem to be the most generous option at 2 GB (with other restrictions, though) and allows production use
- Qdrant is close, and its 1 GB free tier, which allows production use, probably covers many use cases
- Zilliz Cloud (Milvus) free tier says it is not for production (though I'm not sure if that's a licence thing or just a recommendation)
To note, I have a client using Pinecone via the openai_embeddings module and plan to upgrade them to the ai module mid to late September, though Vivek Panicker might get to that earlier (I'll check on his progress when I'm close to starting). I think prior to Pinecone's "serverless" offering they were a bit of a moving target, but since then I personally haven't had any issues.
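For a rough sense of what the free tiers mentioned above might hold, here is a back-of-the-envelope sketch. It rests on assumptions that are not from this thread: 1536-dimensional embeddings stored as 4-byte floats, 1 GB taken as 1024^3 bytes, and no allowance for metadata or index overhead, so real capacity will be lower. `vectors_that_fit` is a hypothetical helper.

```python
GB = 1024 ** 3  # assumption: providers may count 1 GB as 10**9 bytes instead


def vectors_that_fit(storage_bytes: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Upper bound on stored vectors, counting raw vector data only."""
    bytes_per_vector = dimensions * bytes_per_value
    return storage_bytes // bytes_per_vector


# Assumed embedding size: 1536 dimensions, float32 values (6144 bytes per vector).
one_gb_capacity = vectors_that_fit(1 * GB, 1536)   # 174762
two_gb_capacity = vectors_that_fit(2 * GB, 1536)   # 349525
```

Under those assumptions, each content chunk costs about 6 KB before overhead, so a site would need well over a hundred thousand chunks before outgrowing a 1 GB tier.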
- 🇬🇧United Kingdom scott_euser
Pinecone: Okay, Vivek has made good progress on it and will be creating a merge request + issue soon. I'll review ASAP.
- 🇬🇧United Kingdom scott_euser
Updated the issue summary to reflect the decision on Pinecone. We discussed it as maintainers via Slack after this issue was originally created.
Automatically closed - issue fixed for 2 weeks with no activity.