- Issue created by @slashrsm
- Status changed to Needs review
10 months ago 4:38pm 7 September 2024 - 🇸🇮 Slovenia slashrsm
Still a work in progress, but the indexing side of things works.
- Status changed to Needs work
10 months ago 12:39am 9 September 2024 - 🇦🇺 Australia kim.pepper 🇦🇺 Sydney, Australia
Thanks Janez. Looks like a great start. Besides the linting errors, I can see there are still a lot of hard-coded values, and we're missing tests. I know it's a complex area, but it would be good to have some basic docs on setup and links for further info.
- 🇦🇺 Australia acbramley
Quick n dirty review for now as this is obviously still WIP - great to see movement in this area though!
- 🇸🇮 Slovenia slashrsm
slashrsm changed the visibility of the branch query_side to hidden.
- 🇸🇮 Slovenia slashrsm
After watching the Driesnote and looking into the AI module a bit, I realized that we are basically re-implementing their provider plugins here. To avoid that, I decided to depend on the AI module for providers. The updated MR assumes and uses the issue "Provide embeddings vector size" (Active), which adds the vector size function that we rely on.
- 🇦🇹 Austria maximilianmikus
I was looking into adding OpenSearch as a vector database provider and found this issue by chance. Wouldn't it be better to put this functionality in its own provider module? I had started a project for exactly that before I came across this issue.
- 🇺🇸 United States damienmckenna NH, USA
FYI the separate provider module has been deprecated in favor of this issue, though the current MR doesn't apply against the 3.x branch.
- 🇦🇺 Australia kim.pepper 🇦🇺 Sydney, Australia
A recommended approach for vector indexing is an ingest pipeline. I wonder if this issue could be expanded to include support for that?
- 🇦🇺 Australia kim.pepper 🇦🇺 Sydney, Australia
Started work on a more integrated approach. At this stage all the MR does is set index.knn = TRUE when creating the index. In order to have k-NN enabled on an index, we need to set that option when creating the index; we can't change it afterwards.
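For anyone following along, here is a minimal sketch of what "pass the settings at creation time" means for the request body. It is written in Python rather than the module's PHP, purely as an illustration. The helper name build_knn_index_body, the field name "embedding", and the dimension 384 are assumptions for the example, not taken from the MR.

```python
import json


def build_knn_index_body(field_name, dimension, extra_settings=None):
    """Build a create-index request body with k-NN enabled up front.

    Hypothetical helper: the point is only that index.knn travels in the
    same request that creates the index, not in a later settings update.
    """
    index_settings = {"knn": True}
    if extra_settings:
        # Merge any other index-level settings into the same request.
        index_settings.update(extra_settings)
    return {
        "settings": {"index": index_settings},
        "mappings": {
            "properties": {
                # knn_vector fields need a fixed dimension (the vector size).
                field_name: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


# Example values are assumptions: field name and vector size are made up.
body = build_knn_index_body("embedding", 384, {"number_of_shards": 1})
print(json.dumps(body, indent=2))
```

The resulting body would be sent as the payload of the create-index call, so there is no separate "update settings" step where the k-NN flag could be rejected.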
This meant we needed to refactor the addIndex() method so that it passes the settings at creation time instead of creating the index and then updating its settings. This refactoring could potentially be split out into a separate issue.
- 🇦🇺 Australia kim.pepper 🇦🇺 Sydney, Australia
Ran into a bit of an issue with the pipelines. In order to have OpenSearch generate the text embeddings, you need to specify the mappings from text fields to embedding fields when creating the pipeline. I don't think it would be easy to dynamically create a pipeline like this with Search API.
I'm going to check out the https://www.drupal.org/project/ai_vdb_provider_opensearch module to see if the built-in AI Search would work.
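To make the pipeline constraint above concrete, here is a rough sketch of the body an ingest pipeline with a text_embedding processor expects. It is in Python for illustration only. The helper name, the model ID, and the field names "body" and "body_embedding" are all placeholders, not values from this issue.

```python
def build_embedding_pipeline(model_id, field_map):
    """Build an ingest pipeline body with one text_embedding processor.

    Hypothetical helper: field_map maps each source text field to the
    vector field that should receive its embedding, and it has to be
    known when the pipeline is created.
    """
    if not field_map:
        raise ValueError("field_map must map at least one text field")
    return {
        "description": "Generate text embeddings at ingest time",
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    "field_map": dict(field_map),
                }
            }
        ],
    }


# Placeholder model ID and field names, for illustration only.
pipeline = build_embedding_pipeline(
    "example-model-id", {"body": "body_embedding"}
)
```

Because the field map is fixed in the pipeline definition, any change to which Search API fields are embedded would mean regenerating the pipeline, which is the part that looks awkward to do dynamically.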