Add support for neural search (text embeddings)

Created on 7 September 2024, 10 months ago

Problem/Motivation

OpenSearch supports neural search using text embeddings: https://opensearch.org/docs/latest/search-plugins/neural-search/

The purpose of this issue is to add neural search capabilities to this module.

Proposed resolution

OpenSearch can generate text embeddings for you, but we decided to do it on the Drupal side instead. The main reason is to allow us to use AI models that are not supported by OpenSearch.

Remaining tasks

- Create neural backend plugin type
- Implement Ollama backend plugin
- Implement OpenAI backend plugin
- Generate text embeddings during indexing and add the vector to the indexed data
- Create views filter for neural search (which generates the vector for the search query and queries OpenSearch based on that)
- Create configuration UIs
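
The indexing and query tasks above could be sketched roughly as follows. This is a Python illustration of the data flow only, not the module's PHP code. `fake_embed`, the `body_vector` field name, and the 8-dimension vector size are hypothetical stand-ins for a real embedding provider and real field configuration. The `knn` query clause follows the shape used by the OpenSearch k-NN plugin.

```python
import hashlib
import json

VECTOR_SIZE = 8  # hypothetical; a real embedding model dictates this value


def fake_embed(text, size=VECTOR_SIZE):
    """Deterministic stand-in for an embedding provider (NOT a real model)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `size` bytes of the hash to floats in [0, 1].
    return [round(b / 255.0, 4) for b in digest[:size]]


def build_indexed_document(item_id, body):
    """Indexing side: attach the vector to the data sent to OpenSearch."""
    return {"id": item_id, "body": body, "body_vector": fake_embed(body)}


def build_knn_query(query_text, k=5):
    """Query side: embed the search text and ask for the k nearest vectors."""
    return {
        "size": k,
        "query": {
            "knn": {
                "body_vector": {"vector": fake_embed(query_text), "k": k}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_indexed_document("node/1", "Hello neural search"), indent=2))
    print(json.dumps(build_knn_query("neural search", k=3), indent=2))
```

The same embedding function has to be used for both indexing and querying, otherwise the vectors are not comparable.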

✨ Feature request
Status

Active

Version

2.0

Component

Code

Created by

🇸🇮 Slovenia slashrsm


Merge Requests

Comments & Activities

  • Issue created by @slashrsm
  • 🇸🇮 Slovenia slashrsm
  • Status changed to Needs review 10 months ago
  • 🇸🇮 Slovenia slashrsm

    Still work in progress, but the indexing side of things works.

  • Pipeline finished with Failed
    10 months ago
    Total: 385s
    #276623
  • Status changed to Needs work 10 months ago
  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Thanks Janez. Looks like a great start. Besides the linting errors, I can see there are still a lot of hard-coded values, and we're missing tests. I know it's a complex area, but it would be good to have some basic docs on setup and links for further info.

  • 🇦🇺 Australia acbramley

    Quick n dirty review for now as this is obviously still WIP - great to see movement in this area though!

  • Pipeline finished with Failed
    9 months ago
    Total: 352s
    #283755
  • Pipeline finished with Failed
    9 months ago
    Total: 354s
    #283765
  • Pipeline finished with Failed
    9 months ago
    Total: 223s
    #283871
  • Pipeline finished with Success
    9 months ago
    Total: 214s
    #283874
  • Pipeline finished with Success
    9 months ago
    Total: 413s
    #283881
  • Pipeline finished with Canceled
    9 months ago
    Total: 93s
    #283885
  • Pipeline finished with Failed
    9 months ago
    Total: 1479s
    #283886
  • Pipeline finished with Success
    9 months ago
    Total: 263s
    #283896
  • 🇸🇮 Slovenia slashrsm
  • Pipeline finished with Success
    9 months ago
    Total: 260s
    #284387
  • 🇸🇮 Slovenia slashrsm

    slashrsm changed the visibility of the branch query_side to hidden.

  • 🇸🇮 Slovenia slashrsm

    After watching the Driesnote and looking into the AI module a bit, I realized that we are basically re-implementing their provider plugins here. To avoid that, I decided to depend on the AI module for providers. The updated MR assumes and uses the AI module issue "Provide embeddings vector size" (feature request, Active), which adds the vector size function that we rely on.

  • Pipeline finished with Failed
    9 months ago
    Total: 225s
    #296983
  • Pipeline finished with Failed
    9 months ago
    Total: 222s
    #297007
  • 🇦🇹 Austria maximilianmikus

    I was looking into adding OpenSearch as a vector database provider and found this issue by chance. Wouldn't it be better to put this functionality in its own provider module? I had started a project for exactly that before I found this issue.

  • 🇺🇸 United States damienmckenna NH, USA

    FYI the separate provider module has been deprecated in favor of this issue, though the current MR doesn't apply against the 3.x branch.

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    A recommended approach for vector indexing is an ingest pipeline. I wonder if this issue could be expanded to include support for that?

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Started work on a more integrated approach. At this stage all the MR does is set index.knn = true when creating the index.

    In order to have knn enabled on an index, we need to set that option when creating the index. We can't change it afterwards.

    This meant we needed to refactor the addIndex() method to not create then update settings, but to pass the settings at creation time. This refactoring could potentially be split out into a separate issue.
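
A minimal sketch of what "pass the settings at creation time" could look like, written in Python for illustration rather than as the module's actual `addIndex()` code. The function name, the `body_vector` field, and the dimension value are assumptions. The `index.knn` setting and the `knn_vector` field type are the ones documented for the OpenSearch k-NN plugin.

```python
import json


def build_create_index_body(vector_field, dimension, extra_settings=None):
    """Build one create-index request body with k-NN enabled up front.

    `index.knn` is included in the creation request itself, so no
    follow-up settings update is needed after the index exists.
    """
    index_settings = {"knn": True}
    # Merge any other index-level settings (hypothetical caller input).
    index_settings.update(extra_settings or {})
    return {
        "settings": {"index": index_settings},
        "mappings": {
            "properties": {
                vector_field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


if __name__ == "__main__":
    body = build_create_index_body("body_vector", 768, {"number_of_shards": 1})
    print(json.dumps(body, indent=2))
```

The `dimension` has to match the vector size of whichever embedding model produces the vectors.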

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Ran into a bit of an issue with the pipelines. In order to have OpenSearch generate the text embeddings, you need to specify text field to embedding field mappings when creating the pipeline. I don't think it would be easy to dynamically create a pipeline like this with Search API.

    I'm going to check out the https://www.drupal.org/project/ai_vdb_provider_opensearch module to see if the built-in AI Search would work.
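
To illustrate the field mapping requirement mentioned in this comment, here is a rough Python sketch of an ingest pipeline body that uses the OpenSearch `text_embedding` processor. The helper name, the `_embedding` suffix convention, and the `my-model-id` value are hypothetical. The point is only that every text field to embed must be known when the pipeline is created.

```python
import json


def build_text_embedding_pipeline(model_id, text_fields, suffix="_embedding"):
    """Build an ingest pipeline body mapping text fields to vector fields.

    The field_map is fixed at pipeline creation time, so it would need to
    be regenerated whenever the set of indexed text fields changes.
    """
    field_map = {field: field + suffix for field in text_fields}
    return {
        "description": "Generate text embeddings at ingest time",
        "processors": [
            {"text_embedding": {"model_id": model_id, "field_map": field_map}}
        ],
    }


if __name__ == "__main__":
    # "my-model-id" is a placeholder for a model deployed in OpenSearch.
    pipeline = build_text_embedding_pipeline("my-model-id", ["title", "body"])
    print(json.dumps(pipeline, indent=2))
```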
