Add support for neural search (text embeddings)

Created on 7 September 2024, 11 months ago

Problem/Motivation

OpenSearch supports neural search using text embeddings: https://opensearch.org/docs/latest/search-plugins/neural-search/

The purpose of this issue is to add neural search capabilities to this module.

Proposed resolution

OpenSearch can generate text embeddings for you, but we decided to do it on the Drupal side instead. The main reason is to allow us to use AI models that are not supported by OpenSearch.

Remaining tasks

- Create neural backend plugin type
- Implement Ollama backend plugin
- Implement OpenAI backend plugin
- Generate text embeddings during indexing and add the vector to the indexed data
- Create views filter for neural search (which generates the vector for the search query and queries OpenSearch based on that)
- Create configuration UIs
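
The indexing and query tasks above can be sketched as follows. This is only an illustration in Python, not the module's PHP code: `toy_embed`, `add_embedding` and `build_knn_query` are hypothetical names, and `toy_embed` is a stand-in for a real embedding model (for example one served by Ollama or OpenAI). The query body follows the shape of OpenSearch's `knn` query clause.

```python
from typing import Any, Callable, Dict, List


def toy_embed(text: str, dims: int = 4) -> List[float]:
    """Stand-in for a real embedding model (assumption, for illustration only)."""
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch)
    # Normalise to unit length; fall back to 1.0 for empty input.
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def add_embedding(
    doc: Dict[str, Any],
    source_field: str,
    vector_field: str,
    embed: Callable[[str], List[float]],
) -> Dict[str, Any]:
    """Indexing side: return a copy of the document with the vector added."""
    enriched = dict(doc)
    enriched[vector_field] = embed(doc[source_field])
    return enriched


def build_knn_query(field: str, query_vector: List[float], k: int = 10) -> Dict[str, Any]:
    """Query side: build a k-NN search body for the given vector field."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not query_vector:
        raise ValueError("query_vector must not be empty")
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }


# The same embedder must be used for documents and for the search query.
indexed_doc = add_embedding({"title": "Neural search"}, "title", "title_vector", toy_embed)
search_body = build_knn_query("title_vector", toy_embed("semantic search"), k=5)
```

Generating the vector on the application side, as the proposed resolution describes, means the search backend only ever sees plain vectors, so any model can be used as long as indexing and querying share it.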

✨ Feature request
Status

Active

Version

2.0

Component

Code

Created by

🇸🇮 Slovenia slashrsm


Merge Requests

Comments & Activities

  • Issue created by @slashrsm
  • 🇸🇮 Slovenia slashrsm
  • Status changed to Needs review 11 months ago
  • 🇸🇮 Slovenia slashrsm

    Still a work in progress, but the indexing side of things works.

  • Merge request !64 "#3472769: Add support for neural search." (Closed) created by slashrsm
  • Pipeline finished with Failed
    11 months ago
    Total: 385s
    #276623
  • Status changed to Needs work 11 months ago
  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Thanks Janez. Looks like a great start. Besides the linting errors, I can see there are still a lot of hard-coded values, and we're missing tests. I know it's a complex area, but it would be good to have some basic docs on setup and links for further info.

  • 🇦🇺 Australia acbramley

    Quick n dirty review for now as this is obviously still WIP - great to see movement in this area though!

  • Pipeline finished with Failed
    11 months ago
    Total: 352s
    #283755
  • Pipeline finished with Failed
    11 months ago
    Total: 354s
    #283765
  • Pipeline finished with Failed
    11 months ago
    Total: 223s
    #283871
  • Pipeline finished with Success
    11 months ago
    Total: 214s
    #283874
  • Pipeline finished with Success
    11 months ago
    Total: 413s
    #283881
  • Pipeline finished with Canceled
    11 months ago
    Total: 93s
    #283885
  • Pipeline finished with Failed
    11 months ago
    Total: 1479s
    #283886
  • Pipeline finished with Success
    11 months ago
    Total: 263s
    #283896
  • 🇸🇮 Slovenia slashrsm
  • Pipeline finished with Success
    11 months ago
    Total: 260s
    #284387
  • 🇸🇮 Slovenia slashrsm

    slashrsm changed the visibility of the branch query_side to hidden.

  • 🇸🇮 Slovenia slashrsm

    After watching the Driesnote and looking into the AI module a bit, I realized that we are basically re-implementing their provider plugins here. To avoid that, I decided to depend on the AI module for providers. The updated MR assumes and uses "✨ Provide embeddings vector size" (Active), which adds the vector size function that we rely on.

  • Pipeline finished with Failed
    11 months ago
    Total: 225s
    #296983
  • Pipeline finished with Failed
    11 months ago
    Total: 222s
    #297007
  • 🇦🇹 Austria maximilianmikus

    I was looking into adding OpenSearch as a vector database provider and found this issue by chance. I wonder whether it would be better to put this functionality in its own provider module? I had already started a project for exactly that before I found this issue.

  • 🇺🇸 United States damienmckenna NH, USA

    FYI the separate provider module has been deprecated in favor of this issue, though the current MR doesn't apply against the 3.x branch.

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    A recommended approach for vector indexing is an ingest pipeline. I wonder if this issue could be expanded to include support for that?

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Started work on a more integrated approach. At this stage all the MR does is set index.knn = TRUE when creating the index.

    To have knn enabled on an index, we need to set that option when creating the index. We can't change it afterwards.

    This meant we needed to refactor the addIndex() method to not create then update settings, but to pass the settings at creation time. This refactoring could potentially be split out into a separate issue.
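
A minimal sketch of that "settings at creation time" idea, written in Python for illustration rather than in the module's PHP. The function name `build_create_index_body` and its parameters are hypothetical and are not the signature of `addIndex()`; the resulting dictionary has the shape of an OpenSearch create-index request body, with `knn` nested under `settings.index`.

```python
from typing import Any, Dict


def build_create_index_body(
    settings: Dict[str, Any],
    mappings: Dict[str, Any],
    enable_knn: bool,
) -> Dict[str, Any]:
    """Build one create-index body so index.knn is present from the start."""
    # Copy so the caller's settings are not mutated.
    index_settings = dict(settings)
    if enable_knn:
        # Must be part of the creation request, not a later settings update.
        index_settings["knn"] = True
    return {"settings": {"index": index_settings}, "mappings": mappings}


body = build_create_index_body(
    settings={"number_of_shards": 1},
    mappings={"properties": {"title": {"type": "text"}}},
    enable_knn=True,
)
```

Building the whole body up front replaces the earlier "create, then update settings" sequence with a single request.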

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Ran into a bit of an issue with the pipelines. To have OpenSearch generate the text embeddings, you need to specify the mappings from each text field to its embedding field when creating the pipeline. I don't think it would be easy to create a pipeline like this dynamically with Search API.

    I'm going to check out the https://www.drupal.org/project/ai_vdb_provider_opensearch module to see if the built-in AI Search would work.

  • First commit to issue fork.
  • 🇦🇺 Australia mstrelan

    Updated the MR so choosing the Text Embedding (Vector) data type creates the field as a text field. Then the field mapper is decorated to create a paired knn_vector field. That will let us dynamically create the ingest pipeline.
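
The pairing described above could look roughly like this. It is a Python sketch under stated assumptions, not the MR's PHP field mapper: the function name, the `_embedding` suffix, the model ID and the dimension are all hypothetical. The mapping uses OpenSearch's `knn_vector` field type, and the pipeline uses the `text_embedding` ingest processor with its `model_id` and `field_map` options.

```python
from typing import Any, Dict, List, Tuple


def build_vector_mapping_and_pipeline(
    text_fields: List[str],
    model_id: str,
    dimension: int,
    suffix: str = "_embedding",  # hypothetical naming convention
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Pair each text field with a knn_vector field and derive the pipeline."""
    properties: Dict[str, Any] = {}
    field_map: Dict[str, str] = {}
    for name in text_fields:
        vector_name = f"{name}{suffix}"
        properties[name] = {"type": "text"}
        properties[vector_name] = {"type": "knn_vector", "dimension": dimension}
        # The ingest processor reads the text field and writes the vector field.
        field_map[name] = vector_name
    mappings = {"properties": properties}
    pipeline = {
        "description": "Generate embeddings for paired text fields",
        "processors": [
            {"text_embedding": {"model_id": model_id, "field_map": field_map}}
        ],
    }
    return mappings, pipeline


mappings, pipeline = build_vector_mapping_and_pipeline(
    ["title", "body"], model_id="example-model-id", dimension=384
)
```

Because the field map is derived from the same list that produces the mapping, the pipeline can be regenerated whenever the index's fields change.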

  • First commit to issue fork.
  • Pipeline finished with Failed
    about 1 month ago
    Total: 455s
    #541547
  • Pipeline finished with Failed
    about 1 month ago
    Total: 148s
    #541883
  • Pipeline finished with Failed
    about 1 month ago
    Total: 311s
    #541910
  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    After chatting with @mohit_aghera in Slack, we have a new approach based on the semantic field type in OpenSearch 3.1. This greatly reduces the amount of work needed to support neural search.

    There is a bug with remote text embedding models, which was recently fixed and merged but is not yet in a release: https://github.com/opensearch-project/neural-search/issues/1426

  • Pipeline finished with Failed
    about 1 month ago
    Total: 264s
    #543841
  • Pipeline finished with Failed
    about 1 month ago
    Total: 275s
    #543840
  • Pipeline finished with Success
    about 1 month ago
    Total: 220s
    #543846
  • Pipeline finished with Success
    about 1 month ago
    Total: 221s
    #543845
  • Pipeline finished with Success
    about 1 month ago
    Total: 235s
    #543851
  • Pipeline finished with Failed
    about 1 month ago
    Total: 264s
    #544909
  • Pipeline finished with Success
    about 1 month ago
    Total: 331s
    #544976
  • Pipeline finished with Failed
    about 1 month ago
    #548042
  • Pipeline finished with Failed
    about 1 month ago
    #549035
  • Pipeline finished with Failed
    about 1 month ago
    #549062
  • Pipeline finished with Success
    about 1 month ago
    #549065
  • Pipeline finished with Success
    29 days ago
    Total: 339s
    #551342
  • Pipeline finished with Success
    29 days ago
    Total: 256s
    #551374
  • Pipeline finished with Success
    23 days ago
    Total: 218s
    #556079
  • Pipeline finished with Failed
    18 days ago
    #559621
  • Pipeline finished with Failed
    16 days ago
    Total: 397s
    #562091
  • Pipeline finished with Failed
    16 days ago
    Total: 250s
    #562126
  • Pipeline finished with Failed
    16 days ago
    Total: 276s
    #562130
  • Pipeline finished with Success
    16 days ago
    Total: 397s
    #562132
  • Pipeline finished with Success
    15 days ago
    Total: 434s
    #562377
  • Pipeline finished with Success
    15 days ago
    Total: 220s
    #562405
  • Pipeline finished with Success
    12 days ago
    Total: 220s
    #565188
  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    We have moved this to its own project, https://www.drupal.org/project/search_api_opensearch_semantic, with the latest from MR !107.

    Please continue any work over there.

    Marking this issue as Fixed to provide contribution credits.

  • Status changed to Fixed 5 days ago
  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Created "📌 Add a BeforeIndexCreateEvent" (Active) for the new event.
