The entity should be inserted as one string and not several entries

Issue created by @scott_euser
Comment over 1 year ago →
🇺🇸United States kevinquillen
I think because I couldn't determine the best path for supporting several services. Pinecone uses namespaces to split data into smaller buckets, for example. I also wound up thinking maybe you don't need that level of splitting, and just stored one vector.
Comment over 1 year ago →
🇬🇧United Kingdom scott_euser
I see. Yeah, I guess costs will also be lower if 1 rendered node = 1 embedding. Anyway, I expect things will become clearer as more vector client plugins are built.
Comment over 1 year ago →
🇬🇧United Kingdom scott_euser
Updated the task description. A colleague of mine may start working on this (I can do the initial review), so I have tried to provide clearer steps for how we can approach it.

I am not sure whether we should delete the indexed search content and re-index it as part of a batch update hook. What do you think, @kevinquillen? I believe the functionality will continue to work, since the SearchForm does not care about the field-level responses and only uses the entity type and entity ID. So in the task description I am suggesting an empty update hook that outputs a warning message with advice on re-indexing.
Comment over 1 year ago →
🇬🇧United Kingdom ben.bastow
Going to have a look and start working on the issue
Comment over 1 year ago →
🇬🇧United Kingdom ben.bastow
I'm going to start working on this issue
Comment about 1 year ago →
🇧🇪Belgium mpp
"I also wound up thinking maybe you don't need that level of splitting, and just stored one vector."

While having a single embedding for one document has some benefits (e.g. cheaper, faster), it comes with some challenges:
- the document may be too large to fit into a single embedding, so chunking is needed
- you could lose meaning/semantics

Depending on the context and the type of content, you may want a different embedding strategy. For instance, paragraph or sentence embeddings may be useful.
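
To make the chunking point concrete, here is a minimal sketch in Python (not the module's code, and not PHP) of splitting rendered text into overlapping word windows before embedding each chunk. The function name `chunk_text` and the `max_words` / `overlap` parameters are hypothetical, and a real implementation would more likely count tokens for the chosen model rather than words.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-window chunks that overlap by `overlap` words.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    if not words:
        return []

    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each returned chunk would then get its own embedding, stored alongside the entity type and entity ID so results can still be mapped back to the node. The overlap is one common way to reduce the loss of meaning at chunk boundaries; splitting on paragraphs or sentences instead would be a different strategy with the same shape.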
Comment about 1 year ago →
🇬🇧United Kingdom scott_euser
Also, the latest releases from e.g. OpenAI support higher (and lower) numbers of dimensions per embedding, which can help handle different use cases. See the related issue "✨ Support new embeddings models and dimensions (3-small, 3-large)" (Needs review), which is a work in progress.
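
As a rough illustration of what a lower dimension count means in practice, here is a small Python sketch that shortens an embedding by keeping its leading values and re-normalizing to unit length. This assumes the model produces vectors that stay useful when truncated this way, which is how the 3-small / 3-large models are described. The function name `shorten_embedding` is hypothetical and this is not how the module does it.

```python
import math


def shorten_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values and rescale to unit length.

    Assumption: the embedding model supports truncation of trailing dimensions.
    """
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")

    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return truncated
    return [x / norm for x in truncated]
```

Whatever dimension count is chosen has to match the dimension configured on the vector database index, so switching it generally means re-indexing.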

It's probably never going to be possible to have a one-size-fits-all approach here. A sensible default, with options for developers to extend it, do something custom, or use an external module for embedding (like search_api_ai), would all help.

Perhaps we should either mark this as outdated, or change the issue summary to reflect a different default if a consensus forms around something other than the current state.
