- Issue created by @marcus_johansson
- Issue was unassigned.
- 🇬🇧 United Kingdom andrewbelcher
So I think there are a couple things in place already that help with this?
- You can index multiple fields in the embeddings, so the author etc. can already be in there. However, at the moment it just indexes the raw value, without any context from the field label. Prefixing each value with its label might be a worthwhile option. I think ensuring chunking doesn't split up small fields would also be wanted.
- You can index the rendered entity in the embeddings, so you can set up an "embedding" display mode, then use entity view displays to set up the exact rendering you want, including/excluding labels as appropriate.
- 🇩🇪 Germany marcus_johansson
Ah, #1 will work in all cases I can think of if we can get the field label and newlines in there. Newlines shouldn't affect embedding performance in modern models, but having the output returned with newlines helps the LLM better understand what is metadata.
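As a rough illustration of the label-plus-newline idea, here is a minimal Python sketch. It is not the module's code: `render_fields` and the sample field names are hypothetical, and it assumes the fields arrive as simple (label, value) pairs.

```python
def render_fields(fields):
    """Render (label, value) pairs as one 'Label: value' line per field.

    Hypothetical helper for illustration only; not part of any module.
    """
    lines = []
    for label, value in fields:
        # Collapse internal whitespace so each field stays on a single line.
        text = " ".join(str(value).split())
        if text:  # skip empty fields rather than emitting a bare label
            lines.append(f"{label}: {text}")
    # One field per line, so the metadata stays readable when returned to the LLM.
    return "\n".join(lines)


metadata = render_fields([
    ("Title", "Intro to  embeddings"),
    ("Author", "Jane Doe"),
    ("Tags", ""),
])
print(metadata)
```

Keeping each field on its own short line should also make it easier for a chunker to treat a field as a unit instead of splitting it.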
We should still figure out some way to split the chunks on a custom attribute (pages, anchor links, or timecodes, depending on the content) to get context-aware chunking. Managing chunk size and overlap size would also still be great to have.
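One possible shape for this, sketched in Python under stated assumptions: the content has already been split into (marker, text) segments, where the marker is a page, anchor, or timecode, and sizes are measured in characters rather than tokens. `chunk_segments` and its parameters are hypothetical names, not an existing API.

```python
def chunk_segments(segments, max_chars=500, overlap_chars=50):
    """Chunk (marker, text) segments without ever crossing a marker boundary.

    Hypothetical sketch: segments longer than max_chars are cut into
    windows that share overlap_chars characters with the previous window.
    """
    if not 0 <= overlap_chars < max_chars:
        raise ValueError("overlap_chars must be >= 0 and smaller than max_chars")
    step = max_chars - overlap_chars
    chunks = []
    for marker, text in segments:
        if not text:
            continue  # nothing to index for this marker
        for start in range(0, len(text), step):
            chunks.append({
                "marker": marker,  # e.g. page number, anchor, or timecode
                "start": start,    # character offset inside the segment
                "text": text[start:start + max_chars],
            })
            if start + max_chars >= len(text):
                break  # this window already reached the end of the segment
    return chunks


chunks = chunk_segments(
    [("page 1", "abcdefghij"), ("page 2", "hi")],
    max_chars=4,
    overlap_chars=1,
)
```

Because every chunk carries its marker, a search result can point back to the page, anchor, or timecode it came from.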