- Issue created by @aolivera
- Merge request !570 Draft: Resolve #3520762 "Add optional datastore" → (Open) created by Unnamed author
- Merge request !571 Add an optional field for models to add a datastore → (Open) created by Unnamed author
- 🇬🇧United Kingdom scott_euser
Is this a new feature some providers offer? Can you link to docs of such a provider to help provide context?
At the moment the AI Search submodule handles the integration with vector databases to do RAG-type stuff.
- 🇺🇸United States aolivera
Some LLM providers—most notably Google Vertex AI—offer built-in support for grounding prompts against external datastores (e.g. BigQuery tables, Cloud Storage buckets, Firestore/Datastore collections, or other HTTP-accessible endpoints). This feature is effectively a provider-managed form of Retrieval-Augmented Generation (RAG), where you tell the API “look in this datastore for relevant context before generating.”
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gr...
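To illustrate what provider-side grounding looks like in practice, here is a rough sketch of a Vertex AI `generateContent` request body carrying a `retrieval` tool that points at a data store. The field names follow the public Vertex AI grounding docs; the project, location, and data store IDs are placeholder values, not a working configuration:

```python
# Sketch of a Vertex AI generateContent request body grounded against a
# provider-managed data store. All resource IDs below are placeholders.
import json

DATASTORE = (
    "projects/my-project/locations/global/"
    "collections/default_collection/dataStores/my-product-catalog"
)

request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarise our warranty policy."}]}
    ],
    # The provider performs retrieval internally: no Drupal-side vector
    # search is involved when this tool is present.
    "tools": [
        {"retrieval": {"vertexAiSearch": {"datastore": DATASTORE}}}
    ],
}

print(json.dumps(request_body, indent=2))
```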
https://console.cloud.google.com/gen-app-builder/data-stores/create
How It Differs from AI Search
The AI Search submodule in Drupal AI hooks into Search API and external vector databases (Solr, PgVector, Pinecone, etc.) to perform RAG at the Drupal layer. Grounding via Vertex AI's datastore happens entirely on the provider side: you simply supply the datastore identifier in your API call, and Vertex handles retrieval internally.
Example Use Case
You’re using the CKEditor AI plugin to draft content, but you want suggestions grounded in your own product catalog stored in Firestore. In the model config you enter your Firestore path in “Datastore.”
Every time the CKEditor AI button calls the provider, it transparently includes your datastore setting—so suggestions pull live data without custom middleware.
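That "transparently includes your datastore setting" step could be sketched as follows. This is a minimal illustration, not the actual Drupal AI provider API: the `build_payload` function, the `datastore` config key, and the payload shape are all hypothetical assumptions:

```python
# Hypothetical sketch: a provider call merges an optional, site-configured
# datastore into every outgoing request. The "datastore" config key and
# the payload shape are assumptions, not the real Drupal AI API.

def build_payload(prompt: str, config: dict) -> dict:
    payload = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
    }
    datastore = config.get("datastore")
    if datastore:
        # Only providers that support grounding would ever set this key,
        # so payloads for other providers are unaffected.
        payload["tools"] = [
            {"retrieval": {"vertexAiSearch": {"datastore": datastore}}}
        ]
    return payload

# With a datastore configured the grounding tool rides along; without
# one, the payload is a plain chat request.
grounded = build_payload("Suggest copy", {"datastore": "projects/x/dataStores/y"})
plain = build_payload("Suggest copy", {})
print("tools" in grounded, "tools" in plain)  # True False
```

The design point is that the field stays optional: providers without grounding support never see it, which is what keeps the base class untouched.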
By adding this as a first-class field in Drupal AI’s configuration, site builders get direct, provider-native grounding alongside the existing AI Search RAG capability—letting them choose whichever approach best fits their needs.
- 🇬🇧United Kingdom scott_euser
Pretty cool! I think we at least need a way for a provider to declare that it supports an external data source, and we need to consider how other providers handle this (if any), so that we build it in a way that makes it usable by other providers if/when they start to offer it.
This then needs a related issue in https://www.drupal.org/project/ai_provider_google_vertex → to opt in to the extra configuration option.
I can see the argument for keeping it in 'Chat'.
This is going to need opinions from other maintainers though, particularly Marcus I think.
- 🇺🇸United States aolivera
Thanks for the feedback, Scott. I have reached out to Marcus to get his thoughts as well.
- 🇩🇪Germany marcus_johansson
That's a really cool feature. I would suggest that an initial to-do is to research whether other providers support this and how the use cases might differ, so it is prepared for the future.
If it's not supported by other providers, I think it might be confusing to show that option generally via the base class. Ollama and Hugging Face, for instance, use that base class and will most likely not support datastores at the moment. Azure might?
Since the models base class is/should be extendable, I think the way forward for the 1.x branch is to support this specifically for Google Vertex, and then in 2.x start supporting it widely, if providers start offering it widely.
- 🇺🇸United States aolivera
Hey Marcus,
Thanks for the reply. After researching the major AI providers, it looks like only Google Vertex AI and Azure OpenAI currently offer native grounding via a datastore parameter. Ollama, Hugging Face, and the standard OpenAI API don’t support this out of the box.
Based on your suggestion, I scoped the new “Datastore” field so it only appears in the Google Vertex provider UI. Drupal’s existing form handling and config storage take care of saving and loading the value; no changes were needed to the abstract base class.
- 🇬🇧United Kingdom MrDaleSmith
Sounds like this may be a candidate for closing as works as designed?