Add optional “Datastore” configuration field to AI provider model forms

Created on 23 April 2025, 5 days ago

Problem/Motivation

Currently the AI contrib module does not provide a way to specify an external datastore identifier when defining or editing a provider’s model. For integrations that need to ground prompts or fetch context from a vector store (Firestore/Datastore, Pinecone, etc.), there is no built-in field to capture that datastore ID. As a result, site builders must implement hacks or custom overrides to pass a datastore reference into their provider’s request logic.

Steps to reproduce

Go to Administration → Configuration → AI → Providers → Chat Models (e.g. /admin/config/ai/providers/chat).

Edit an existing model or create a new one.

Observe that the form only includes fields for Model ID, Label, capabilities checkboxes, and token limits—no field exists for an external datastore identifier.

Proposed resolution

Add an optional datastore textfield to the shared model form base (AbstractModelFormBase) so it appears for all operation types.

Ensure the field is displayed in each provider’s configuration form (e.g. ChatModelForm, EmbeddingsModelForm).

Save the submitted value into the provider’s configuration (e.g. under models.{provider_id}.datastore in ai.settings).

Expose it via the provider API (e.g. $this->getConfiguration()['datastore']) so custom provider implementations can retrieve and use it in their request payloads.
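
As a rough illustration of the last step, a custom provider could merge the stored value into its request along the following lines. This is only a sketch: the function name build_grounded_payload() is hypothetical, and the nested tools / retrieval / vertexAiSearch / datastore structure is an assumption modeled on Vertex AI's grounding request format that should be verified against the provider's own API reference.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: adds a grounding block to a chat request payload.
 *
 * @param array $payload
 *   The request body the provider was already going to send.
 * @param array $configuration
 *   The provider/model configuration, e.g. from $this->getConfiguration().
 *
 * @return array
 *   The payload, unchanged when no datastore is configured.
 */
function build_grounded_payload(array $payload, array $configuration): array {
  $datastore = trim((string) ($configuration['datastore'] ?? ''));
  if ($datastore === '') {
    // The field is optional, so an empty value leaves the request untouched.
    return $payload;
  }
  // Assumed payload shape; check it against the provider's documentation.
  $payload['tools'][] = [
    'retrieval' => [
      'vertexAiSearch' => ['datastore' => $datastore],
    ],
  ];
  return $payload;
}
```

Returning the payload untouched when the value is empty keeps the field genuinely optional: providers or models without a datastore behave exactly as before.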

Remaining tasks

Define the datastore form element in AbstractModelFormBase::form().

Update each form subclass (ChatModelForm, EmbeddingsModelForm, etc.) to render the field correctly.

Modify the submit handler (or entity save logic) to persist the datastore value.

User interface changes

New textfield labeled “Datastore” under the Model Data fieldset.

Description text: “Optional datastore identifier.”

Field is not required and defaults to the existing config['datastore'] value (if any).
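
The element described above could look roughly like this as a Form API render array. The helper name build_datastore_element() is hypothetical and exists only to keep the sketch runnable outside Drupal; the title, description and default value are taken from this section.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds the optional "Datastore" textfield element.
 *
 * @param array $config
 *   The existing model configuration, which may or may not have 'datastore'.
 *
 * @return array
 *   A Form API element array.
 */
function build_datastore_element(array $config): array {
  return [
    '#type' => 'textfield',
    '#title' => 'Datastore',
    '#description' => 'Optional datastore identifier.',
    // Optional field: the form can be saved without a value.
    '#required' => FALSE,
    // Fall back to an empty string when nothing has been saved yet.
    '#default_value' => $config['datastore'] ?? '',
  ];
}
```

Inside a real form class the title and description would normally be wrapped in $this->t() for translation, and the element would be attached under the Model Data fieldset.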

API changes

getModelInfo() is extended so that it also returns the datastore value.

Data model changes

Feature request
Status

Active

Version

1.1

Component

AI Core module

Created by

🇺🇸United States aolivera

Comments & Activities

  • Issue created by @aolivera
  • Pipeline finished with Failed
    5 days ago
    Total: 202s
    #480361
  • Pipeline finished with Failed
    5 days ago
    Total: 430s
    #480381
  • Pipeline finished with Success
    5 days ago
    Total: 299s
    #480428
  • 🇬🇧United Kingdom scott_euser

    Is this a new feature some providers offer? Can you link to docs of such a provider to help provide context?

    At the moment the AI Search submodule handles the integration with vector databases to do RAG-type stuff.

  • 🇺🇸United States aolivera

    Some LLM providers—most notably Google Vertex AI—offer built-in support for grounding prompts against external datastores (e.g. BigQuery tables, Cloud Storage buckets, Firestore/Datastore collections, or other HTTP-accessible endpoints). This feature is effectively a provider-managed form of Retrieval-Augmented Generation (RAG), where you tell the API “look in this datastore for relevant context before generating.”

    https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gr...
    https://console.cloud.google.com/gen-app-builder/data-stores/create

    How It Differs from AI Search
    The AI Search sub-module in Drupal AI hooks into Search API and external vector databases (Solr, PgVector, Pinecone, etc.) to perform RAG at the Drupal layer.

    Grounding via Vertex AI’s datastore happens entirely on the provider side: you simply supply the datastore identifier in your API call, and Vertex handles retrieval internally.

    Example Use Case
    You’re using the CKEditor AI plugin to draft content, but you want suggestions grounded in your own product catalog stored in Firestore.

    In the model config you enter your Firestore path in “Datastore.”

    Every time the CKEditor AI button calls the provider, it transparently includes your datastore setting—so suggestions pull live data without custom middleware.

    By adding this as a first-class field in Drupal AI’s configuration, site builders get direct, provider-native grounding alongside the existing AI Search RAG capability—letting them choose whichever approach best fits their needs.

  • 🇬🇧United Kingdom scott_euser

    Pretty cool! I think we need, at a minimum, a way for a provider to say that it supports an external data source, and we need to consider how other providers handle this (if any) so that we build it in a way that makes it usable by other providers if/when they start to offer it.

    Then it needs a related issue in https://www.drupal.org/project/ai_provider_google_vertex to opt in to the extra configuration option.

    I can see the argument for keeping it in 'Chat'.

    This is going to need opinions from other maintainers though, particularly Marcus I think.

  • 🇺🇸United States aolivera

    Thanks for the feedback, Scott. I have reached out to Marcus to get his thoughts as well.

  • 🇩🇪Germany marcus_johansson

    That's a really cool feature. I would suggest that an initial to-do is to research whether other providers support this and how their use cases might differ, so it is prepared for the future.

    If it's not supported in other providers, I think it might be confusing to show that option in general using the base class. Ollama and Huggingface, for instance, use that base class and will most likely not support datastores at the moment. Azure might?

    Since the models base class is/should be extendable, I think a way forward for the 1.x branch is that we support this specifically for Google Vertex and then in 2.x start supporting it widely, if the providers start supporting it widely.

  • 🇺🇸United States aolivera

    Hey Marcus,

    Thanks for the reply. After researching the major AI providers, it looks like only Google Vertex AI and Azure OpenAI currently offer native grounding via a datastore parameter. Ollama, Hugging Face, and the standard OpenAI API don’t support this out of the box.

    Based on your suggestion, I scoped the new “Datastore” field so it only appears in the Google Vertex provider UI. Drupal’s existing form handling and config storage take care of saving and loading the value, so no changes were needed to the abstract base class.

  • 🇬🇧United Kingdom MrDaleSmith

    Sounds like this may be a candidate for closing as works as designed?
