Normalize Structured responses

Created on 10 December 2024

Problem/Motivation

We have an initial version of function calling. Since structured responses share many of the same features, we should build them while we are at it: a function call is, in the end, just another way of getting JSON back.

For more reading:
https://ollama.com/blog/structured-outputs
https://platform.openai.com/docs/guides/structured-outputs
https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrai...
https://docs.fireworks.ai/structured-responses/structured-response-forma...

Proposed resolution

Add a normalization layer for structured responses.
Add a validation layer for them.
Investigate how much can be shared with the function-calling implementation.
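As a rough illustration of the normalization idea (all names here are hypothetical, not the AI module's actual API), callers would always supply a plain JSON Schema, and each provider module translates it into its native structured-output format:

```python
import json

# A normalized structured-response schema, written once as plain JSON Schema.
response_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title"],
}


def to_openai_format(schema):
    """Translate the normalized schema into OpenAI's response_format shape."""
    return {
        "type": "json_schema",
        "json_schema": {"name": "response", "schema": schema, "strict": True},
    }


def to_ollama_format(schema):
    """Ollama accepts the JSON Schema directly in its `format` field."""
    return schema


print(json.dumps(to_openai_format(response_schema), indent=2))
```

The point of the translation layer is that the caller never changes: swapping OpenAI for Ollama only swaps which translator runs.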

Remaining tasks

User interface changes

API changes

Data model changes

Feature request
Status

Active

Version

1.0

Component

AI Core module

Created by

🇩🇪Germany marcus_johansson

Comments & Activities

  • Issue created by @marcus_johansson
  • 🇬🇧United Kingdom seogow

    When we are at it, this is worth including: https://github.com/cognesy/instructor-php

  • 🇧🇪Belgium lammensj

    For the ECA agent, I used the Typed Data API and Schemata for generating a JSON Schema that the AI should follow. Once config validation is in core and there's support for Schema-generation, this becomes obsolete.
    For non-entity data structures, something like this approach could be beneficial, no?

    JSON Schema generation: https://git.drupalcode.org/project/ai_eca/-/merge_requests/1/diffs#288ae...

  • 🇬🇧United Kingdom yautja_cetanu

    Given how much energy there is behind this could you write in the issue what schema you're aiming for?

    It would be nice to see how it differs from the MCP tool schema or pattern, and why.

  • 🇧🇪Belgium lammensj

    @yautja_cetanu, your question regarding the desired schema, that's one for @marcus_johansson, right?

    In my opinion, and given my (as yet) limited knowledge of the MCP module, I believe a schema needs to be created either way. Whether it's developed for the "separate" implementation of function calling in the AI module, or for MCP, an LLM still needs something that it should adhere to.

    To make it more tangible: I'm going to experiment with the Typed Data model I created for the ECA Agent and see how it fits in that MCP plugin type. I believe I can re-use A LOT but just need to split some things up, like creating a separate "tool" for requesting all models, creating a model and editing one. But in the end, the LLM should still generate JSON based on the Schema that I created for the "original" Agent.

    I hope this makes sense :)

  • 🇬🇧United Kingdom yautja_cetanu

    Follow this up with function calling: https://platform.openai.com/docs/guides/function-calling

  • 🇬🇧United Kingdom yautja_cetanu

    Note: If we try to align with MCP, below is the schema MCP uses for tools. (This is probably for a new issue.)

    • I am not suggesting we use ALL of MCP for tools. Just the schema for how an LLM picks between tools.
    • Probably not ALL tools will be ECA Actions, but all ECA Actions can be tools (for example, "handover between agents" tools). We would maybe make a parent of ECA Actions called AI Actions?
    • I think it might be good to put this as a layer between LLMs and maybe ECA Actions.
    • We need to think about how we allow a site owner to insert their own config into the action, so that the module provides a "Description" but a site owner can write their own copy and append it to the description for their own prompt engineering.
            "Tool": {
                "description": "Definition for a tool the client can call.",
                "properties": {
                    "description": {
                        "description": "A human-readable description of the tool.",
                        "type": "string"
                    },
                    "inputSchema": {
                        "description": "A JSON Schema object defining the expected parameters for the tool.",
                        "properties": {
                            "properties": {
                                "additionalProperties": {
                                    "additionalProperties": true,
                                    "properties": {},
                                    "type": "object"
                                },
                                "type": "object"
                            },
                            "required": {
                                "items": {
                                    "type": "string"
                                },
                                "type": "array"
                            },
                            "type": {
                                "const": "object",
                                "type": "string"
                            }
                        },
                        "required": [
                            "type"
                        ],
                        "type": "object"
                    },
                    "name": {
                        "description": "The name of the tool.",
                        "type": "string"
                    }
                },
                "required": [
                    "inputSchema",
                    "name"
                ],
                "type": "object"
            },
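    As a quick sanity check against the schema above, here is a minimal structural validator (an illustrative sketch, not a full JSON Schema validator; the tool definition itself is made up) that enforces only the `required` constraints — `name` and `inputSchema` must be present, and the inner schema must declare `type: object`:

    ```python
    def is_valid_tool(tool: dict) -> bool:
        """Minimal structural check against the MCP Tool schema above.

        Only verifies the `required` constraints; a real implementation
        would run a full JSON Schema validator instead.
        """
        if not all(key in tool for key in ("name", "inputSchema")):
            return False
        input_schema = tool["inputSchema"]
        return isinstance(input_schema, dict) and input_schema.get("type") == "object"


    # A hypothetical tool definition for illustration only.
    tool = {
        "name": "create_node",
        "description": "Creates a content entity.",
        "inputSchema": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        },
    }

    print(is_valid_tool(tool))            # True
    print(is_valid_tool({"name": "broken"}))  # False: missing inputSchema
    ```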
    
  • 🇩🇪Germany marcus_johansson

    Posted a video here of a suggested solution: https://www.youtube.com/watch?v=fyCd2fwR4RQ&ab_channel=DrupalAIVideos

    I don't know if we need to do more than that?

  • 🇩🇪Germany marcus_johansson

    Ollama, OpenAI and FireworksAI are prepared to have it released as well.

  • 🇳🇱Netherlands jurriaanroelofs

    I would also include Gemini's structured output docs as I think it is a bit different but Gemini 2.0 is an incredible model for many use cases:
    https://ai.google.dev/gemini-api/docs/structured-output

    Also, in some applications I implement structured output, but on top of that I implement my own "dirty JSON" libraries and/or custom code, so that I can still experiment with faster models that don't support structured output.

    For example with Gemini 2.0 I can get guaranteed structured output but with 2.0-flash-lite I cannot, but with some cleaning I get clean structured output with enough ease to say the juice is worth the squeeze. As with anything it depends on the use case.
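    A minimal version of such a "dirty JSON" cleanup (an illustrative sketch, not the actual library the commenter uses) strips markdown fences and surrounding prose before parsing:

    ```python
    import json
    import re


    def parse_dirty_json(text: str):
        """Best-effort JSON extraction from raw LLM output.

        Removes markdown code fences, then parses everything from the
        first "{" to the last "}". Illustrative only; real dirty-JSON
        libraries also repair trailing commas, bad quotes, etc.
        """
        # Drop ```json ... ``` fences if present.
        text = re.sub(r"```(?:json)?", "", text)
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end == -1:
            raise ValueError("No JSON object found in output")
        return json.loads(text[start : end + 1])


    raw = 'Sure! Here is the data:\n```json\n{"title": "Hello", "tags": ["a"]}\n```'
    print(parse_dirty_json(raw))  # {'title': 'Hello', 'tags': ['a']}
    ```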

  • 🇩🇪Germany marcus_johansson

    Thanks @jurriaanroelofs for the feedback - the idea is to have one normalization input layer and then for instance the Gemini module in this case has to translate this into how Gemini does it. That means you always write JSON Schemas as input, but can still use it when calling Gemini.

    I see how it could be some work to create a translation layer, but I think that is worth it, so you always have one way of interacting with it in Drupal. Since I co-maintain the Vertex provider I would fix that, and I can contribute this to the Gemini provider as well. I might actually try it before I merge here, just so I don't put my foot in my mouth by overpromising :)

    We use Dirty JSON in multiple places as well; it might even make sense to have that as an abstract normalization process for providers/models that don't support structured output, but for now this will focus on the ones that have it.

  • 🇳🇱Netherlands jurriaanroelofs

    Sounds good @marcus_johansson — by the way have you looked at how LiteLLM handles it? it works great, with Pydantic models supporting advanced requirements on response schemas:
    https://docs.litellm.ai/docs/completion/json_mode

    Of course we have no such thing in PHP. In JS, some AI SDKs use Zod as an alternative to Pydantic, but I think it is not as mature or powerful. I have no idea whether something similar exists in PHP.

  • 🇩🇪Germany marcus_johansson

    I have seen it before, but I haven't actually looked into the implementation. I will check how they do it and also how they handled Anthropic, since their documentation is a little bit wonky on what counts as structured output.

    I don't think that we will be doing the validation in the AI module to start with — or do you think that would be a good idea? Libraries for basic JSON Schema validation do exist that we could use (maybe even in core?). We want to keep the dependencies as light as possible, but we could use an optional-dependency pattern to make sure that this option is only available if a specific library exists.
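    The optional-dependency pattern mentioned here can be sketched as follows (illustrative only; in the PHP module this would likely be a `class_exists()` check on a validator library rather than Python's `importlib`):

    ```python
    import importlib.util
    import json


    def validate_against_schema(data, schema) -> bool:
        """Validate only if a JSON Schema library is installed.

        Mirrors the optional-dependency idea: full validation when the
        `jsonschema` package exists, a permissive fallback (checking JSON
        serializability only) otherwise.
        """
        if importlib.util.find_spec("jsonschema") is not None:
            import jsonschema

            try:
                jsonschema.validate(data, schema)
                return True
            except jsonschema.ValidationError:
                return False
        # Fallback: no validator available, accept anything serializable.
        json.dumps(data)
        return True


    print(validate_against_schema({"title": "Hi"}, {"type": "object"}))  # True
    ```

    The feature degrades gracefully: sites that install the validator get strict guarantees, others still get structured output without an extra hard dependency.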
