Create OpenAI mockup library

Created on 5 July 2024, 5 months ago
Updated 9 September 2024, 2 months ago

Problem/Motivation

We want to have functional testing on the AI and all the submodules, where we actually invoke Drupal together with some of the modules and providers installed, to make sure that things work end-to-end. Historically, a problem for the AI Interpolator (now Automator) has been that any small change or bug in a chain/blueprint/instruction set will cause the whole thing to fail. For integration tests you can mock this in Guzzle or the OpenAI client, but for functional tests a real mockup server is wanted.

To be able to test this, we would need ways to mock some of the API calls, where the four main calls used for now are the moderations, models, chat completion and text-to-image calls from OpenAI.

There is an open source product in C# (https://www.nuget.org/packages/OpenAI.Mock/1.0.4) and a paid service at wiremockapi.com; using Prism with a Swagger file would also work. However, I have not been able to find a PHP solution for this, or one where you can control how the response looks based on the request.

This would also make it possible to do a lot of local development without being too dependent on, or having to wait for, external services like the OpenAI API.

Any solution that could do the following, however, would be good:
1. Mock the moderations, chat completions and image generations API endpoints with realistic output dependent on settings (but not prompts).
2. Configure how long it takes to respond, so you can use it locally without waiting, and set up realistic tests with waiting time.
3. Be able to set up specific responses in YAML/JSON configs, so that if a prompt or config looks a specific way, it answers a specific way.
4. (later) Be able to do chaos monkey responses.
5. Be able to run as its own DDEV service, but also be available as a Docker image.
6. There should be a config/settings file to set an API key for the server (see the sketch after this list).
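
As a rough illustration of point 6, the key could live in a plain PHP settings file that the server checks incoming bearer tokens against. The file name and key name here are assumptions, not a committed design:

<?php
// settings.php -- hypothetical configuration file for the mock server.
// The server compares the incoming Authorization bearer token to this value.
return [
    'api_key' => 'mock-api-key-123',
];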

The tool has been approved to be open sourced.

Note: It only needs to be in some PHP framework or even vanilla PHP, not in Drupal.

Proposed resolution

1. First figure out if there are other tools that I missed that would solve this without writing it ourselves.
2. If not, write a mockup API that handles the four calls. See the specifications below for the arguments each takes and the errors it should give back; a routing sketch follows this list.
3. Write a way where it looks through a directory for YAML files that can give specific responses on specific matched inputs. Also add the possibility to set a wait time there to mock real wait times.
4. Set this up as a DDEV service and create a GitHub action that builds a Docker image around it with some webserver+PHP image.
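
A minimal sketch, assuming vanilla PHP 8, of how a front controller for step 2 could check the bearer token and dispatch the four endpoints. All file and function names are illustrative, and the per-endpoint handlers are sketched under their sections below:

<?php
// index.php -- hypothetical entry point for the mock server.
$settings = require __DIR__ . '/settings.php';

// Reject requests whose bearer token does not match the configured key.
// (Depending on the webserver, the Authorization header may need extra
// configuration to show up in $_SERVER.)
$auth = $_SERVER['HTTP_AUTHORIZATION'] ?? '';
if ($auth !== 'Bearer ' . $settings['api_key']) {
    http_response_code(401);
    header('Content-Type: application/json');
    echo json_encode(['error' => [
        'message' => 'Incorrect API key provided.',
        'type' => 'invalid_request_error',
        'code' => 'invalid_api_key',
    ]]);
    exit;
}

$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$body = json_decode(file_get_contents('php://input'), true) ?? [];

// One handler per mocked OpenAI endpoint.
match ($path) {
    '/v1/moderations' => handle_moderations($body),
    '/v1/models' => handle_models(),
    '/v1/chat/completions' => handle_chat_completions($body),
    '/v1/images/generations' => handle_images($body),
    default => http_response_code(404),
};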

Endpoints mocked

Moderations

Documentation: https://platform.openai.com/docs/api-reference/moderations

Takes:
* API key
* Input
* Model

Failures:
* API key not given or invalid.
* Input not given.
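
A sketch of what the moderations handler could look like, returning a generic "nothing flagged" result. The category list is abbreviated, and the handler name ties back to the router sketch above:

// Hypothetical handler for POST /v1/moderations, called from the router sketch.
function handle_moderations(array $body): void
{
    header('Content-Type: application/json');

    // Failure case: input not given.
    if (!isset($body['input']) || $body['input'] === '') {
        http_response_code(400);
        echo json_encode(['error' => [
            'message' => "Missing required parameter: 'input'.",
            'type' => 'invalid_request_error',
        ]]);
        return;
    }

    // Generic unflagged result; the real category list is longer.
    $categories = ['hate', 'harassment', 'self-harm', 'sexual', 'violence'];
    echo json_encode([
        'id' => 'modr-' . bin2hex(random_bytes(8)),
        'model' => $body['model'] ?? 'text-moderation-latest',
        'results' => [[
            'flagged' => false,
            'categories' => array_fill_keys($categories, false),
            'category_scores' => array_fill_keys($categories, 0.0001),
        ]],
    ]);
}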

Models

Documentation: https://platform.openai.com/docs/api-reference/models

Takes:
* API key

Failures:
* API key not given or invalid.
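
Since the shared bearer-token check in the router sketch already covers the only failure case, the models handler can just return a static list. The entries and timestamps below are arbitrary mock data:

// Hypothetical handler for GET /v1/models.
function handle_models(): void
{
    header('Content-Type: application/json');
    echo json_encode([
        'object' => 'list',
        'data' => [
            ['id' => 'gpt-4o', 'object' => 'model', 'created' => 1715000000, 'owned_by' => 'openai'],
            ['id' => 'dall-e-3', 'object' => 'model', 'created' => 1698000000, 'owned_by' => 'system'],
        ],
    ]);
}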

Text To Image

Documentation: https://platform.openai.com/docs/api-reference/images/create

Takes:
* API key
* prompt
* model
* n
* quality
* response_format
* size
* style
* user

Failures:
* API key not given or invalid.
* prompt not given
* n not within 1 to 10 if given
* response_format not one of url or b64_json if given
* size not being one of the approved sizes if given

Logic:
* Should honor n (return that many images)
* Should honor response_format

Other considerations:
* Make sure to have images of each of the sizes hosted on the mockup system, so actual images can be downloaded (PNG)
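
A sketch of how the handler could implement the failure cases and the n/response_format logic, assuming the per-size PNGs are hosted by the mock itself. The base URL and file layout are assumptions:

// Hypothetical handler for POST /v1/images/generations.
function handle_images(array $body): void
{
    header('Content-Type: application/json');
    $sizes = ['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'];
    $n = $body['n'] ?? 1;
    $format = $body['response_format'] ?? 'url';
    $size = $body['size'] ?? '1024x1024';

    // Validate according to the failure cases listed above.
    $error = match (true) {
        empty($body['prompt']) => "Missing required parameter: 'prompt'.",
        $n < 1 || $n > 10 => "'n' must be between 1 and 10.",
        !in_array($format, ['url', 'b64_json'], true) => "'response_format' must be 'url' or 'b64_json'.",
        !in_array($size, $sizes, true) => "'size' is not one of the supported sizes.",
        default => null,
    };
    if ($error !== null) {
        http_response_code(400);
        echo json_encode(['error' => ['message' => $error, 'type' => 'invalid_request_error']]);
        return;
    }

    // One entry per requested image, honoring response_format. The per-size
    // PNGs live on the mock server itself so the URLs actually resolve.
    $data = [];
    for ($i = 0; $i < $n; $i++) {
        $data[] = $format === 'url'
            ? ['url' => "http://localhost:8080/images/{$size}.png"]
            : ['b64_json' => base64_encode(file_get_contents(__DIR__ . "/images/{$size}.png"))];
    }
    echo json_encode(['created' => time(), 'data' => $data]);
}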

Chat Completions

Documentation: https://platform.openai.com/docs/api-reference/chat/create

Takes:
* API key
* messages
* model
* frequency_penalty
* max_tokens
* presence_penalty
* temperature
* top_p
* tools
* user

Failures:
* API key not given or invalid.
* messages not given, or not given in the right format
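
A sketch of the chat completions handler with a generic canned answer. The YAML overrides described in the next section would be consulted before falling back to this, and the fixed usage numbers are fake (see the tokeniser discussion in the comments):

// Hypothetical handler for POST /v1/chat/completions.
function handle_chat_completions(array $body): void
{
    header('Content-Type: application/json');

    // Failure case: messages missing or not a list of role/content pairs.
    $messages = $body['messages'] ?? null;
    if (!is_array($messages) || $messages === []
        || !isset($messages[0]['role'], $messages[0]['content'])) {
        http_response_code(400);
        echo json_encode(['error' => [
            'message' => "Missing or malformed required parameter: 'messages'.",
            'type' => 'invalid_request_error',
        ]]);
        return;
    }

    echo json_encode([
        'id' => 'chatcmpl-' . bin2hex(random_bytes(8)),
        'object' => 'chat.completion',
        'created' => time(),
        'model' => $body['model'] ?? 'gpt-4o',
        'choices' => [[
            'index' => 0,
            'message' => ['role' => 'assistant', 'content' => 'This is a mocked response.'],
            'finish_reason' => 'stop',
        ]],
        // Token counts are faked rather than computed by a real tokeniser.
        'usage' => ['prompt_tokens' => 10, 'completion_tokens' => 8, 'total_tokens' => 18],
    ]);
}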

YAML Structure

This is just an example of how the YAML structure for an overridden answer of the chat completion could look:

wait: 3000 # The wait time in milliseconds before sending the output message
endpoint: /v1/chat/completions # The endpoint the input message is sent to
input: # The input to match to send this output instead of the generic one
  messages:
    - role: system
      content: You are a helpful assistant
    - role: user
      content: I am looking for a restaurant
output: # The output message to send to the user
  message:
    role: assistant
    content: I know a few good restaurants. What type of food are you in the mood for?
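
A minimal sketch of how the server could scan a directory for such files and apply a matching override, assuming the symfony/yaml component (a vanilla parser would also work). The function name, exact-match rule, and directory layout are illustrative:

<?php
// Hypothetical override matcher: returns the configured output if a YAML
// file's endpoint and input match the request, after the configured wait.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Yaml\Yaml;

function find_override(string $endpoint, array $body, string $dir): ?array
{
    foreach (glob($dir . '/*.yaml') ?: [] as $file) {
        $case = Yaml::parseFile($file);
        if (($case['endpoint'] ?? null) !== $endpoint) {
            continue;
        }
        // Exact match on the configured messages; fuzzier matching could be
        // layered on later.
        if (($body['messages'] ?? null) == ($case['input']['messages'] ?? null)) {
            // Simulate latency: "wait" is milliseconds, usleep takes microseconds.
            usleep((int) ($case['wait'] ?? 0) * 1000);
            return $case['output'];
        }
    }
    return null;
}

A handler would call find_override('/v1/chat/completions', $body, __DIR__ . '/overrides') and only fall back to its generic response when this returns null.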

Data model changes

Issue type

Feature request

Status

Fixed

Version

1.0

Component

Miscellaneous

Created by

🇩🇪Germany marcus_johansson


Comments & Activities

  • Issue created by @marcus_johansson
  • 🇬🇧United Kingdom yautja_cetanu

    I'm guessing we need it to work with some tokeniser too? Am I right that different models have different approaches to tokens? So to give a perfectly realistic response for "max_tokens" we'd need a tokeniser for each model? Should we just pick a specific tokeniser, like ChatGPT 4o's, so it's roughly correct?

  • 🇩🇪Germany marcus_johansson

    This one will actually just mock OpenAI - since it's the most used one and since many providers follow the same standard, it's enough to have some testing for now. If we need to mock Huggingface, that is a huge effort, so this will be OK for now I think.

    The other option we can look into is writing a JSON template for https://github.com/mockoon/mockoon and having it run as a Mockoon service.

  • 🇩🇪Germany marcus_johansson

    I looked into https://github.com/mockoon/mockoon and it seems to have everything we need. To not reinvent the wheel it makes sense to use this or any other generic open source mocking tool.

    Mockoon is great though because it:
    1. Is open source.
    2. It's Node.js, which I think more Drupal developers know than Java, which other popular mocking frameworks (WireMock, MockServer) are based on.
    3. It has Docker images prepared, which means DDEV and testing integrations are easy.
    4. It does not run on YAML configurations but JSON, which is a little bit harder to read/write, but it's the native response format of most APIs, so it kind of makes sense for comparisons.
    5. It can do delays of responses to mock OpenAI's latency.
    6. It can do static file responses to mock URLs for text-to-image responses.
    7. If it should fail at something, it still has a proxy option to set up custom endpoints.
    8. It has an enterprise SaaS behind it that could sponsor with setups, even though it's easy enough to set up in GitLab.

    I would say it makes sense to research a more complex test endpoint to see if the project holds everything it promises.

    One example would be a typical Text To Image, where you would have to do it in two steps:

    1. First you do a text-to-image call that should fail and succeed according to the specification above, including checking the bearer token.
    2. Then you do a secondary call to get the "generated" static image, based on the resolution.

    If this is possible to mock up, with error responses for an invalid API key, missing prompt, etc., I would say that it is enough to use this.

  • Status changed to Fixed 3 months ago
  • 🇩🇪Germany marcus_johansson

    Fixed by @mjb3141

  • Automatically closed - issue fixed for 2 weeks with no activity.
