Allow the control of an AI Chatbot via voice

Created on 9 October 2024, 4 months ago

Problem/Motivation

Some people will find it easier to get their ideas out by talking than by typing, and it's also kinda cool and sci-fi to be able to control a team of AI agents with your voice.

OpenAI have recently released their RealTime Voice API
https://platform.openai.com/docs/guides/realtime/concepts

Steps to reproduce

Proposed resolution

  • Whilst there are multiple voice-to-text providers, this is a chat LLM with voice built in (so we probably need an audio flag for the chat LLM). It's likely we also want a new "Service" in the list of defaults: a Realtime Voice LLM model.
  • We need some library so that the Drupal site can use your microphone to control the chatbot.
  • In the OpenAI provider module in core, include access to the Realtime API and expose it somewhere sensible in the settings.
  • Have a setting in the chatbot or assistants API for "Includes voice". (We might need to think about where it is best to place this. Maybe we need to make it use a different model if you're using voice than if you're not.)
  • Decide on the UI of the chatbot. When you click voice, how do you do it? What does the screen look like when you use voice? (The ChatGPT version has a swirling image that moves when it talks, so you don't see the text until you exit out.) We probably want to store the text on the assistant, though, as if they had typed it, and use the session storage mechanism that is set in the assistant API.
  • Might need to think about how it will work at an abstraction level: what if the model doesn't have voice but does it the old-school way, using 3 separate models (voice to text, LLM, text to voice)? I think let's not worry about this for now.
  • This is such an experimental feature that it could be its own module. But we will need the Realtime API stuff to go into the OpenAI provider.
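
To make the abstraction question above a bit more concrete, here is a rough sketch of how a site might choose between a single realtime voice model and the "old school" chain of three models. Everything in it is hypothetical: the capability names, the `ModelInfo` shape and `planVoicePipeline` are made up for illustration and are not part of the AI module or any provider.

```typescript
// Hypothetical capability names; the AI module's real operation types may differ.
type Capability = "realtime_voice" | "speech_to_text" | "chat" | "text_to_speech";

// Hypothetical description of a configured model and what it can do.
interface ModelInfo {
  id: string;
  capabilities: Capability[];
}

// The three possible outcomes of planning a voice conversation.
type VoicePlan =
  | { kind: "realtime"; model: string }
  | { kind: "chained"; speechToText: string; chat: string; textToSpeech: string }
  | { kind: "unavailable"; missing: Capability[] };

// Return the id of the first model that offers the capability, if any.
function firstWith(models: ModelInfo[], cap: Capability): string | undefined {
  for (const model of models) {
    if (model.capabilities.indexOf(cap) !== -1) {
      return model.id;
    }
  }
  return undefined;
}

// Prefer a single realtime voice model; otherwise fall back to chaining
// voice to text, LLM and text to voice; otherwise report what is missing.
function planVoicePipeline(models: ModelInfo[]): VoicePlan {
  const realtime = firstWith(models, "realtime_voice");
  if (realtime !== undefined) {
    return { kind: "realtime", model: realtime };
  }

  const stt = firstWith(models, "speech_to_text");
  const chat = firstWith(models, "chat");
  const tts = firstWith(models, "text_to_speech");
  if (stt !== undefined && chat !== undefined && tts !== undefined) {
    return { kind: "chained", speechToText: stt, chat: chat, textToSpeech: tts };
  }

  const missing: Capability[] = [];
  if (stt === undefined) missing.push("speech_to_text");
  if (chat === undefined) missing.push("chat");
  if (tts === undefined) missing.push("text_to_speech");
  return { kind: "unavailable", missing: missing };
}
```

The useful property is that the chatbot UI only needs to know which kind of plan it got, so the chained fallback can be ignored for now and added later without changing the callers.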

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Active

Version

1.0

Component

AI Assistants API

Created by

🇬🇧United Kingdom yautja_cetanu


Comments & Activities

  • Issue created by @yautja_cetanu
  • 🇬🇧United Kingdom MrDaleSmith

    I've had a go at getting up to speed on this, but I've hit the first stumbling block: the majority of the providers (OpenAI and non-OpenAI) are using https://github.com/openai-php/client as a PHP client to send and receive requests from the LLM. The Realtime API is reached through the /realtime endpoint, and this endpoint is not supported by the PHP client.

    There is no issue raised on the client for adding support for realtime, so it is unlikely that work on this has started yet. This is a new experimental feature in beta, so it is also unlikely that there are alternative libraries that can be used. It looks like we will either need to roll our own solution or face an extended wait for one to appear.
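
    For a sense of what rolling our own would involve at the protocol level, here is a minimal sketch of the connection details and two client messages. It is written in TypeScript rather than PHP purely to show the message shapes. The WebSocket URL, the "OpenAI-Beta" header and the event names are taken from OpenAI's beta Realtime documentation as I understand it, so treat them as assumptions that may change; the function names are made up for illustration.

```typescript
// Assumption: endpoint and event names follow the beta Realtime docs and
// may change while the API is in beta. Verify before relying on them.
const REALTIME_BASE_URL = "wss://api.openai.com/v1/realtime";

// Build the WebSocket URL for a given realtime-capable model name.
function buildRealtimeUrl(model: string): string {
  return REALTIME_BASE_URL + "?model=" + encodeURIComponent(model);
}

// Headers sent with the WebSocket handshake when connecting server-side.
function buildRealtimeHeaders(apiKey: string): Record<string, string> {
  return {
    Authorization: "Bearer " + apiKey,
    "OpenAI-Beta": "realtime=v1",
  };
}

// Client event that appends one chunk of base64-encoded audio
// to the server-side input buffer.
function audioAppendEvent(base64Audio: string): string {
  return JSON.stringify({ type: "input_audio_buffer.append", audio: base64Audio });
}

// Client event that asks the model to start generating a response.
function responseCreateEvent(): string {
  return JSON.stringify({ type: "response.create" });
}
```

    The sketch deliberately stops short of opening a socket: the open question in this comment is which library would hold that connection on the PHP side, and that is not something the sketch answers.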

    The other issue is that the realtime api has this warning in its documentation:

    Real time audio is heavily affected by network conditions, and reliably delivering real time audio to a server (e.g. from a mobile client to a backend) at scale is a challenge when network conditions are unpredictable.

    For this reason, if you are building client-side or telephony applications where you do not have control over the reliability of the network, for production use cases we recommend that you evaluate a purpose-built third-party solution, such as our partners' integrations listed below.

    As a Drupal module, we will have zero control over the network conditions it is used under, and this may cause us issues. Equally, it is unlikely we can ask all users to register with an additional third-party service, and the audio traffic may be minimal given that we will be using it for sporadic interactions rather than protracted conversations. It may be something to bear in mind, however.

  • 🇬🇧United Kingdom yautja_cetanu

    OK, at the very least this makes me think that if we do this, we should do it as a module outside of the AI module.
