Research the best way to return constant feedback and run long-running tasks in a chatbot

Created on 2 June 2025

Problem/Motivation

We need to research the best way to handle long running tasks that gives feedback to the end user during that request.

The features we MUST HAVE are:

  • It should be possible for the end user to give an instruction to a chatbot or some other frontend, where solving it might need multiple agents working together over multiple loops.
  • It should be possible to see each step the orchestrator takes when it calls agents or tools - either from the LLM's text response or by showing the tool/agent that is being called.
  • It should be possible to get back a response when the process is finished.
  • It should not leave any garbage behind if the process is interrupted by the front-end - for instance creating a field storage but never finishing the whole field configuration.

The features that are NICE TO HAVE are:

  1. Should also work with streaming audio in and out (AKA talking to the chatbot).
  2. Should create one request per agent or tool call, so that the maximum response time is set per agent/tool and not for the whole process.
  3. Handle streaming in the frontend, so it can show the textual message before the tool call happens, for instance when the response is a text message combined with a tool call.
  4. Should only ever require one call at a time to the server, per agent/tool usage.

A feature to keep in the back of our minds while researching this, since it is tightly coupled, is:

  • It should be possible for a tool that takes longer than the server's maximum request time to run in the background or be restarted on the next request - see for instance Symfony Messenger (a rough sketch of that pattern follows below).
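A minimal sketch of that Symfony Messenger pattern, assuming a hypothetical RunToolMessage class and handler (the names and payload below are illustrative assumptions, not an existing API):

```php
<?php

use Symfony\Component\Messenger\Attribute\AsMessageHandler;
use Symfony\Component\Messenger\MessageBusInterface;

// Hypothetical message describing a long-running tool call; the class
// name and payload are illustration-only assumptions.
final class RunToolMessage
{
    public function __construct(
        public readonly string $toolId,
        public readonly array $arguments,
    ) {}
}

// Messenger runs this in a worker process (e.g. via `messenger:consume`),
// outside the web request, so the server's max request time no longer applies.
#[AsMessageHandler]
final class RunToolMessageHandler
{
    public function __invoke(RunToolMessage $message): void
    {
        // Run the slow tool here and persist the result so the next
        // chatbot request (or a poll) can pick it up.
    }
}

// Inside the chatbot request: dispatch and return immediately.
function queueTool(MessageBusInterface $bus): void
{
    $bus->dispatch(new RunToolMessage('field_agent', ['entity_type' => 'node']));
}
```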

Concurrency/asynchronicity is another issue we can look into, but the problem is that the requests might end up creating transactions anyway and waiting for a transaction to finish, so it becomes synchronous in the end anyway. But if you ask to create five taxonomy terms and connect a vocabulary to a field, those things can happen at the same time in no specific order.

Note that we have a first solution in the issue "Add a polling mechanism to get constant feedback on status the chatbot" (currently Active), but it requires two workers per chat, so it is not a very scalable solution on a website that wants to expose this to general end-users, as opposed to admin users.

Preferably this should all run on a normal LAMP/LEMP stack.

Current Thoughts

There are three possible approaches I can think of right off the bat:

Javascript as the orchestrator - this means that the backend only responds one loop at a time. So if you ask the Drupal CMS Assistant to create a field on a content type, the first loop will respond that it is going to use the field agent to do this, but it stops there, and the next loop doesn't happen until the JavaScript sends the request for the next step.
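As a rough sketch of the backend side of such a step-by-step loop - the orchestrator service name and the step/response shape here are assumptions for illustration, not an existing API:

```php
<?php

use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;

// Hypothetical controller: each request executes exactly one orchestration
// loop and tells the JavaScript whether it should fire the next request.
final class ChatStepController
{
    public function step(Request $request): JsonResponse
    {
        $sessionId = $request->query->get('session_id');

        // Run a single loop: either an LLM text response or one agent/tool
        // call. The service name is a placeholder assumption.
        $step = \Drupal::service('ai_assistant.orchestrator')->runOneLoop($sessionId);

        return new JsonResponse([
            // Text to show the user for this step, e.g. "Using the field agent...".
            'message' => $step->message,
            // Tells the frontend whether to request the next step.
            'finished' => $step->isFinished,
        ]);
    }
}
```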

The advantages are that you get one request and one request timeout per call, you get a natural stopping point where more information can be shown, and the backend already supports this. We have tracing software that can step through one step at a time. It also runs 100% on a normal LAMP/LEMP stack without any modifications needed.

The problem is that if you ask it to do three things sequentially, for instance "Create a vocabulary, add five terms to it and then connect it to the content type article", and you close the browser after the first request has started, you end up with a vocabulary but no terms or field. So a garbage collector or a cron job that finishes the request might be needed.

Streamed response with JSON-LD/SSE - This is already how streaming works: you have a response over a connection that stays open and gives back a little bit of information at a time until the instructions are finished and the request is terminated. We could add a streaming message here that reports what is happening. We need to research how to swap out messages in a stream in Deepchat for this to work (it is possible).
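A minimal SSE sketch in plain PHP could look like the following; the $orchestrator service and step shape are assumptions for illustration. Note the nginx buffering header, which relates to the buffering caveat in the disadvantages below:

```php
<?php

// Minimal SSE sketch: each orchestration step is pushed as its own event
// while the loop keeps running on the server.
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
// Ask nginx not to buffer this response (see the buffering caveat below).
header('X-Accel-Buffering: no');

// $orchestrator->steps() is a hypothetical generator yielding one step
// per agent/tool call.
foreach ($orchestrator->steps() as $step) {
    // One SSE event per step; the frontend can swap the displayed message.
    echo "event: step\n";
    echo 'data: ' . json_encode(['message' => $step->message]) . "\n\n";
    // Flush PHP's own buffers so the chunk leaves immediately.
    @ob_flush();
    flush();
}

echo "event: done\ndata: {}\n\n";
flush();
```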

The advantage of this is that this is a pattern Deepchat already uses for streaming messages, so it is 100% aware when a message is done, without any extra JavaScript code needed. Also, if you can turn off buffer caching on an endpoint, you can probably also allow longer timeouts.

The disadvantage of this is that we will need to be able to hook in at the agent or event level when the tool results are given back, and push them into the output buffer. Also, most LEMP/LAMP stacks buffer the response by default for performance reasons, and setting this up so buffering is only turned off for the chatbot is not the simplest. This also means that this feature will not work if you do not have streaming on.

Websocket orchestrator - We use websockets instead, using either Swoole, Ratchet or something similar in Drupal/PHP, or we put a Node.js server in front of Drupal that talks to either the Assistants API or an MCP server.
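For the Ratchet flavour, a minimal sketch could look like this - a long-lived PHP process (so no longer plain LAMP/LEMP) that could push each orchestration step to the chat client as it happens; the per-step push is a placeholder assumption:

```php
<?php

use Ratchet\ConnectionInterface;
use Ratchet\Http\HttpServer;
use Ratchet\MessageComponentInterface;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;

final class ChatSocket implements MessageComponentInterface
{
    public function onOpen(ConnectionInterface $conn): void {}

    public function onMessage(ConnectionInterface $from, $msg): void
    {
        // Here the real code would run the agent loop and send one frame
        // per step; this placeholder just echoes a status update.
        $from->send(json_encode(['message' => 'step update']));
    }

    public function onClose(ConnectionInterface $conn): void {}

    public function onError(ConnectionInterface $conn, \Exception $e): void
    {
        $conn->close();
    }
}

// Runs as its own server process on port 8080.
IoServer::factory(new HttpServer(new WsServer(new ChatSocket())), 8080)->run();
```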

The advantage of this is that the streaming audio part could live in the Node.js layer, so we could add realtime audio easily. Also, Deepchat has native websocket support.

The disadvantage of this is that it is no longer a LAMP/LEMP server stack - even things like Swoole or Ratchet are something that some managed web hosts will simply not allow.

📌 Task

Status: Active
Version: 1.2
Component: AI Assistants API
Created by: marcus_johansson (Germany)
