Real-time Feedback in Chatbot: showing streaming answer instead of full answer at once.

Created on 9 October 2024, 4 months ago

Problem/Motivation

In the AI Assistant chatbot, I was aiming to display streaming responses as they are generated, instead of presenting the entire answer at once.

During a Slack discussion, it was highlighted that streaming answers is already supported as long as the web server configuration allows it. However, most web servers are set to buffer responses for performance reasons, which prevents real-time output.

Proposed resolution

Solution:

For nginx with php-fpm, ensure that the following setting is configured to disable buffering:

fastcgi_buffering off;

For Apache with php-fpm, you might need to adjust the following:

ProxySet enablereuse=on flushpackets=on

Here’s my current .ddev/nginx-site.conf setup for reference (do not forget to delete the line #ddev-generated at the top of the file):

server {
    listen 80 default_server;
    listen 443 ssl default_server;

    root /var/www/html/web;

    ssl_certificate /etc/ssl/certs/master.crt;
    ssl_certificate_key /etc/ssl/certs/master.key;

    include /etc/nginx/monitoring.conf;

    index index.php index.htm index.html;

    # Disable sendfile as per https://docs.vagrantup.com/v2/synced-folders/virtualbox.html
    sendfile off;
    error_log /dev/stdout info;
    access_log /var/log/nginx/access.log;
    fastcgi_buffering off; # Disable FastCGI buffering

    # Other configurations...
}

Remaining tasks

See if the attached README file can be helpful.

Thanks to Marcus Johansson and the Drupal Slack community for sharing these insights.

💬 Support request
Status

Active

Version

1.0

Component

AI Assistants API

Created by

🇰🇬 Kyrgyzstan dan_metille


Comments & Activities

  • Issue created by @dan_metille
  • 🇨🇦 Canada mandclu

    Not sure if it's possible, but it would be ideal if there was some way to set this only for the calls that are needed for the chatbot.

  • 🇩🇪 Germany marcus_johansson

    I think we should split this up into setting it up in DDEV and setting it up in production. The markdown, or the DDEV changes needed for streaming during development, we can push here: https://project.pages.drupalcode.org/ai/developers/ddev/

    In production, what @mandclu wrote is important: you don't want to turn off web server buffering everywhere, since it affects performance, specifically CPU usage and network congestion. It can also affect error logging in Apache and nginx.

    The biggest problem is that nginx and Apache have to decide in the request phase whether or not to use buffering, so we won't be able to set that based on some response header or other PHP application rule.

    You could of course base this on something like a query string and a request header, but that would be easy to spoof and to abuse for DDoS attacks.

    I'll research how BigPipe does this, because it should use similar flush or ob_flush methods to send partial buffers to the web server. If they have it working even when the web server is buffering, we should just copy their solution.

    Another option would be for all streamed responses to forward their requests to a specific known endpoint, with a session, that could be set up to not buffer in nginx and Apache, but that would need a lot of rewriting in the AI module. It would also require pretty complex nginx or Apache setups, but I think that can be filed under "don't implement it if you don't know how to set it up".
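
For illustration only, the "specific known endpoint" idea discussed in this thread might look roughly like the nginx fragment below. It is a sketch under assumptions, not something the AI module provides: the path /ai/stream is a hypothetical placeholder, the php-fpm socket path must be adapted to the actual setup, and the fragment has not been tested against a Drupal site. It scopes fastcgi_buffering off to a single location so that buffering stays enabled for every other request.

```nginx
# Hypothetical sketch: /ai/stream is a placeholder path, not a route defined by the AI module.
location ^~ /ai/stream {
    include fastcgi_params;

    # Route the request through Drupal's front controller.
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
    fastcgi_param SCRIPT_NAME /index.php;

    # Assumption: adjust to your php-fpm socket or host:port.
    fastcgi_pass unix:/run/php-fpm.sock;

    # Disable buffering for this location only; other locations keep the default.
    fastcgi_buffering off;
}
```

Because the block applies only to one path prefix, the performance cost of unbuffered responses is limited to the streaming requests themselves. Access control for such an endpoint would still have to be handled by the application.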
