Flask-SocketIO adds real-time, bidirectional communication to Flask apps: chat, live dashboards, notifications, collaborative editing. Deploying it differs from deploying a plain Flask app in two ways. WebSocket connections are long-lived, which a standard synchronous WSGI server handles poorly, and scaling across multiple instances requires a message broker. This guide covers getting the production stack right.

Why the default server won't do

A plain synchronous Gunicorn worker handles one request at a time, so every open connection would tie up a worker, and Flask's development server is not built for production traffic. SocketIO needs an async server that can hold thousands of open connections. You have two mature choices:

eventlet: green-thread concurrency, the most common Flask-SocketIO pairing.
gevent (with gevent-websocket): similar model, also well supported.

Pick one and use the matching Gunicorn worker class. Mixing them or using a sync worker is the number-one cause of broken WebSocket deployments.
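
To make that choice explicit at deploy time, a small check can run before the server starts. The sketch below is an illustration rather than part of Flask-SocketIO or Gunicorn: `pick_worker_class` is a hypothetical helper, and refusing to continue when both stacks are installed is a deliberately strict policy assumed here, not a library requirement.

```python
import importlib.util

# Gunicorn worker class strings for each async stack.
EVENTLET_WORKER = "eventlet"
GEVENT_WS_WORKER = "geventwebsocket.gunicorn.workers.GeventWebSocketWorker"


def _installed(package):
    """Return True if a top-level package is importable (without importing it)."""
    return importlib.util.find_spec(package) is not None


def pick_worker_class(installed=_installed):
    """Return the worker class that matches the installed async stack.

    Raises RuntimeError when neither stack is available, or when both are,
    so an ambiguous environment is caught before the server starts.
    """
    has_eventlet = installed("eventlet")
    has_gevent = installed("gevent") and installed("geventwebsocket")
    if has_eventlet and has_gevent:
        raise RuntimeError("both eventlet and gevent are installed; pick one")
    if has_eventlet:
        return EVENTLET_WORKER
    if has_gevent:
        return GEVENT_WS_WORKER
    raise RuntimeError("install eventlet, or gevent plus gevent-websocket")
```

The `installed` parameter exists only so the function can be exercised without changing the real environment.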

The app

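import os

from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)

# cors_allowed_origins="*" accepts connections from any origin; list your
# real origins here before going live.
socketio = SocketIO(app, cors_allowed_origins="*",
                    message_queue=os.getenv("REDIS_URL"))

@socketio.on("message")
def handle_message(data):
    # Echo the payload to every connected client, not only the sender.
    emit("response", {"echo": data}, broadcast=True)

if __name__ == "__main__":
    # Local development entry point; production runs under Gunicorn instead.
    socketio.run(app, host="0.0.0.0", port=int(os.getenv("PORT", 5000)))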

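The message_queue parameter is the key to scaling (more on that below). When REDIS_URL is unset, os.getenv returns None and Flask-SocketIO runs without a queue, which is fine for a single process.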

Running it in production

Use Gunicorn with an eventlet worker. Critically, with eventlet you run a single worker process and let green threads handle concurrency, not multiple workers:

gunicorn --worker-class eventlet -w 1 --bind 0.0.0.0:5000 app:app

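Why -w 1? Gunicorn's load balancing has no notion of sticky sessions, so with several workers the requests of one Socket.IO session can land on different processes, each of which tracks only its own clients. Flask-SocketIO therefore requires a single Gunicorn worker. One eventlet worker can still serve many concurrent connections through green threads. To scale further, run more single-worker instances behind a load balancer and connect them through the Redis message queue.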
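
The same settings can live in a Gunicorn config file instead of command-line flags. This is a minimal sketch, assuming the platform may inject a PORT variable and falling back to 5000 otherwise; `workers`, `worker_class`, and `bind` are standard Gunicorn settings.

```python
# gunicorn.conf.py, loaded with: gunicorn -c gunicorn.conf.py app:app
import os

# One worker process; eventlet green threads provide the concurrency.
workers = 1
worker_class = "eventlet"

# Listen on all interfaces. PORT is an assumed environment variable and
# falls back to 5000 when it is not set.
bind = f"0.0.0.0:{os.getenv('PORT', '5000')}"
```

Keeping `workers = 1` in the file makes it harder to raise the worker count by accident on the command line.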

Scaling across instances with Redis

When you run multiple replicas, a client connected to replica A and a client connected to replica B can't exchange messages unless the replicas share state. Flask-SocketIO solves this with a Redis (or other) message queue that acts as a backplane: every replica publishes and subscribes to events through Redis, so a broadcast reaches all connected clients regardless of which replica they're on.
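
The effect of the backplane can be shown with a toy model in plain Python. This is not how Flask-SocketIO or Redis is implemented: `Backplane`, `Replica`, and `demo` are hypothetical names, and the in-memory list of callbacks only stands in for Redis pub/sub.

```python
class Backplane:
    """Stand-in for Redis pub/sub: forwards each published event to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event, data):
        for callback in self.subscribers:
            callback(event, data)


class Replica:
    """One app instance that only knows about its own connected clients."""

    def __init__(self, name, backplane=None):
        self.name = name
        self.clients = {}  # client id -> list of (event, data) received
        self.backplane = backplane
        if backplane is not None:
            backplane.subscribe(self._deliver_local)

    def connect(self, client_id):
        self.clients[client_id] = []

    def _deliver_local(self, event, data):
        for inbox in self.clients.values():
            inbox.append((event, data))

    def broadcast(self, event, data):
        if self.backplane is None:
            # No shared queue: only this replica's clients see the event.
            self._deliver_local(event, data)
        else:
            # Shared queue: every subscribed replica delivers to its own clients.
            self.backplane.publish(event, data)


def demo(shared):
    """Broadcast from replica A and count what alice (on A) and bob (on B) receive."""
    backplane = Backplane() if shared else None
    a, b = Replica("A", backplane), Replica("B", backplane)
    a.connect("alice")
    b.connect("bob")
    a.broadcast("response", {"echo": "hi"})
    return len(a.clients["alice"]), len(b.clients["bob"])
```

Calling `demo(False)` leaves bob without the message, while `demo(True)` delivers it to both clients, which is the behavior the Redis queue provides across real replicas.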

Set message_queue=os.getenv("REDIS_URL") as shown and install the redis Python package, which Flask-SocketIO uses to talk to the queue. With that in place, broadcasts reach clients on every replica. The same mechanism lets a background process emit events to connected clients without holding a socket itself.
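
Emitting from a background process might look like the sketch below. It relies on the documented Flask-SocketIO pattern of creating a `SocketIO` instance with only `message_queue` and no Flask app; the helper names `redis_url_from_env` and `emit_from_background` are hypothetical, and the code assumes the web replicas were started with the same REDIS_URL.

```python
import os


def redis_url_from_env(env=os.environ):
    """Return REDIS_URL, or fail loudly: without a queue the event reaches no one."""
    url = env.get("REDIS_URL")
    if not url:
        raise RuntimeError("REDIS_URL is not set")
    return url


def emit_from_background(event, payload, env=os.environ):
    """Publish an event from a process that holds no client connections.

    The import is local so this module still loads where Flask-SocketIO
    is not installed, for example when testing redis_url_from_env alone.
    """
    from flask_socketio import SocketIO

    # No Flask app is passed: this instance only writes to the queue, and
    # the web replicas deliver the event to their connected clients.
    external = SocketIO(message_queue=redis_url_from_env(env))
    external.emit(event, payload)
```

A job runner could then call `emit_from_background("response", {"echo": "job finished"})` once its work completes.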

Transport and sticky sessions

By default, Socket.IO clients first establish an HTTP long-polling connection, then upgrade to WebSocket. Two deployment requirements follow:

1. WebSocket upgrades must be allowed by the ingress. Kong ingress supports this.
2. Sticky sessions are needed whenever long-polling is in play and more than one replica is running. During the polling phase and the upgrade to WebSocket, every request from a client must reach the instance that holds its session, or the handshake fails. With WebSocket-only transport a connection stays on one instance for its lifetime, so affinity matters less, but configuring it is the safe default.
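
What session affinity guarantees can be sketched with a toy router. This is only an illustration of the property, not how Kong or any particular ingress implements it (real load balancers commonly key on a cookie or the client address); `pick_replica` and the replica names are hypothetical.

```python
import hashlib


def pick_replica(session_id, replicas):
    """Map a session id to one replica, always the same one for the same id."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-a", "replica-b", "replica-c"]

# Every long-polling request of one handshake carries the same session id,
# so each of them is routed to the replica that owns that session.
first_request = pick_replica("session-123", replicas)
second_request = pick_replica("session-123", replicas)
```

Both requests resolve to the same replica. A round-robin balancer gives no such guarantee, which is why polling handshakes fail without affinity.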

Dockerfile

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["gunicorn", "--worker-class", "eventlet", "-w", "1", \
     "--bind", "0.0.0.0:5000", "app:app"]

Ensure requirements.txt lists flask-socketio, gunicorn, eventlet (or gevent + gevent-websocket), and redis for the message queue.

Deploying on PandaStack

1. Provision a managed Redis instance for the message queue. Reference it via REDIS_URL.
2. Connect your repo as a container app. The build runs in an ephemeral Job pod with rootless BuildKit and deploys via Helm behind Kong ingress (which handles WebSocket upgrades).
3. Set REDIS_URL so the SocketIO backplane is active, enabling correct behavior across replicas.
4. Tail live logs to confirm the eventlet worker started and clients are connecting and upgrading to WebSocket.
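
A missing or malformed REDIS_URL is easy to overlook because the app still starts without it. The sketch below is one hedged way to surface that in the logs at startup. `check_deploy_env` is a hypothetical helper, and it assumes a Redis queue reachable through a `redis://` or `rediss://` URL and a numeric PORT.

```python
import os
from urllib.parse import urlparse


def check_deploy_env(env=os.environ):
    """Return a list of human-readable problems; an empty list means the env looks sane."""
    problems = []

    url = env.get("REDIS_URL", "")
    if not url:
        problems.append("REDIS_URL is not set: broadcasts will stay inside one replica")
    elif urlparse(url).scheme not in ("redis", "rediss"):
        problems.append("REDIS_URL should start with redis:// or rediss://")

    port = env.get("PORT", "5000")
    try:
        port_ok = 0 < int(port) < 65536
    except ValueError:
        port_ok = False
    if not port_ok:
        problems.append(f"PORT is not a valid TCP port: {port!r}")

    return problems
```

Printing the returned list at startup puts any problem into the live logs from step 4, next to the worker start-up messages.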

Requirement	Setting
Async worker	eventlet or gevent
Gunicorn workers	`-w 1` (scale via replicas + Redis)
Bind	`0.0.0.0`, `$PORT`
Backplane	`message_queue=REDIS_URL`
Ingress	WebSocket upgrades allowed

Scale-to-zero caution

Long-lived WebSocket connections and scale-to-zero don't mix: when the app scales to zero, all open connections drop and clients must reconnect on wake. For a realtime app with persistently connected users, run a warm instance on a paid tier. The free tier is fine for a low-traffic or intermittently used realtime feature where reconnects are acceptable.

Verifying

Open the app in two browser tabs, send a message from one, and confirm it appears in the other. In dev tools, check that the network tab shows an open WebSocket (status 101 Switching Protocols). If you scaled to multiple replicas, confirm that cross-replica broadcast works: that proves the Redis backplane is wired correctly.
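
For context on what the 101 response contains, RFC 6455 requires the server to return a Sec-WebSocket-Accept header derived from the client's Sec-WebSocket-Key. The sketch below computes that value and checks a response captured from dev tools; `expected_accept` and `is_valid_upgrade` are hypothetical helper names, while the GUID and the hashing steps come from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value a server must return for a key."""
    raw = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


def is_valid_upgrade(status_code, headers, sec_websocket_key):
    """Check status, Upgrade header, and accept value of a handshake response."""
    normalized = {name.lower(): value for name, value in headers.items()}
    return (
        status_code == 101
        and normalized.get("upgrade", "").lower() == "websocket"
        and normalized.get("sec-websocket-accept") == expected_accept(sec_websocket_key)
    )
```

If the ingress strips the Upgrade header, the response is typically an ordinary HTTP status instead of 101, and the client stays on long-polling.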

References

Flask-SocketIO documentation: https://flask-socketio.readthedocs.io/
Flask-SocketIO deployment guide: https://flask-socketio.readthedocs.io/en/latest/deployment.html
eventlet: https://eventlet.readthedocs.io/
Gunicorn worker classes: https://docs.gunicorn.org/en/stable/settings.html#worker-class
WebSocket protocol (RFC 6455): https://datatracker.ietf.org/doc/html/rfc6455

Flask-SocketIO in production comes down to the right async worker and a Redis backplane for scaling. Get those right and realtime works cleanly. Deploy your app with managed Redis on PandaStack's free tier: https://dashboard.pandastack.io

How to Deploy a Flask-SocketIO Realtime App
