Tutorial | 10 min read | 2026-07-01

How to Deploy a Flask-SocketIO Realtime App

Deploy a Flask-SocketIO realtime application to production: choosing the right async worker (eventlet/gevent), enabling WebSocket transport, scaling with a Redis message queue, and sticky-session considerations.

Ajay Kumar
Founder & DevOps, PandaStack

Flask-SocketIO adds real-time, bidirectional communication to Flask apps: chat, live dashboards, notifications, collaborative editing. Deploying it differs from deploying a plain Flask app in two ways: WebSockets are long-lived connections that a standard synchronous WSGI server can't handle well, and scaling across multiple instances requires a message broker. This guide covers getting the production stack right.

Why the default server won't do

Flask's development server and a plain synchronous Gunicorn worker handle one request per worker thread and don't support WebSockets properly. SocketIO needs an async server that can hold thousands of open connections. You have two mature choices:

  • eventlet: green-thread concurrency, the most common Flask-SocketIO pairing.
  • gevent (with gevent-websocket): similar model, also well supported.

Pick one and use the matching Gunicorn worker class. Mixing them or using a sync worker is the number-one cause of broken WebSocket deployments.

The app

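from flask import Flask
from flask_socketio import SocketIO, emit
import os

app = Flask(__name__)
# "*" accepts connections from any origin; list your real origins in production.
# message_queue is None when REDIS_URL is unset, which disables the backplane.
socketio = SocketIO(app, cors_allowed_origins="*",
                    message_queue=os.getenv("REDIS_URL"))

@socketio.on("message")
def handle_message(data):
    # Echo the payload to every connected client, not just the sender.
    emit("response", {"echo": data}, broadcast=True)

if __name__ == "__main__":
    # Local development entry point; production runs under Gunicorn.
    socketio.run(app, host="0.0.0.0", port=int(os.getenv("PORT", 5000)))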

The message_queue parameter is the key to scaling; more on that below.

Running it in production

Use Gunicorn with an eventlet worker. Critically, with eventlet you run a single worker process and let green threads handle concurrency, not multiple workers:

gunicorn --worker-class eventlet -w 1 --bind 0.0.0.0:5000 app:app

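Why -w 1? Gunicorn's built-in load balancing has no session affinity, so with several workers a client's polling and upgrade requests can land on different processes and the handshake fails. Workers also can't see each other's connected clients, so a broadcast from one worker won't reach clients on another. A single worker sidesteps both problems for small apps. To scale further, run more instances (each with one worker) behind a load balancer with sticky sessions, and connect them through the Redis message queue.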
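
The command above is specific to eventlet. As a rough sketch, the helper below builds the equivalent Gunicorn arguments for either async stack. The function name gunicorn_args and the mode labels are illustrative, and the gevent worker path is the one shipped by gevent-websocket, so check it against the versions you install.

```python
import shlex

# Gunicorn worker class for each async stack (mode labels are illustrative).
WORKER_CLASSES = {
    "eventlet": "eventlet",
    "gevent": "geventwebsocket.gunicorn.workers.GeventWebSocketWorker",
}


def gunicorn_args(mode, app="app:app", port=5000):
    """Return the Gunicorn argv for a single-worker Flask-SocketIO server."""
    if mode not in WORKER_CLASSES:
        raise ValueError(f"unsupported async mode: {mode!r}")
    return [
        "gunicorn",
        "--worker-class", WORKER_CLASSES[mode],
        "-w", "1",  # always one worker; scale with replicas instead
        "--bind", f"0.0.0.0:{port}",
        app,
    ]


if __name__ == "__main__":
    print(shlex.join(gunicorn_args("eventlet")))
    print(shlex.join(gunicorn_args("gevent")))
```

Keeping the worker count fixed in one place makes it harder to reintroduce a multi-worker configuration by accident.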

Scaling across instances with Redis

When you run multiple replicas, a client connected to replica A and a client connected to replica B can't exchange messages unless the replicas share state. Flask-SocketIO solves this with a Redis (or other) message queue that acts as a backplane: every replica publishes and subscribes to events through Redis, so a broadcast reaches all connected clients regardless of which replica they're on.
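
To make the backplane idea concrete, here is a toy in-memory model, not Flask-SocketIO's implementation. The Bus class stands in for Redis pub/sub, and the Replica and demo names are invented for this illustration.

```python
class Bus:
    """Stand-in for Redis pub/sub: fans every published event out to subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event, data):
        for callback in self.subscribers:
            callback(event, data)


class Replica:
    """One app instance holding its own set of connected clients."""

    def __init__(self, name, bus=None):
        self.name = name
        self.bus = bus
        self.inboxes = {}  # client id -> list of received (event, data)
        if bus is not None:
            bus.subscribe(self._deliver_local)

    def connect(self, client_id):
        self.inboxes[client_id] = []

    def _deliver_local(self, event, data):
        for inbox in self.inboxes.values():
            inbox.append((event, data))

    def broadcast(self, event, data):
        if self.bus is None:
            self._deliver_local(event, data)  # only this replica's clients
        else:
            self.bus.publish(event, data)  # every replica delivers locally


def demo(shared):
    """Return how many messages alice (on A) and bob (on B) received."""
    bus = Bus() if shared else None
    a, b = Replica("A", bus), Replica("B", bus)
    a.connect("alice")
    b.connect("bob")
    a.broadcast("response", {"echo": "hi"})
    return len(a.inboxes["alice"]), len(b.inboxes["bob"])


if __name__ == "__main__":
    print("no backplane:", demo(shared=False))
    print("with backplane:", demo(shared=True))
```

Without the shared bus, bob never sees the broadcast sent through replica A; with it, both clients receive exactly one copy.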

With message_queue=os.getenv("REDIS_URL") set as shown in the app above, a broadcast reaches clients on every replica, so horizontal scaling behaves correctly. This is the same pattern that lets a background process emit events to connected clients without holding a socket itself.
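
A minimal sketch of such a background emitter, assuming Flask-SocketIO and the redis package are installed in the worker's environment. The helper names resolve_queue_url and emit_from_worker and the payload are hypothetical; the write-only SocketIO(message_queue=...) instance follows the pattern the Flask-SocketIO documentation describes for external processes.

```python
import os


def resolve_queue_url(env=os.environ):
    """Return the queue URL, failing loudly instead of running without a backplane."""
    url = env.get("REDIS_URL")
    if not url:
        raise RuntimeError("REDIS_URL is not set")
    return url


def emit_from_worker(event, payload, url):
    """Publish an event to all connected clients from a process that serves no sockets."""
    # Imported here so this module can be loaded where Flask-SocketIO is absent.
    from flask_socketio import SocketIO

    # No Flask app is passed: this instance only writes to the message queue.
    external = SocketIO(message_queue=url)
    external.emit(event, payload)


if __name__ == "__main__":
    emit_from_worker("response", {"echo": "nightly job finished"}, resolve_queue_url())
```

The web replicas pick the event up from Redis and deliver it to their own clients, so the job itself never needs an open connection to any browser.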

Transport and sticky sessions

SocketIO clients first establish an HTTP long-polling connection, then upgrade to WebSocket. Two deployment requirements follow:

  1. WebSocket upgrades must be allowed by the ingress. Kong ingress supports this.
  2. Sticky sessions are needed whenever long-polling is in play. The handshake and every polling request from a client must reach the same instance, or the session is rejected. With a Redis backplane and WebSocket-only transport this matters less, but configuring session affinity avoids handshake failures for clients that start with polling.
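
Session affinity can be pictured as a deterministic mapping from some client key to a replica. The sketch below illustrates the idea only; it is not how Kong or any particular ingress implements it, and the function name, replica names, and choice of client IP as the key are assumptions.

```python
import hashlib


def pick_replica(client_key, replicas):
    """Map a client key (IP, cookie value, ...) to one replica, deterministically."""
    if not replicas:
        raise ValueError("no replicas available")
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


if __name__ == "__main__":
    replicas = ["app-0", "app-1", "app-2"]  # hypothetical replica names
    for request in ("polling GET", "polling POST", "websocket upgrade"):
        # Every request from the same client key lands on the same replica.
        print(request, "->", pick_replica("203.0.113.7", replicas))
```

A plain modulo mapping reshuffles clients whenever the replica count changes, which is one reason real load balancers often use cookies or consistent hashing instead.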

Dockerfile

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["gunicorn", "--worker-class", "eventlet", "-w", "1", \
     "--bind", "0.0.0.0:5000", "app:app"]

Ensure eventlet (or gevent + gevent-websocket) is in requirements.txt.
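
One way to catch a missing async package before the image is built is a small check over requirements.txt. This is a rough sketch: the function name async_stack is made up, and the parsing handles only simple requirement lines, not every pip syntax.

```python
import re


def async_stack(requirements_text):
    """Return 'eventlet', 'gevent', or None based on the packages listed."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Keep only the package name, dropping extras, versions, and markers.
        name = re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]
        names.add(name.lower().replace("_", "-"))
    if "eventlet" in names:
        return "eventlet"
    if {"gevent", "gevent-websocket"} <= names:
        return "gevent"
    return None


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        stack = async_stack(handle.read())
    if stack is None:
        raise SystemExit("requirements.txt lists no async worker package")
    print("async stack:", stack)
```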

Deploying on PandaStack

  1. Provision a managed Redis instance for the message queue. Reference it via REDIS_URL.
  2. Connect your repo as a container app. Build runs in an ephemeral Job pod with rootless BuildKit and deploys via Helm behind Kong ingress (which handles WebSocket upgrades).
  3. Set REDIS_URL so the SocketIO backplane is active, enabling correct behavior across replicas.
  4. Tail live logs to confirm the eventlet worker started and clients are connecting and upgrading to WebSocket.

Requirement       Setting
Async worker      eventlet or gevent
Gunicorn workers  -w 1 (scale via replicas + Redis)
Bind              0.0.0.0, $PORT
Backplane         message_queue=REDIS_URL
Ingress           WebSocket upgrades allowed

Scale-to-zero caution

Long-lived WebSocket connections and scale-to-zero don't mix: when the app scales to zero, all open connections drop and clients must reconnect on wake. For a realtime app with persistently connected users, run a warm instance on a paid tier. The free tier is fine for a low-traffic or intermittently used realtime feature where reconnects are acceptable.
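
Socket.IO clients reconnect on their own, but it helps to see what a reconnect schedule looks like when many clients wake a sleeping instance at once. The sketch below shows capped exponential backoff with jitter; the function name and all numeric values are illustrative, not the client library's settings.

```python
import random


def reconnect_delays(attempts, base=1.0, cap=5.0, jitter=0.5, rng=None):
    """Yield one wait time (seconds) per reconnect attempt: doubling, capped, jittered."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        # Spread clients out so they do not all hit the waking instance at once.
        spread = delay * jitter
        yield max(0.0, delay + rng.uniform(-spread, spread))


if __name__ == "__main__":
    for number, wait in enumerate(reconnect_delays(5), start=1):
        print(f"attempt {number}: wait {wait:.2f}s")
```

With jitter set to 0 the schedule is simply 1, 2, 4, 5, 5 seconds; the random spread is what keeps a crowd of clients from retrying in lockstep.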

Verifying

Open the app in two browser tabs, send a message from one, and confirm it broadcasts to the other. In dev tools, check that the Network tab shows an open WebSocket (status 101 Switching Protocols). If you scaled to multiple replicas, confirm that cross-replica broadcast works; that proves the Redis backplane is wired correctly.
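
If you want to go one step further than eyeballing the 101 status, the response's Sec-WebSocket-Accept header can be checked against the request's Sec-WebSocket-Key. The sketch below follows the calculation defined in RFC 6455; the function name is made up, and the sample key is the one used in the RFC's own example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value a server must return for a given key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Paste the Sec-WebSocket-Key from the request headers shown in dev tools.
    print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A mismatch usually means a proxy in the path answered or altered the upgrade instead of passing it through to the app.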

References

  • Flask-SocketIO documentation: https://flask-socketio.readthedocs.io/
  • Flask-SocketIO deployment guide: https://flask-socketio.readthedocs.io/en/latest/deployment.html
  • eventlet: https://eventlet.readthedocs.io/
  • Gunicorn worker classes: https://docs.gunicorn.org/en/stable/settings.html#worker-class
  • WebSocket protocol (RFC 6455): https://datatracker.ietf.org/doc/html/rfc6455

Flask-SocketIO in production comes down to the right async worker and a Redis backplane for scaling. Get those right and realtime works cleanly. Deploy your app with managed Redis on PandaStack's free tier: https://dashboard.pandastack.io
