# How to Deploy a ComfyUI Image Pipeline
ComfyUI is a highly flexible node-based interface for Stable Diffusion and related image models. Most people run it locally with a GPU, but you can also deploy it as a backend service that your app calls via its API. This guide covers running ComfyUI headless in production and the realities of doing so.
## Set expectations: this is a heavy workload
Before anything else, be honest about the requirements. Diffusion models are large (multiple gigabytes per checkpoint) and generation is compute-intensive. ComfyUI runs *much* faster on a GPU; CPU-only generation works for tiny test images but is impractical for real output. Plan your deployment around:
- Model storage — checkpoints, LoRAs, VAEs, and ControlNet models add up to many gigabytes. They need persistent storage, not an ephemeral container filesystem.
- Compute — substantial CPU/RAM at minimum; GPU for usable speed.
- Cold loading — first generation after start loads models into memory, which is slow.
## Run ComfyUI headless with its API
ComfyUI exposes an HTTP/WebSocket API. The key flag is to bind to all interfaces so the container is reachable:

```shell
python main.py --listen 0.0.0.0 --port 8188
```

You drive it by POSTing a workflow (the JSON graph you'd build in the UI) to /prompt, then polling /history/{prompt_id} or listening on the WebSocket for completion. Export a workflow as API JSON from the UI (enable dev mode options) and treat that JSON as your pipeline definition, checked into git.
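
The polling half of that flow might look like the sketch below. It assumes the /history/{prompt_id} response is keyed by prompt ID and carries an `outputs` map of node IDs to `images` entries (`filename`, `subfolder`, `type`), which is the shape used in ComfyUI's script examples; check it against the version you deploy. The names `image_refs`, `view_url`, and `wait_for_images` are illustrative, the `comfyui:8188` host is a placeholder, and the standard library is used so the sketch has no dependencies.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://comfyui:8188"  # placeholder host, adjust to your deployment


def image_refs(history: dict, prompt_id: str) -> list[dict]:
    """Collect image descriptors from a /history response (assumed shape)."""
    entry = history.get(prompt_id, {})
    refs = []
    for node_output in entry.get("outputs", {}).values():
        refs.extend(node_output.get("images", []))
    return refs


def view_url(ref: dict, base_url: str = BASE_URL) -> str:
    """Build the /view URL for one image descriptor."""
    query = urlencode({
        "filename": ref["filename"],
        "subfolder": ref.get("subfolder", ""),
        "type": ref.get("type", "output"),
    })
    return f"{base_url}/view?{query}"


def wait_for_images(prompt_id: str, timeout_s: float = 300.0,
                    interval_s: float = 1.0) -> list[str]:
    """Poll /history/{prompt_id} until images appear or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urlopen(f"{BASE_URL}/history/{prompt_id}", timeout=10) as resp:
            history = json.load(resp)
        refs = image_refs(history, prompt_id)
        if refs:
            return [view_url(ref) for ref in refs]
        time.sleep(interval_s)  # not finished yet, try again shortly
    raise TimeoutError(f"prompt {prompt_id} produced no images within {timeout_s}s")
```

Call `wait_for_images` with the `prompt_id` returned by the POST to /prompt. A fixed polling interval keeps the sketch short; the WebSocket route avoids the polling delay at the cost of more connection handling.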
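
```python
import requests

def generate(prompt_workflow: dict) -> str:
    """Queue an API-format workflow and return its prompt ID."""
    r = requests.post("http://comfyui:8188/prompt", json={"prompt": prompt_workflow}, timeout=30)
    r.raise_for_status()  # surface a rejected workflow instead of a KeyError below
    prompt_id = r.json()["prompt_id"]
    # poll /history/{prompt_id} until images appear, then fetch via /view
    return prompt_id
```

## Containerize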
The Dockerfile below installs ComfyUI and its dependencies. Keep models *out* of the image: mount or download them at runtime so the image stays small and models stay shareable.

```dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /opt
RUN git clone https://github.com/comfyanonymous/ComfyUI.git
WORKDIR /opt/ComfyUI
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8188
CMD ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]
```

## Architecture: put a thin API in front
Don't expose raw ComfyUI to the internet. Front it with a small API service that:
- Validates and templates workflows (users pass parameters, not raw graphs).
- Authenticates requests.
- Queues jobs so concurrent requests don't overwhelm a single GPU.
- Returns job IDs and serves finished images.

```text
Client → API service (auth, queue) → ComfyUI backend → image output
```

This two-service split is the production-grade shape and keeps the powerful, unauthenticated ComfyUI internals private.
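
To make the "parameters, not raw graphs" idea concrete, here is a minimal sketch of workflow templating. It relies on the API-format export being a dict of node IDs to `{"class_type", "inputs"}` entries. The node IDs, parameter names, and bounds in `PARAM_TARGETS` and `PARAM_RULES` are hypothetical and depend entirely on your own exported graph.

```python
import copy

# Hypothetical mapping from public parameter names to (node_id, input_name)
# in an exported API-format workflow. The node IDs depend on your own graph.
PARAM_TARGETS = {
    "prompt": ("6", "text"),
    "seed": ("3", "seed"),
    "steps": ("3", "steps"),
}

# Illustrative validation rules: expected type and optional inclusive bounds.
PARAM_RULES = {
    "prompt": (str, None, None),
    "seed": (int, 0, 2**32 - 1),
    "steps": (int, 1, 50),
}


def render_workflow(template: dict, params: dict) -> dict:
    """Return a copy of the template with validated parameters filled in."""
    unknown = set(params) - set(PARAM_TARGETS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    workflow = copy.deepcopy(template)  # never mutate the checked-in template
    for name, value in params.items():
        expected_type, low, high = PARAM_RULES[name]
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(value, expected_type) or isinstance(value, bool):
            raise ValueError(f"{name} must be {expected_type.__name__}")
        if low is not None and not (low <= value <= high):
            raise ValueError(f"{name} must be between {low} and {high}")
        node_id, input_name = PARAM_TARGETS[name]
        workflow[node_id]["inputs"][input_name] = value
    return workflow
```

Because callers can only touch whitelisted inputs, they cannot add nodes or point loaders at arbitrary files, which is the main reason to keep raw graphs off the public API.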
## Deploy on PandaStack
1. Push your repo (the ComfyUI Dockerfile plus your API service) to GitHub.
2. Create a container app for the API service and one for the ComfyUI backend in the [dashboard](https://dashboard.pandastack.io). They build via rootless BuildKit and deploy via Helm.
3. Choose a compute tier sized to the workload. For CPU inference and orchestration, c1/c2 compute-optimized or m1/m2 memory-optimized tiers (up to 8 CPU / 16 GB on C2-2XCompute) handle the API, queueing, and lighter models. Diffusion at scale ultimately wants a GPU, so size honestly for your model.
4. Use a managed datastore (Postgres/Redis) to track jobs and a real object store for generated images.
## Right-sizing guide
| Component | Tier family | Why |
|---|---|---|
| API/queue service | Small shared compute | Light, I/O-bound |
| ComfyUI (CPU testing) | m1/m2 memory-optimized | Models are memory-hungry |
| Image output storage | Managed object storage | Don't keep images on the pod |
## Persisting models and outputs
The container filesystem is ephemeral — anything written there vanishes on redeploy. So:
- Models: download to a persistent volume on first boot, or bake a download step that caches them. Re-downloading multi-gigabyte checkpoints on every restart is painful.
- Outputs: write generated images to object storage and return URLs, not to the pod's local disk.
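
The "download on first boot, then cache" step for models can be sketched as below. `ensure_model` is an illustrative helper, not part of ComfyUI; it takes the download routine as a callable so you can plug in whatever client you use, and it assumes `dest` lives on the persistent volume.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ensure_model(dest: Path, download: Callable[[Path], None]) -> bool:
    """Make sure dest exists; return True if a download happened."""
    if dest.exists():
        return False  # already cached on the persistent volume
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temp file in the same directory, then rename atomically
    # so a crash mid-download never leaves a truncated checkpoint at dest.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        download(tmp)
        os.replace(tmp, dest)
    finally:
        tmp.unlink(missing_ok=True)  # no-op after a successful rename
    return True
```

At container start you might call it once per checkpoint, for example with a callable that wraps `urllib.request.urlretrieve` and a destination under your mounted models directory. Both the URL and the mount path are deployment-specific.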
## Honest limitations
- ComfyUI is GPU-first; CPU-only deployments are for testing or very low volume.
- Scale-to-zero on the free tier means model reload on every cold start — fine for experiments, not for a responsive service.
- Concurrency is bounded by your compute; a queue in front is essential, not optional.
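
As a rough illustration of the queueing point, the sketch below funnels jobs through a single worker thread so that only one generation runs against the backend at a time. `JobQueue` and its in-memory `results` dict are hypothetical stand-ins; a real deployment would keep job state in the managed datastore so it survives restarts.

```python
import queue
import threading
import uuid
from typing import Callable


class JobQueue:
    """Serialize generation jobs through one worker per backend."""

    def __init__(self, run_job: Callable[[dict], str], max_pending: int = 32):
        self._run_job = run_job
        self._pending: queue.Queue = queue.Queue(maxsize=max_pending)
        self.results: dict[str, dict] = {}
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, params: dict) -> str:
        """Enqueue a job and return its ID; raises queue.Full when saturated."""
        job_id = uuid.uuid4().hex
        self.results[job_id] = {"status": "queued"}
        try:
            self._pending.put_nowait((job_id, params))
        except queue.Full:
            del self.results[job_id]  # do not leave a phantom job behind
            raise
        return job_id

    def _worker(self) -> None:
        while True:
            job_id, params = self._pending.get()
            self.results[job_id] = {"status": "running"}
            try:
                output = self._run_job(params)
                self.results[job_id] = {"status": "done", "output": output}
            except Exception as exc:
                self.results[job_id] = {"status": "failed", "error": str(exc)}
            finally:
                self._pending.task_done()

    def wait_idle(self) -> None:
        """Block until every submitted job has finished."""
        self._pending.join()
```

The bounded queue matters as much as the single worker: when it is full, `submit` fails fast, and the API service can return a "busy, retry later" response instead of accepting work it cannot finish in reasonable time.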
Use the free tier to wire up the API contract and validate workflows, then move the generation backend to an appropriately sized paid tier.
## References
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI)
- [ComfyUI API examples](https://github.com/comfyanonymous/ComfyUI/tree/master/script_examples)
- [Stability AI: Stable Diffusion models](https://stability.ai/stable-image)
- [Hugging Face: Diffusers](https://huggingface.co/docs/diffusers/index)
ComfyUI as a backend is powerful once you split orchestration from generation and persist your models. PandaStack lets you deploy both services, pick a compute-optimized tier, and queue jobs in an auto-wired datastore. Prototype the API on the free tier at [dashboard.pandastack.io](https://dashboard.pandastack.io).