Tutorial · 11 min read · 2026-06-27

How to Deploy a ComfyUI Image Pipeline

Deploy ComfyUI as a headless image-generation backend: expose its API, persist models and workflows, and right-size compute for Stable Diffusion pipelines.

Ajay Kumar
Founder & DevOps, PandaStack


ComfyUI is a flexible, node-based interface for Stable Diffusion and related image models. Most people run it locally with a GPU, but you can also deploy it as a backend service that your app calls through its API. This guide covers running ComfyUI headless in production and the practical realities of doing so.

Set expectations: this is a heavy workload

Before anything else, be honest about the requirements. Diffusion models are large (multiple gigabytes per checkpoint) and generation is compute-intensive. ComfyUI runs *much* faster on a GPU; CPU-only generation works for tiny test images but is impractical for real output. Plan your deployment around:

  • Model storage — checkpoints, LoRAs, VAEs, and ControlNet models add up to many gigabytes. They need persistent storage, not an ephemeral container filesystem.
  • Compute — substantial CPU/RAM at minimum; GPU for usable speed.
  • Cold loading — first generation after start loads models into memory, which is slow.

Run ComfyUI headless with its API

ComfyUI exposes an HTTP/WebSocket API. The key flag is --listen, which binds the server to all interfaces so it is reachable from outside the container (by default it listens only on 127.0.0.1):

python main.py --listen 0.0.0.0 --port 8188

You drive it by POSTing a workflow (the JSON graph you'd build in the UI) to /prompt, which returns a prompt_id, then polling /history/{prompt_id} or listening on the WebSocket for completion. Export a workflow in API format from the UI (enable the dev mode options in the settings to get the API export) and treat that JSON as your pipeline definition, checked into git.

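import requests

COMFY_URL = "http://comfyui:8188"  # ComfyUI service hostname inside your cluster

def generate(prompt_workflow: dict) -> str:
    """Queue an API-format workflow on ComfyUI and return its prompt_id."""
    r = requests.post(f"{COMFY_URL}/prompt", json={"prompt": prompt_workflow}, timeout=30)
    r.raise_for_status()  # invalid graphs come back as HTTP 400 with error details
    prompt_id = r.json()["prompt_id"]
    # poll /history/{prompt_id} until images appear, then fetch via /view
    return prompt_id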
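The polling step that generate() leaves to a comment can be sketched with the standard library alone, following the same pattern as ComfyUI's own script examples: poll /history/{prompt_id} until an entry for the prompt appears, walk its outputs for image descriptors, and download each one from /view using its filename, subfolder and type. This is a minimal sketch, not a drop-in client: the hostname, the 2-second interval and the 600-second timeout are assumptions, the status.status_str check assumes the field recent ComfyUI versions record, and the fetch parameter exists only so the function can be exercised without a live server.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

COMFY_URL = "http://comfyui:8188"  # assumed in-cluster hostname of the ComfyUI service


def http_get(url: str) -> bytes:
    """GET a URL with the standard library and return the raw body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def wait_for_images(prompt_id: str, fetch: Callable[[str], bytes] = http_get,
                    timeout: float = 600.0, interval: float = 2.0) -> list[bytes]:
    """Poll /history/{prompt_id} until the prompt finishes, then download its images."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        history = json.loads(fetch(f"{COMFY_URL}/history/{prompt_id}"))
        entry = history.get(prompt_id)
        if entry is None:
            # Not in history yet: the prompt is still queued or running.
            time.sleep(interval)
            continue
        # Assumes the status.status_str field that recent ComfyUI versions record.
        if entry.get("status", {}).get("status_str") == "error":
            raise RuntimeError(f"ComfyUI reported an error for prompt {prompt_id}")
        images = []
        for node_output in entry.get("outputs", {}).values():
            for img in node_output.get("images", []):
                query = urllib.parse.urlencode({
                    "filename": img["filename"],
                    "subfolder": img.get("subfolder", ""),
                    "type": img.get("type", "output"),
                })
                images.append(fetch(f"{COMFY_URL}/view?{query}"))
        return images
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout}s")
```

Polling is the simplest option to run behind a queue worker; the WebSocket route avoids the repeated requests if latency matters more than simplicity.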

Containerize

Write a Dockerfile that installs ComfyUI and its dependencies. Keep models *out* of the image, and mount or download them at runtime so the image stays small and the models stay shareable:

FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /opt
RUN git clone https://github.com/comfyanonymous/ComfyUI.git
WORKDIR /opt/ComfyUI
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8188
CMD ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]
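Once the container starts, ComfyUI still needs a moment before its HTTP server accepts requests, so the API service (or a startup script) may want to wait for it rather than fail the first jobs. A minimal sketch, assuming the /system_stats route that current ComfyUI builds expose and an in-cluster hostname of comfyui: a 200 only proves the server is up, since checkpoints are still loaded on the first prompt. The timeout and interval values are arbitrary, and the status parameter is there so the loop can be tested without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if nothing is answering yet."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except OSError:
        return 0  # connection refused, DNS not ready, or timed out


def wait_until_ready(base_url: str = "http://comfyui:8188", timeout: float = 300.0,
                     interval: float = 2.0,
                     status: Callable[[str], int] = http_status) -> bool:
    """Poll ComfyUI's /system_stats until it answers 200; False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if status(f"{base_url}/system_stats") == 200:
            return True
        time.sleep(interval)
    return False
```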

Architecture: put a thin API in front

Don't expose raw ComfyUI to the internet. Front it with a small API service that:

  • Validates and templates workflows (users pass parameters, not raw graphs).
  • Authenticates requests.
  • Queues jobs so concurrent requests don't overwhelm a single GPU.
  • Returns job IDs and serves finished images.

Client → API service (auth, queue) → ComfyUI backend → image output

This two-service split is the production-grade shape and keeps the powerful, unauthenticated ComfyUI internals private.
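The "validates and templates workflows" responsibility can be sketched as a whitelist: load the API-format JSON you exported and checked into git as a template, and let users set only named parameters that map to specific node inputs. Everything here is hypothetical: the parameter names, the limits, and the node IDs ("3" for KSampler and "6" for the positive CLIPTextEncode follow ComfyUI's basic API example, but your own export may differ).

```python
import copy

# Hypothetical mapping from public parameter names to (node_id, input_name)
# in an API-format workflow export. Check the IDs in your own JSON.
PARAM_SLOTS = {
    "prompt": ("6", "text"),
    "seed": ("3", "seed"),
    "steps": ("3", "steps"),
}


def _is_int(v) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)  # bool is an int subclass


# Illustrative limits only; tune them to your model and compute budget.
VALIDATORS = {
    "prompt": lambda v: isinstance(v, str) and 0 < len(v) <= 1000,
    "seed": lambda v: _is_int(v) and 0 <= v < 2**32,
    "steps": lambda v: _is_int(v) and 1 <= v <= 50,
}


def build_workflow(template: dict, params: dict) -> dict:
    """Copy the checked-in workflow and fill only whitelisted, validated inputs."""
    workflow = copy.deepcopy(template)  # never mutate the shared template
    for name, value in params.items():
        if name not in PARAM_SLOTS:
            raise ValueError(f"unsupported parameter: {name!r}")
        if not VALIDATORS[name](value):
            raise ValueError(f"invalid value for {name!r}")
        node_id, input_name = PARAM_SLOTS[name]
        workflow[node_id]["inputs"][input_name] = value
    return workflow
```

Rejecting unknown keys outright, rather than ignoring them, keeps clients from probing for inputs you never meant to expose.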

Deploy on PandaStack

  1. Push your repo (the ComfyUI Dockerfile plus your API service) to GitHub.
  2. Create a container app for the API service and one for the ComfyUI backend in the [dashboard](https://dashboard.pandastack.io). They build via rootless BuildKit and deploy via Helm.
  3. Choose a compute tier sized to the workload. For CPU inference and orchestration, c1/c2 compute-optimized or m1/m2 memory-optimized tiers (up to 8 CPU / 16 GB on C2-2XCompute) handle the API, queueing, and lighter models. Diffusion at scale ultimately wants a GPU, so size honestly for your model.
  4. Use a managed datastore (Postgres/Redis) to track jobs and an object store for generated images.

Right-sizing guide

| Component | Tier family | Why |
|---|---|---|
| API/queue service | Small shared compute | Light, I/O-bound |
| ComfyUI (CPU testing) | m1/m2 memory-optimized | Models are memory-hungry |
| Image output storage | Managed object storage | Don't keep images on the pod |

Persisting models and outputs

The container filesystem is ephemeral — anything written there vanishes on redeploy. So:

  • Models: download them to a persistent volume on first boot, or run a startup download step that skips files already cached there. Re-downloading multi-gigabyte checkpoints on every restart is slow and wasteful.
  • Outputs: write generated images to object storage and return URLs, not to the pod's local disk.
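The "download once, then reuse" pattern for models can be sketched as follows: skip the download if the file already exists on the persistent volume, otherwise stream it to a temporary file in the same directory and rename it into place, so an interrupted download never leaves a truncated checkpoint under the final name. This assumes the volume is mounted at ComfyUI's models directory (for example /opt/ComfyUI/models with the Dockerfile above) and that the source URL needs no authentication; the URL and file name in the usage note are placeholders.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def ensure_model(url: str, dest: Path) -> Path:
    """Download url to dest once; later boots reuse the copy on the persistent volume."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already cached: skip the multi-gigabyte download
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream to a temp file on the same filesystem, then rename atomically.
    fd, tmp = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, out, 1024 * 1024)  # 1 MiB chunks
        os.replace(tmp, dest)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)  # don't leave partial files behind
        raise
    return dest
```

A call such as ensure_model("https://example.com/model.safetensors", Path("/opt/ComfyUI/models/checkpoints/model.safetensors")) at startup downloads on the first boot only; every later restart returns immediately.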

Honest limitations

  • ComfyUI is GPU-first; CPU-only deployments are for testing or very low volume.
  • Scale-to-zero on the free tier means the models reload on every cold start: fine for experiments, not for a responsive service.
  • Concurrency is bounded by your compute; a queue in front is essential, not optional.
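The queue in front can start as simply as one worker thread draining a bounded FIFO, so only one workflow runs against the backend at a time and excess requests are rejected instead of piling up. This in-memory version is an illustrative assumption: job state vanishes on redeploy and isn't shared across API replicas, which is why tracking jobs in a managed datastore such as Redis or Postgres is the production choice. The run callable is hypothetical, for example a wrapper around generate() that also waits for the output.

```python
import queue
import threading
import uuid
from typing import Callable


class JobQueue:
    """Serialize jobs onto one backend so concurrent requests never overlap on it."""

    def __init__(self, run: Callable[[dict], object], maxsize: int = 100):
        self._run = run  # submits a workflow and waits for its result
        self._jobs: queue.Queue = queue.Queue(maxsize=maxsize)
        self._status: dict[str, dict] = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, workflow: dict) -> str:
        job_id = uuid.uuid4().hex
        self._set(job_id, state="queued")
        try:
            self._jobs.put_nowait((job_id, workflow))  # fail fast when full
        except queue.Full:
            with self._lock:
                del self._status[job_id]
            raise
        return job_id

    def status(self, job_id: str) -> dict:
        with self._lock:
            return dict(self._status.get(job_id, {"state": "unknown"}))

    def join(self) -> None:
        """Block until every submitted job has finished."""
        self._jobs.join()

    def _set(self, job_id: str, **fields) -> None:
        with self._lock:
            self._status[job_id] = fields

    def _worker(self) -> None:
        while True:
            job_id, workflow = self._jobs.get()
            self._set(job_id, state="running")
            try:
                self._set(job_id, state="done", result=self._run(workflow))
            except Exception as exc:
                self._set(job_id, state="failed", error=str(exc))
            finally:
                self._jobs.task_done()
```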

Use the free tier to wire up the API contract and validate workflows, then move the generation backend to an appropriately sized paid tier.

References

  • [ComfyUI](https://github.com/comfyanonymous/ComfyUI)
  • [ComfyUI API examples](https://github.com/comfyanonymous/ComfyUI/tree/master/script_examples)
  • [Stability AI: Stable Diffusion models](https://stability.ai/stable-image)
  • [Hugging Face: Diffusers](https://huggingface.co/docs/diffusers/index)

ComfyUI as a backend is powerful once you split orchestration from generation and persist your models. PandaStack lets you deploy both services, pick a compute-optimized tier, and queue jobs in an auto-wired datastore. Prototype the API on the free tier at [dashboard.pandastack.io](https://dashboard.pandastack.io).


See also