Tutorial · 11 min read · 2026-07-03

# How to Deploy a Qdrant Vector Database

Self-host Qdrant for vector search: deploy it with persistent storage, secure it with an API key, tune collection parameters, and decide when a dedicated vector DB beats pgvector.

Ajay Kumar
Founder & DevOps, PandaStack

Qdrant is a fast, Rust-based vector database built for similarity search at scale. If your RAG or recommendation system has outgrown ad-hoc vector storage, a dedicated store like Qdrant pays off in performance and features (filtering, payloads, quantization). This guide walks through self-hosting Qdrant in production and covers the tuning that actually matters.

## Persistent storage is non-negotiable

Qdrant keeps your vectors and the search index on disk at /qdrant/storage. In a container, that path is ephemeral by default — lose it and you lose every embedding and have to re-index from scratch. The first rule of deploying Qdrant: attach persistent storage to that directory. Without it, a redeploy is a data-loss event.

## Secure it with an API key

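An open Qdrant instance lets anyone read, write, or delete your collections. Always set an API key. Qdrant reads its YAML config as-is and does not expand ${VAR} placeholders, so keep the key out of the file and supply it at runtime through the QDRANT__SERVICE__API_KEY environment variable, which overrides service.api_key:

# config.yaml
# service.api_key is intentionally omitted; it comes from QDRANT__SERVICE__API_KEY.
storage:
  storage_path: /qdrant/storage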

Clients then send the key in the api-key header on every request. Combined with HTTPS from your platform, that's a reasonable baseline; for stricter setups Qdrant also supports read-only keys and TLS.
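A minimal sketch of what sending the key looks like at the HTTP level, using only Python's standard library. The helper name `qdrant_request` and the default `/collections` path are illustrative choices, not part of the original setup; only the `api-key` header name comes from the text above.

```python
import urllib.request


def qdrant_request(base_url, api_key, path="/collections"):
    """Build (without sending) a GET request that carries the api-key header."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"api-key": api_key},
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. A request without the header should be rejected by a key-protected instance, which makes this a quick way to confirm the key is actually enforced after a deploy.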

## Containerize

Use the official image and point it at your config:

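# Consider pinning a specific version tag instead of latest for reproducible deploys.
FROM qdrant/qdrant:latest
COPY config.yaml /qdrant/config/production.yaml
EXPOSE 6333 6334
# Do not bake the key into the image: ${QDRANT_API_KEY} in an ENV line is resolved at
# build time. Set QDRANT__SERVICE__API_KEY as a runtime environment variable instead.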

Qdrant exposes 6333 for the HTTP/REST API and 6334 for gRPC (faster for bulk operations).

## Deploy on PandaStack

  1. Push a repo with the Dockerfile and config to GitHub.
  2. Create a container app in the [dashboard](https://dashboard.pandastack.io) connected to the repo. It builds via rootless BuildKit and serves an HTTPS URL with automatic SSL.
  3. Set QDRANT__SERVICE__API_KEY as an encrypted env var; Qdrant reads the API key from it at startup.
  4. Attach persistent storage to /qdrant/storage so your index survives redeploys.
  5. Pick a memory-aware tier: vector search performance is bound by how much of the index fits in RAM. An m1/m2 memory-optimized tier is the right family as your collection grows.

## Sizing intuition

Memory usage scales with vector count and dimensionality. A rough mental model:

| Collection size | Tier guidance |
| --- | --- |
| Tens of thousands of vectors | Small/compute tier is fine |
| Hundreds of thousands+ | Memory-optimized (m1/m2) |
| Millions | Enable quantization, more RAM, consider sharding |
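To put rough numbers on that mental model, here is a small estimator based on the rule of thumb in Qdrant's capacity-planning guidance (vectors × dimensions × 4 bytes × 1.5). The function names are hypothetical, and treating quantization as simply fewer bytes per component is a simplifying assumption: payloads, payload indexes, and the original vectors Qdrant still keeps are not counted.

```python
def estimate_ram_bytes(num_vectors, dim, bytes_per_component=4, overhead=1.5):
    """Rough RAM needed to hold vectors plus index overhead in memory.

    bytes_per_component: 4 for float32; 1 approximates int8 scalar
    quantization (assumption: ignores the stored original vectors).
    overhead: multiplier covering index structures and metadata.
    """
    return int(num_vectors * dim * bytes_per_component * overhead)


def to_gib(num_bytes):
    """Convert bytes to GiB for comparison against a tier's memory size."""
    return num_bytes / 2**30


# One million 1536-dimensional float32 vectors.
one_million = estimate_ram_bytes(1_000_000, 1536)
# The same collection with one byte per component.
one_million_int8 = estimate_ram_bytes(1_000_000, 1536, bytes_per_component=1)
```

Under these assumptions, one million 1536-dimensional vectors come to about 8.6 GiB at float32, which is why the table points to quantization and more RAM at that scale.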

## Create a collection and insert vectors

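from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

# qdrant-client defaults to port 6333 when the URL omits one; add :443 if your platform serves HTTPS there.
client = QdrantClient(url="https://<app>", api_key="<QDRANT_API_KEY>")

# Placeholders: swap in real vectors from your embedding model (1536 dimensions here).
embedding = [0.1] * 1536
query_embedding = [0.1] * 1536

# Create the collection only if it is missing; recreate_collection would drop existing data.
if not client.collection_exists("docs"):
    client.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

client.upsert("docs", points=[
    PointStruct(id=1, vector=embedding,
                payload={"title": "Intro", "url": "/intro", "lang": "en"}),
])

# query_points (qdrant-client 1.10+) replaces the deprecated search method.
hits = client.query_points(
    "docs", query=query_embedding, limit=5,
    query_filter=Filter(must=[FieldCondition(key="lang", match=MatchValue(value="en"))]),
).points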

The distance metric must match how your embeddings were trained — cosine for most modern embedding models. Getting this wrong silently degrades results.
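A toy illustration of why the metric matters, in plain Python with made-up two-dimensional vectors (real embeddings have hundreds or thousands of dimensions). The same query and documents rank differently under dot product and cosine as soon as the vectors are not normalized.

```python
import math


def dot(a, b):
    """Dot product: sensitive to both direction and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: direction only, vector length is divided out."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank(query, docs, score):
    """Return document names ordered from best to worst under a score function."""
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)


query = [1.0, 0.0]
docs = {
    "aligned": [1.0, 0.0],  # same direction as the query, small norm
    "long": [3.0, 3.0],     # 45 degrees off, but a much larger norm
}

by_dot = rank(query, docs, dot)        # "long" wins on raw magnitude
by_cosine = rank(query, docs, cosine)  # "aligned" wins on direction
```

Nothing errors out in either case, which is what makes a mismatched metric hard to notice: the results are just quietly worse.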

## Tuning that matters

  • Quantization: scalar or binary quantization shrinks the in-memory footprint dramatically with a small accuracy cost — essential for large collections on bounded RAM.
  • HNSW parameters: m and ef_construct trade index build time and memory for recall. Defaults are sane; raise them only if recall is too low.
  • Payload indexing: if you filter on metadata fields, create payload indexes so filters are fast.
  • gRPC for bulk loads: use the gRPC port for large upserts; it's noticeably faster than REST.
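The first three knobs map onto Qdrant's REST API roughly as sketched below. These helpers only build the JSON bodies for `PUT /collections/<name>` and `PUT /collections/<name>/index`; the function names are hypothetical, `m=16` and `ef_construct=100` are Qdrant's documented defaults, and the `quantile` value is an illustrative choice rather than a recommendation.

```python
import json


def create_collection_body(size, distance="Cosine", m=16, ef_construct=100,
                           quantile=0.99):
    """JSON body for PUT /collections/<name> with HNSW and int8 scalar quantization."""
    return {
        "vectors": {"size": size, "distance": distance},
        # Higher m / ef_construct: better recall, slower builds, more memory.
        "hnsw_config": {"m": m, "ef_construct": ef_construct},
        "quantization_config": {
            # always_ram keeps the quantized vectors in memory for fast search.
            "scalar": {"type": "int8", "quantile": quantile, "always_ram": True}
        },
    }


def payload_index_body(field_name, field_schema="keyword"):
    """JSON body for PUT /collections/<name>/index on one payload field."""
    return {"field_name": field_name, "field_schema": field_schema}


collection_json = json.dumps(create_collection_body(1536))
index_json = json.dumps(payload_index_body("lang"))
```

Each body would be sent with the api-key header described earlier. A keyword index on a field such as `lang` is what keeps an exact-match filter from scanning every payload.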

## Qdrant vs. pgvector: when to use which

| | pgvector on managed Postgres | Self-hosted Qdrant |
| --- | --- | --- |
| Ops overhead | None (it's your existing DB) | A service to run |
| Best for | Small/medium corpora, transactional needs | Large-scale, filter-heavy search |
| Features | Solid ANN + SQL | Advanced quantization, payloads, sharding |
| Backups | Managed DB backups | Your responsibility (snapshot storage) |

If you're just adding RAG to an app that already has a managed Postgres, start with pgvector. Reach for Qdrant when scale, advanced filtering, or quantization become real requirements.

## Operational notes

  • Backups: Qdrant supports snapshots — schedule them and copy snapshots off the instance to durable storage.
  • Cold starts: don't run a primary vector DB on free-tier scale-to-zero; an index reload on cold start is slow and you want it always available. Use a paid tier.
  • Health: Qdrant exposes /healthz, /livez, and /readyz endpoints for probes; tail PandaStack's live logs to catch OOM kills, the classic sign you need a bigger memory tier or quantization.
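The backup and health notes can be sketched with the standard library as follows. The helper names are hypothetical; `POST /collections/<name>/snapshots` and `GET /readyz` are Qdrant REST endpoints, and the sketch assumes the readiness probe is reachable without the API key. Scheduling and copying the snapshot off the instance are left out.

```python
import urllib.request


def snapshot_request(base_url, api_key, collection):
    """Build (without sending) the POST that asks Qdrant to snapshot one collection."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/collections/{collection}/snapshots",
        method="POST",
        headers={"api-key": api_key},
    )


def is_ready(base_url, opener=urllib.request.urlopen, timeout=5):
    """Return True when GET /readyz answers 200, False on any HTTP or network error."""
    try:
        with opener(f"{base_url.rstrip('/')}/readyz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False
```

The `opener` parameter exists so the readiness check can be exercised without a live instance; in normal use the default `urlopen` is enough.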

## References

  • [Qdrant documentation](https://qdrant.tech/documentation/)
  • [Qdrant: Security](https://qdrant.tech/documentation/guides/security/)
  • [Qdrant: Quantization](https://qdrant.tech/documentation/guides/quantization/)
  • [Qdrant: Snapshots](https://qdrant.tech/documentation/concepts/snapshots/)

Self-hosted Qdrant gives you serious vector search once you nail persistence, security, and memory sizing. PandaStack provides HTTPS, encrypted keys, persistent storage, and memory-optimized tiers — and a managed pgvector Postgres if you decide you don't need a dedicated DB yet. Start at [dashboard.pandastack.io](https://dashboard.pandastack.io).
