# How to Deploy a Qdrant Vector Database
Qdrant is a fast, Rust-based vector database built for similarity search at scale. If your RAG or recommendation system has outgrown ad-hoc vectors, a dedicated store like Qdrant pays off in performance and features (filtering, payloads, quantization). This guide walks through self-hosting Qdrant in production and covers the tuning that actually matters.
## Persistent storage is non-negotiable
Qdrant keeps your vectors and the search index on disk at /qdrant/storage. In a container, that path is ephemeral by default — lose it and you lose every embedding and have to re-index from scratch. The first rule of deploying Qdrant: attach persistent storage to that directory. Without it, a redeploy is a data-loss event.
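One way to catch a missing volume early is a startup check that the storage directory exists, is writable, and is its own mount point. The sketch below is a minimal example under assumptions: `check_storage` is a hypothetical helper (not part of Qdrant), and it assumes your platform attaches persistent storage as a separate mount, which may not hold everywhere.

```python
import os
import tempfile


def check_storage(path: str = "/qdrant/storage") -> dict:
    """Report whether `path` looks like an attached, writable volume.

    Hypothetical helper, not part of Qdrant. Assumes the platform
    exposes persistent storage as its own mount point.
    """
    exists = os.path.isdir(path)
    # A directory on the container's own filesystem is not a mount point.
    is_mount = exists and os.path.ismount(path)
    writable = False
    if exists:
        try:
            # Create and auto-remove a scratch file to prove we can write here.
            with tempfile.NamedTemporaryFile(dir=path):
                writable = True
        except OSError:
            writable = False
    return {"exists": exists, "is_mount": is_mount, "writable": writable}


if __name__ == "__main__":
    print(check_storage())
```

If `is_mount` comes back `False` inside the container, the directory is probably living on the ephemeral container filesystem and is worth double-checking before you index anything.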
## Secure it with an API key
An open Qdrant instance lets anyone read, write, or delete your collections. Always set an API key:
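```yaml
# config.yaml
service:
  # Placeholder for your key; Qdrant also reads it from the
  # QDRANT__SERVICE__API_KEY environment variable.
  api_key: ${QDRANT_API_KEY}
storage:
  storage_path: /qdrant/storage
```

Clients then send the key in the `api-key` header on every request. Combined with HTTPS from your platform, that's a reasonable baseline; for stricter setups Qdrant also supports read-only keys and TLS.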
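To make the header concrete, here is a minimal standard-library sketch of an authenticated call to the REST API. The function names and the base URL you pass in are illustrative assumptions; the `api-key` header and the `GET /collections` endpoint are Qdrant's.

```python
import json
import urllib.request


def qdrant_request(base_url: str, api_key: str, path: str) -> urllib.request.Request:
    """Build a GET request that carries the key in the `api-key` header."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def list_collections(base_url: str, api_key: str) -> list:
    """Call GET /collections and return the collection names."""
    req = qdrant_request(base_url, api_key, "/collections")
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    return [c["name"] for c in body["result"]["collections"]]
```

Calling `list_collections("https://<app>", "<QDRANT_API_KEY>")` against your deployment is a quick way to confirm the key is being enforced: the same call with a wrong key should be rejected.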
## Containerize
Use the official image and point it at your config:
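```dockerfile
FROM qdrant/qdrant:latest
COPY config.yaml /qdrant/config/production.yaml
EXPOSE 6333 6334
# ${QDRANT_API_KEY} here is expanded at build time. If the key is only injected
# at runtime, set QDRANT__SERVICE__API_KEY in the platform's environment instead.
ENV QDRANT__SERVICE__API_KEY=${QDRANT_API_KEY}
```

Qdrant exposes port 6333 for the HTTP/REST API and 6334 for gRPC (faster for bulk operations).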
## Deploy on PandaStack
1. Push a repo with the Dockerfile and config to GitHub.
2. Create a container app in the [dashboard](https://dashboard.pandastack.io) connected to the repo. It builds via rootless BuildKit and serves an HTTPS URL with automatic SSL.
3. Set `QDRANT_API_KEY` as an encrypted env var.
4. Attach persistent storage to `/qdrant/storage` so your index survives redeploys.
5. Pick a memory-aware tier: vector search performance is bound by how much of the index fits in RAM. An m1/m2 memory-optimized tier is the right family as your collection grows.
## Sizing intuition
Memory usage scales with vector count and dimensionality. A rough mental model:
| Collection size | Tier guidance |
|---|---|
| Tens of thousands of vectors | Small/compute tier is fine |
| Hundreds of thousands+ | Memory-optimized (m1/m2) |
| Millions | Enable quantization, more RAM, consider sharding |
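To put rough numbers on the table, a commonly cited rule of thumb from Qdrant's capacity-planning guidance is vectors × dimensions × 4 bytes × 1.5, where the 1.5 factor approximates index and metadata overhead. The sketch below applies it; the function names are hypothetical, and the result is an estimate, not a guarantee.

```python
def estimate_ram_bytes(
    num_vectors: int,
    dim: int,
    bytes_per_component: int = 4,  # float32 vectors
    overhead: float = 1.5,         # rough allowance for index and metadata
) -> int:
    """Rule-of-thumb RAM estimate for keeping all vectors in memory."""
    return int(num_vectors * dim * bytes_per_component * overhead)


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB for readability."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    for count in (50_000, 500_000, 5_000_000):
        full = estimate_ram_bytes(count, 1536)
        # Assumption: int8 scalar quantization stores one byte per component.
        quantized = estimate_ram_bytes(count, 1536, bytes_per_component=1)
        print(f"{count:>9} vectors: ~{to_gib(full):.1f} GiB float32, "
              f"~{to_gib(quantized):.1f} GiB int8")
```

By this estimate, one million 1536-dimensional float32 vectors need roughly 8.6 GiB, which is why quantization and memory-optimized tiers enter the picture at the "millions" row.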
## Create a collection and insert vectors
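```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(url="https://<app>", api_key="<QDRANT_API_KEY>")

# Create the collection once; recreate_collection would drop existing data.
if not client.collection_exists("docs"):
    client.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

# `embedding` and `query_embedding` are 1536-float lists from your embedding model.
client.upsert("docs", points=[
    PointStruct(id=1, vector=embedding,
                payload={"title": "Intro", "url": "/intro", "lang": "en"}),
])

# Nearest neighbours, restricted to points whose payload has lang == "en".
hits = client.query_points(
    "docs", query=query_embedding, limit=5,
    query_filter=Filter(must=[FieldCondition(key="lang", match=MatchValue(value="en"))]),
).points
```

The distance metric must match how your embeddings were trained: cosine for most modern embedding models. Getting this wrong silently degrades results.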
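A toy illustration of why the metric matters: on vectors that are not length-normalized, dot product and cosine similarity can rank the same candidates differently. The vectors and labels below are made up for the example.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
docs = {
    "long_offtopic": [10.0, 10.0],  # large magnitude, 45 degrees off the query
    "short_ontopic": [1.0, 0.1],    # small magnitude, nearly parallel to the query
}

best_by_dot = max(docs, key=lambda name: dot(query, docs[name]))
best_by_cosine = max(docs, key=lambda name: cosine(query, docs[name]))

print("dot product picks:", best_by_dot)
print("cosine picks:     ", best_by_cosine)
```

Dot product rewards the large-magnitude vector, while cosine picks the one pointing in the query's direction. No error is raised in either case, which is what makes a mismatched metric easy to miss.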
## Tuning that matters
- Quantization: scalar or binary quantization shrinks the in-memory footprint dramatically with a small accuracy cost — essential for large collections on bounded RAM.
- HNSW parameters: `m` and `ef_construct` trade index build time and memory for recall. Defaults are sane; raise them only if recall is too low.
- Payload indexing: if you filter on metadata fields, create payload indexes so filters are fast.
- gRPC for bulk loads: use the gRPC port for large upserts; it's noticeably faster than REST.
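The first three knobs above can be sketched as REST request bodies for a new collection. This is a minimal, standard-library example under assumptions: the helper names are hypothetical, `m=16` and `ef_construct=100` are Qdrant's documented defaults shown explicitly, the `0.99` quantile is an illustrative value, and `lang` is the payload field from the earlier filter example. The requests are built but not sent.

```python
import json
import urllib.request


def _put(base_url, api_key, path, body):
    """Build (but do not send) an authenticated JSON PUT request."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="PUT",
    )


def create_collection_request(base_url, api_key, name, dim):
    """PUT /collections/{name} with explicit HNSW and quantization settings."""
    body = {
        "vectors": {"size": dim, "distance": "Cosine"},
        # Defaults shown explicitly; raise only if recall is too low.
        "hnsw_config": {"m": 16, "ef_construct": 100},
        # int8 scalar quantization, keeping the quantized vectors in RAM.
        "quantization_config": {
            "scalar": {"type": "int8", "quantile": 0.99, "always_ram": True}
        },
    }
    return _put(base_url, api_key, f"/collections/{name}", body)


def payload_index_request(base_url, api_key, name, field):
    """PUT /collections/{name}/index to index a keyword payload field."""
    body = {"field_name": field, "field_schema": "keyword"}
    return _put(base_url, api_key, f"/collections/{name}/index", body)
```

Passing either request to `urllib.request.urlopen` sends it. The same settings are available through the Python client if you prefer typed models over raw JSON.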
## Qdrant vs. pgvector: when to use which
| | pgvector on managed Postgres | Self-hosted Qdrant |
|---|---|---|
| Ops overhead | None (it's your existing DB) | A service to run |
| Best for | Small/medium corpora, transactional needs | Large-scale, filter-heavy search |
| Features | Solid ANN + SQL | Advanced quantization, payloads, sharding |
| Backups | Managed DB backups | Your responsibility (snapshot storage) |
If you're just adding RAG to an app that already has a managed Postgres, start with pgvector. Reach for Qdrant when scale, advanced filtering, or quantization become real requirements.
## Operational notes
- Backups: Qdrant supports snapshots — schedule them and copy snapshots off the instance to durable storage.
- Cold starts: don't run a primary vector DB on free-tier scale-to-zero; an index reload on cold start is slow and you want it always available. Use a paid tier.
- Health: Qdrant exposes a health endpoint; tail PandaStack's live logs to catch OOM kills, the classic sign you need a bigger memory tier or quantization.
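The backup and health notes above can be sketched with the standard library. This is a minimal example, assuming the `/healthz` endpoint and the collection snapshot endpoints of the Qdrant REST API; the function names are hypothetical, and copying the downloaded file to durable storage is left to your own tooling.

```python
import json
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /healthz answers with HTTP 200."""
    try:
        url = base_url.rstrip("/") + "/healthz"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
        return False


def snapshot_request(base_url: str, api_key: str, collection: str) -> urllib.request.Request:
    """Build POST /collections/{collection}/snapshots, which creates a snapshot."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/collections/{collection}/snapshots",
        headers={"api-key": api_key},
        method="POST",
    )


def create_snapshot(base_url: str, api_key: str, collection: str) -> str:
    """Create a snapshot and return its name from the JSON response."""
    req = snapshot_request(base_url, api_key, collection)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["result"]["name"]


def snapshot_download_url(base_url: str, collection: str, snapshot_name: str) -> str:
    """URL to GET (with the api-key header) to download a snapshot file."""
    return f"{base_url.rstrip('/')}/collections/{collection}/snapshots/{snapshot_name}"
```

A scheduled job could call `create_snapshot`, fetch the file from `snapshot_download_url`, and upload it to object storage, so a lost volume is not a lost index.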
## References
- [Qdrant documentation](https://qdrant.tech/documentation/)
- [Qdrant: Security](https://qdrant.tech/documentation/guides/security/)
- [Qdrant: Quantization](https://qdrant.tech/documentation/guides/quantization/)
- [Qdrant: Snapshots](https://qdrant.tech/documentation/concepts/snapshots/)
Self-hosted Qdrant gives you serious vector search once you nail persistence, security, and memory sizing. PandaStack provides HTTPS, encrypted keys, persistent storage, and memory-optimized tiers — and a managed pgvector Postgres if you decide you don't need a dedicated DB yet. Start at [dashboard.pandastack.io](https://dashboard.pandastack.io).