API Rate Limiting: How to Protect Your API from Abuse
Without rate limiting, a single misbehaving client — or attacker — can overwhelm your API and take down service for everyone. Rate limiting controls how many requests a client can make in a given time window. This guide covers the algorithms, implementation patterns, and best practices.
Why Rate Limiting Matters
- DDoS mitigation — Prevents volumetric attacks from exhausting server resources.
- Abuse prevention — Stops scrapers, credential stuffers, and API key abusers.
- Cost control — Downstream services (AI APIs, databases, third parties) cost money per call.
- Fairness — Prevents one client from starving others.
Rate Limiting Algorithms
Fixed Window Counter
Count requests in a fixed time window (e.g., per minute). Reset the counter at the window boundary.
Window: 12:00:00 – 12:01:00
Requests: 47/100 → allowed
Request 101 at 12:00:59 → rejected (429)
Window reset at 12:01:00 → counter = 0
Problem: Clients can exploit window boundaries to make 2x the allowed requests (100 at 12:00:59 + 100 at 12:01:00).
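The counter logic and its boundary weakness can be sketched in plain JavaScript. This is a minimal in-memory illustration, not how any particular library implements it; the class name `FixedWindowLimiter` and the injectable `now` argument are assumptions made for the example.

```javascript
// Minimal in-memory fixed window counter (illustrative sketch only).
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request is allowed, false if it should get a 429.
  allow(key, now = Date.now()) {
    // Align the window to a fixed boundary (e.g., the start of the minute).
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 }; // new window: reset the counter
      this.counters.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Limit of 2 requests per minute, yet 4 requests pass within 4 ms
// because they straddle the window boundary at t = 60000.
const fixedLimiter = new FixedWindowLimiter(2, 60_000);
const fixedResults = [59_998, 59_999, 60_000, 60_001].map((t) =>
  fixedLimiter.allow('client-a', t)
);
```

Passing `now` explicitly keeps the sketch deterministic; in a real handler you would call `allow(key)` and let it default to the current time.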
Sliding Window
Counts requests over a rolling window ending at the current time. No boundary exploitation.
More accurate but requires storing per-request timestamps or using approximate algorithms.
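One way to implement this is a sliding window log, which stores a timestamp for every accepted request. The sketch below is a minimal in-memory version under that assumption; `SlidingWindowLimiter` and its `now` parameter are hypothetical names chosen for illustration.

```javascript
// Sliding window log: one timestamp per accepted request (illustrative sketch).
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = new Map(); // key -> array of timestamps, oldest first
  }

  // Returns true if fewer than `limit` requests were accepted in the last windowMs.
  allow(key, now = Date.now()) {
    const timestamps = this.log.get(key) || [];
    // Drop timestamps that have fallen out of the rolling window.
    while (timestamps.length > 0 && timestamps[0] <= now - this.windowMs) {
      timestamps.shift();
    }
    this.log.set(key, timestamps);
    if (timestamps.length >= this.limit) return false;
    timestamps.push(now);
    return true;
  }
}

// Same traffic pattern that slips past a fixed window: the third request,
// 2 ms after the first, is rejected because the window rolls with the clock.
const slidingLimiter = new SlidingWindowLimiter(2, 60_000);
const slidingResults = [59_998, 59_999, 60_000, 119_998].map((t) =>
  slidingLimiter.allow('client-a', t)
);
```

The memory cost is one timestamp per accepted request per client, which is why high-traffic systems often prefer an approximation that blends the counts of the current and previous fixed windows.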
Token Bucket
A bucket holds N tokens. Each request consumes one token. Tokens refill at a fixed rate. Clients can burst up to bucket size, but can't sustain above the refill rate.
Bucket capacity: 100
Refill rate: 10 tokens/second
Client bursts 100 requests → all allowed (bucket empties)
Next request within 0.1s → rejected (no tokens)
After 10s → 100 tokens refilled
Token bucket is great for allowing short bursts while enforcing average rate limits.
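The numbers in the example above can be reproduced with a small sketch. It refills lazily, computing the tokens earned since the last call instead of running a timer. The `TokenBucket` class and its method names are assumptions for illustration, not part of any library mentioned in this guide.

```javascript
// Token bucket with lazy refill (illustrative sketch only).
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  // Returns true and consumes one token if available, otherwise false.
  tryConsume(now = Date.now()) {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    // Add the tokens earned since the last call, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Capacity 100, refill 10 tokens/second, clock starts at t = 0 ms.
const bucket = new TokenBucket(100, 10, 0);
let burstAllowed = 0;
for (let i = 0; i < 100; i++) {
  if (bucket.tryConsume(0)) burstAllowed += 1; // burst drains the bucket
}
const rightAfterBurst = bucket.tryConsume(50); // 0.05 s later: only 0.5 tokens
const afterTenSeconds = bucket.tryConsume(10_050); // 10 s later: bucket is full again
```

In practice you would keep one bucket per client key (for example in a `Map`) and call `tryConsume()` without arguments.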
Leaky Bucket
Requests are processed at a fixed rate regardless of burst. Excess requests queue (or are dropped). Provides smooth output but can cause latency for bursty traffic.
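A queue-based leaky bucket might look like the following sketch. It assumes the caller drains the queue on a schedule (for example from a timer); `LeakyBucket`, `offer`, and `leak` are hypothetical names used only for this illustration.

```javascript
// Queue-based leaky bucket (illustrative sketch only).
class LeakyBucket {
  constructor(capacity, leakPerSecond, now = Date.now()) {
    this.capacity = capacity; // maximum number of queued jobs
    this.leakIntervalMs = 1000 / leakPerSecond;
    this.queue = [];
    this.lastLeak = now;
  }

  // Returns true if the job was queued, false if the bucket is full (drop or 429).
  offer(job) {
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(job);
    return true;
  }

  // Releases the jobs that are due at time `now`, oldest first.
  leak(now = Date.now()) {
    const due = Math.floor((now - this.lastLeak) / this.leakIntervalMs);
    if (due <= 0) return [];
    this.lastLeak += due * this.leakIntervalMs;
    return this.queue.splice(0, due);
  }
}

// Capacity 3, output rate 2 jobs/second, clock starts at t = 0 ms.
const leaky = new LeakyBucket(3, 2, 0);
const accepted = ['job-0', 'job-1', 'job-2', 'job-3', 'job-4'].map((j) =>
  leaky.offer(j)
);
const released = leaky.leak(1000); // one second later, two jobs are due
```

A burst of five requests fills the queue and drops the last two, while the output stays at two jobs per second. The queued third job is where the extra latency for bursty traffic comes from.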
Implementation: Express.js with express-rate-limit
npm install express-rate-limit
const express = require('express');
const rateLimit = require('express-rate-limit');
const app = express();
// Global rate limiter
const globalLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 500,
standardHeaders: true, // Return RateLimit-* headers
legacyHeaders: false,
message: { error: 'Too many requests, please try again later.' }
});
// Strict limiter for auth endpoints
const authLimiter = rateLimit({
windowMs: 15 * 60 * 1000,
max: 10,
message: { error: 'Too many login attempts.' }
});
app.use('/api/', globalLimiter);
app.use('/api/auth/login', authLimiter);
app.use('/api/auth/register', authLimiter);
Redis-Backed Rate Limiting (for Distributed Systems)
In-memory rate limiters break down when you run multiple server instances: each instance tracks its own count, so a client behind a load balancer can get up to N times the intended limit across N instances. Use Redis to share state:
npm install rate-limit-redis ioredis
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
const limiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100,
standardHeaders: true,
legacyHeaders: false,
store: new RedisStore({
sendCommand: (...args) => redis.call(...args)
}),
keyGenerator: (req) => req.headers['x-api-key'] || req.ip
});
app.use('/api/', limiter);
Rate Limit by API Key vs. IP
- By IP — Simple but problematic for shared IPs (offices, NATs) and easy to rotate.
- By API key — More accurate and ties limits to authenticated clients.
- By user ID — Best for authenticated APIs.
keyGenerator: (req) => {
return req.headers['x-api-key']
|| req.user?.id
|| req.ip;
}
Rate Limit Response Headers
Clients need to know their limit, remaining quota, and when it resets:
RateLimit-Limit: 100
RateLimit-Remaining: 42
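RateLimit-Reset: 47
Retry-After: 47
express-rate-limit with standardHeaders: true sends the RateLimit-* headers on every response and adds Retry-After to 429 responses. In this draft-standard format, RateLimit-Reset is the number of seconds until the window resets, not a Unix timestamp (the legacy X-RateLimit-Reset header carries a timestamp). Your API clients should check Retry-After and back off instead of retrying immediately after a 429.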
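On the client side, backing off might look like the sketch below. It assumes a runtime with a global `fetch` (Node 18+ or a browser); the helper names `retryDelayMs` and `fetchWithBackoff`, the retry count, and the 30-second cap are illustrative choices, not part of any library.

```javascript
// Convert a Retry-After value (delta-seconds or HTTP-date) to milliseconds.
// Falls back to capped exponential backoff when the header is missing or unparsable.
function retryDelayMs(retryAfter, attempt, now = Date.now()) {
  const fallback = Math.min(30_000, 1000 * 2 ** attempt);
  if (retryAfter == null || retryAfter === '') return fallback;
  if (/^\d+$/.test(retryAfter.trim())) return Number(retryAfter) * 1000;
  const date = Date.parse(retryAfter);
  return Number.isNaN(date) ? fallback : Math.max(0, date - now);
}

// Retry a request on 429, waiting as long as the server asks before each retry.
async function fetchWithBackoff(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const delay = retryDelayMs(res.headers.get('retry-after'), attempt);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

A caller would use `await fetchWithBackoff('https://api.example.com/items')`, where the URL is a placeholder. After `maxRetries` failed attempts the last 429 response is returned so the caller can decide what to do next.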
Tiered Rate Limits
Different plans get different limits:
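const getLimitForPlan = (plan) => {
  const limits = { free: 100, pro: 1000, enterprise: 10000 };
  return limits[plan] || 100;
};
// Build the limiter once at startup. Creating one inside the request
// handler would re-create the store on every request.
const tieredLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  // max accepts a function, so the limit is resolved per request
  max: (req) => getLimitForPlan(req.user?.plan || 'free'),
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({ sendCommand: (...args) => redis.call(...args) }),
  // Fall back to the IP so unauthenticated requests don't throw on req.user.id
  keyGenerator: (req) => `${req.user?.id || req.ip}:${req.user?.plan || 'free'}`
});
app.use('/api/', tieredLimiter);
Rate Limiting at the Infrastructure Layer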
For high-scale protection, consider rate limiting at the edge:
- Cloudflare Rate Limiting — Rule-based, DDoS-grade
- AWS API Gateway — Built-in throttling per API key
- Nginx — limit_req module (limit_req_zone and limit_req directives), which uses the leaky bucket method
http {
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
location /api/ {
limit_req zone=api burst=20 nodelay;
proxy_pass http://localhost:3000;
}
}
}
Conclusion
Rate limiting is a critical layer of API security and reliability. Start with per-IP limits, move to per-API-key limits as you add authentication, use Redis for multi-instance deployments, and expose standardized headers so clients can respect limits gracefully. Deploy your rate-limited API on PandaStack using Docker containers at [dashboard.pandastack.io](https://dashboard.pandastack.io). See [docs.pandastack.io](https://docs.pandastack.io) for more.