Guide | 7 min read | 2026-05-01

API Rate Limiting: How to Protect Your API from Abuse

Learn how to implement API rate limiting using token buckets, sliding windows, and middleware to protect your API from overload and abuse.

Without rate limiting, a single misbehaving client — or attacker — can overwhelm your API and take down service for everyone. Rate limiting controls how many requests a client can make in a given time window. This guide covers the algorithms, implementation patterns, and best practices.

Why Rate Limiting Matters

  • DDoS mitigation — Prevents volumetric attacks from exhausting server resources.
  • Abuse prevention — Stops scrapers, credential stuffers, and API key abusers.
  • Cost control — Downstream services (AI APIs, databases, third parties) cost money per call.
  • Fairness — Prevents one client from starving others.

Rate Limiting Algorithms

Fixed Window Counter

Count requests in a fixed time window (e.g., per minute). Reset the counter at the window boundary.

Window: 12:00:00 – 12:01:00
Requests: 47/100 → allowed
Request 101 at 12:00:59 → rejected (429)
Window reset at 12:01:00 → counter = 0

Problem: Clients can exploit window boundaries to send up to twice the limit in a short span (100 requests at 12:00:59 plus 100 more at 12:01:00, so 200 in about one second).
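
The boundary burst is easy to reproduce with a minimal in-memory sketch. The `FixedWindowLimiter` class and its `allow(key, nowMs)` method are illustrative names, not part of any library, and the clock is passed in explicitly so the result is deterministic.

```javascript
// Minimal in-memory fixed window counter (illustrative; single process only).
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request is allowed, false if it should get a 429.
  allow(key, nowMs = Date.now()) {
    // Align the window to a fixed boundary (for example, the top of the minute).
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 }; // new window: the counter resets
      this.counters.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// 100 requests just before the boundary, 100 more right after it.
const limiter = new FixedWindowLimiter(100, 60_000);
let allowed = 0;
for (let i = 0; i < 100; i++) if (limiter.allow('client-a', 59_000)) allowed++;
for (let i = 0; i < 100; i++) if (limiter.allow('client-a', 60_000)) allowed++;
console.log(allowed); // 200
```

Both batches pass because they land in different windows, even though they are one second apart. A real implementation would also need to evict idle keys so the map does not grow without bound.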

Sliding Window

Counts requests over a rolling window ending at the current time. No boundary exploitation.

More accurate but requires storing per-request timestamps or using approximate algorithms.
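
As a sketch of the timestamp-storing variant (often called a sliding window log), the class below keeps one timestamp per accepted request. `SlidingWindowLog` and `allow` are hypothetical names chosen for this example.

```javascript
// Sliding window log: remembers the timestamp of every accepted request per key.
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = new Map(); // key -> array of accepted request timestamps (ms)
  }

  allow(key, nowMs = Date.now()) {
    const cutoff = nowMs - this.windowMs;
    // Keep only the requests that fall inside the window (cutoff, nowMs].
    const recent = (this.log.get(key) || []).filter((t) => t > cutoff);
    const ok = recent.length < this.limit;
    if (ok) recent.push(nowMs);
    this.log.set(key, recent);
    return ok;
  }
}

// The same boundary burst that slips through a fixed window counter.
const windowLog = new SlidingWindowLog(100, 60_000);
let accepted = 0;
for (let i = 0; i < 100; i++) if (windowLog.allow('client-a', 59_000)) accepted++;
for (let i = 0; i < 100; i++) if (windowLog.allow('client-a', 60_000)) accepted++;
console.log(accepted); // 100: the second batch is rejected
```

Memory grows with the limit, since up to `limit` timestamps are stored per client. Approximate variants avoid this by keeping only the current and previous fixed-window counts and weighting the previous one by how much of it still overlaps the rolling window.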

Token Bucket

A bucket holds N tokens. Each request consumes one token. Tokens refill at a fixed rate. Clients can burst up to bucket size, but can't sustain above the refill rate.

Bucket capacity: 100
Refill rate: 10 tokens/second
Client bursts 100 requests → all allowed (bucket empties)
Next request within 0.1s → rejected (no tokens)
After 10s → 100 tokens refilled

Token bucket is great for allowing short bursts while enforcing average rate limits.
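
The numbers above can be checked with a small sketch. `TokenBucket` and `tryRemove` are illustrative names, and refill is computed lazily from elapsed time rather than with a timer, which is one common way to implement it.

```javascript
// Token bucket with lazy refill: tokens are topped up whenever a request arrives.
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Consumes one token if available; returns false when the bucket is empty.
  tryRemove(nowMs = Date.now()) {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(100, 10, 0); // capacity 100, 10 tokens/second
let burst = 0;
for (let i = 0; i < 100; i++) if (bucket.tryRemove(0)) burst++;
console.log(burst);                 // 100: the whole burst is allowed
console.log(bucket.tryRemove(50));  // false: only half a token after 50 ms
console.log(bucket.tryRemove(150)); // true: a full token has accumulated
```

In a server you would keep one bucket per client key (for example in a `Map`), and share the state through a store such as Redis when running more than one instance.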

Leaky Bucket

Requests are processed at a fixed rate regardless of burst. Excess requests queue (or are dropped). Provides smooth output but can cause latency for bursty traffic.
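
A minimal sketch of the queueing variant follows. Instead of holding real requests, `schedule` returns how long a request would have to wait, or `null` if the queue is full and the request is dropped. The class and method names are assumptions made for this example.

```javascript
// Leaky bucket as a scheduler: requests drain at a fixed rate, extras wait or are dropped.
class LeakyBucket {
  constructor(ratePerSecond, queueSize) {
    this.intervalMs = 1000 / ratePerSecond; // spacing between processed requests
    this.queueSize = queueSize;             // how many requests may wait
    this.nextFreeMs = 0;                    // earliest time the next request can run
  }

  // Returns the delay in ms before the request is processed, or null if dropped.
  schedule(nowMs = Date.now()) {
    const startMs = Math.max(nowMs, this.nextFreeMs);
    const delayMs = startMs - nowMs;
    // A wait longer than queueSize intervals means the queue is already full.
    if (delayMs > this.queueSize * this.intervalMs) return null;
    this.nextFreeMs = startMs + this.intervalMs;
    return delayMs;
  }
}

const shaper = new LeakyBucket(10, 2); // 10 requests/second, 2 waiting slots
const results = [0, 0, 0, 0, 0].map((t) => shaper.schedule(t));
console.log(results); // [ 0, 100, 200, null, null ]
```

Five simultaneous requests come out evenly spaced: one runs immediately, two wait 100 ms and 200 ms, and the last two are dropped. That added wait is the latency cost of smoothing bursty traffic.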

Implementation: Express.js with express-rate-limit

npm install express-rate-limit

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Global rate limiter
const globalLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 500,
  standardHeaders: true,  // Return RateLimit-* headers
  legacyHeaders: false,
  message: { error: 'Too many requests, please try again later.' }
});

// Strict limiter for auth endpoints
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 10,
  message: { error: 'Too many login attempts.' }
});

app.use('/api/', globalLimiter);
app.use('/api/auth/login', authLimiter);
app.use('/api/auth/register', authLimiter);

Redis-Backed Rate Limiting (for Distributed Systems)

In-memory rate limiters break down behind a load balancer: each instance keeps its own counter, so a client whose requests are spread across N instances can make up to N times the configured limit. Use Redis to share state:

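npm install rate-limit-redis ioredis

const rateLimit = require('express-rate-limit');
// Recent rate-limit-redis releases (v4+) export RedisStore by name;
// check the README for the version you install, since older releases differ.
const { RedisStore } = require('rate-limit-redis');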
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redis.call(...args)
  }),
  keyGenerator: (req) => req.headers['x-api-key'] || req.ip
});

app.use('/api/', limiter);
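
To make "shared state" concrete, here is a rough sketch of a fixed window counter kept in Redis. It is not how rate-limit-redis works internally; the `hit` function and the `rl:` key prefix are made up for illustration, and `client` is assumed to expose ioredis-style `incr(key)` and `pexpire(key, ms)` methods.

```javascript
// Fixed window counter on shared Redis state (sketch under the assumptions above).
async function hit(client, key, limit, windowMs) {
  const redisKey = `rl:${key}`;
  // INCR is atomic, so concurrent app instances never lose an increment.
  const count = await client.incr(redisKey);
  if (count === 1) {
    // First request of the window: start the expiry clock.
    await client.pexpire(redisKey, windowMs);
  }
  return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
}

// Usage inside an Express middleware, assuming `redis` is an ioredis client:
//   const { allowed } = await hit(redis, req.ip, 100, 60_000);
//   if (!allowed) return res.status(429).json({ error: 'Too many requests' });
```

Because the increment and the expiry are two separate commands, a crash between them could leave a counter without a TTL. Running both steps atomically, for example in a Lua script, closes that gap, which is one reason to prefer a maintained store over hand-rolled code.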

Rate Limit by API Key vs. IP

  • By IP — Simple but problematic for shared IPs (offices, NATs) and easy to rotate.
  • By API key — More accurate and ties limits to authenticated clients.
  • By user ID — Best for authenticated APIs.
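
// Prefer the most specific identity available, falling back to the client IP.
// Pass this function as the keyGenerator option of rateLimit().
const keyGenerator = (req) =>
  req.headers['x-api-key'] || req.user?.id || req.ip;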

Rate Limit Response Headers

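Clients need to know their limit, remaining quota, and when it resets:

RateLimit-Limit: 100
RateLimit-Remaining: 42
RateLimit-Reset: 47

express-rate-limit with standardHeaders: true sends these automatically. Note that RateLimit-Reset is the number of seconds until the window resets, not a Unix timestamp. Rejected requests (429) also carry a Retry-After header with the number of seconds to wait:

Retry-After: 47

Your API clients should honor Retry-After and back off instead of hammering on 429 responses.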
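
On the client side, backing off can be as small as the helper below. This is a sketch: `retryDelayMs` and `fetchWithBackoff` are hypothetical names, the 30-second cap and the five attempts are arbitrary choices, and the global `fetch` is assumed to be available (Node 18+ or a browser).

```javascript
// Decide how long to wait (in ms) before retrying after a 429 response.
// `headers` is anything with a get(name) method, such as fetch's Headers.
function retryDelayMs(headers, attempt, nowMs = Date.now()) {
  const value = headers.get('retry-after');
  if (value != null) {
    const trimmed = String(value).trim();
    // Retry-After is either a number of seconds or an HTTP date.
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
    const dateMs = Date.parse(trimmed);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  // No usable header: exponential backoff (1 s, 2 s, 4 s, ...) capped at 30 s.
  return Math.min(30_000, 1000 * 2 ** attempt);
}

// Retries only on 429 and gives up after maxAttempts, returning the last response.
async function fetchWithBackoff(url, options = {}, maxAttempts = 5) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429 || attempt >= maxAttempts - 1) return res;
    const delay = retryDelayMs(res.headers, attempt);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Adding a little random jitter to the fallback delay helps keep many clients from retrying at the same instant.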

Tiered Rate Limits

Different plans get different limits:

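const getLimitForPlan = (plan) => {
  const limits = { free: 100, pro: 1000, enterprise: 10000 };
  return limits[plan] || limits.free;
};

// Create the limiter once at startup. Building a new limiter (and a new
// Redis store) inside the request handler is wasteful and error-prone.
// Assumes an auth middleware has already populated req.user.
const tieredLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  // max can be a function, evaluated for each request
  max: (req) => getLimitForPlan(req.user?.plan),
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redis.call(...args),
    prefix: 'rl:tiered:' // keep these keys apart from other limiters
  }),
  // Fall back to the IP so unauthenticated requests don't throw
  keyGenerator: (req) =>
    req.user ? `${req.user.id}:${req.user.plan || 'free'}` : req.ip
});

app.use('/api/', tieredLimiter);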

Rate Limiting at the Infrastructure Layer

For high-scale protection, consider rate limiting at the edge:

  • Cloudflare Rate Limiting: Rule-based, DDoS-grade
  • AWS API Gateway: Built-in throttling per API key
  • Nginx: limit_req module (limit_req_zone and limit_req directives) for leaky bucket limiting

http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429; # nginx answers limited requests with 503 by default
            proxy_pass http://localhost:3000;
        }
    }
}

Conclusion

Rate limiting is a critical layer of API security and reliability. Start with per-IP limits, move to per-API-key limits as you add authentication, use Redis for multi-instance deployments, and expose standardized headers so clients can respect limits gracefully. Deploy your rate-limited API on PandaStack using Docker containers at [dashboard.pandastack.io](https://dashboard.pandastack.io). See [docs.pandastack.io](https://docs.pandastack.io) for more.
