API Rate Limiting: How to Protect Your API from Abuse

Without rate limiting, a single misbehaving client — or attacker — can overwhelm your API and take down service for everyone. Rate limiting controls how many requests a client can make in a given time window. This guide covers the algorithms, implementation patterns, and best practices.

Why Rate Limiting Matters

DDoS mitigation: Limits how much an application-layer request flood can consume. Large volumetric attacks still need network- or edge-level protection.
Abuse prevention: Slows scrapers, credential stuffers, and API key abusers.
Cost control: Calls to downstream services (AI APIs, databases, third-party APIs) often cost money per call.
Fairness: Prevents one client from starving others.

Rate Limiting Algorithms

Fixed Window Counter

Count requests in a fixed time window (e.g., per minute). Reset the counter at the window boundary.

Window: 12:00:00 – 12:01:00
Requests: 47/100 → allowed
Request 101 at 12:00:59 → rejected (429)
Window reset at 12:01:00 → counter = 0

Problem: Clients can exploit window boundaries to get up to 2x the allowed requests in a short span. 100 requests at 12:00:59 and another 100 at 12:01:00 all pass, so 200 requests go through in about one second.
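A minimal in-memory sketch of the fixed window counter, assuming a single process and timestamps in milliseconds passed in by the caller. The class and method names (`FixedWindowLimiter`, `allow`) are hypothetical. The usage reproduces the boundary burst: 100 requests just before a reset and 100 just after all pass.

```javascript
// Fixed window counter: one counter per client per aligned time window.
// In-memory and single-process only; old windows are never evicted here.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;        // max requests per window
    this.windowMs = windowMs;  // window length, e.g. 60_000 for one minute
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request is allowed, false if it should get a 429.
  allow(key, now = Date.now()) {
    // Align to a fixed boundary, e.g. the start of the current minute.
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // A new window has started: reset the counter.
      entry = { windowStart, count: 0 };
      this.counters.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Boundary burst: 100 requests at 59 s and 100 more at 60 s (the reset).
const limiter = new FixedWindowLimiter(100, 60_000);
let allowed = 0;
for (let i = 0; i < 100; i++) if (limiter.allow('client-a', 59_000)) allowed++;
for (let i = 0; i < 100; i++) if (limiter.allow('client-a', 60_000)) allowed++;
console.log(allowed); // prints 200
```

A production version would also expire stale keys, for example with a TTL in Redis, instead of letting the `Map` grow.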

Sliding Window

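Counts requests over a rolling window that ends at the current time. Because the window moves with every request, there is no fixed boundary to exploit.

The exact version (a sliding window log) stores a timestamp for every request, so memory grows with the limit. A common approximation (a sliding window counter) keeps only the current and previous fixed-window counts and weights the previous count by how much of that window still overlaps the rolling window.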
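Both variants can be sketched in a few lines, assuming a single process and non-decreasing timestamps in milliseconds. The names `SlidingWindowLog` and `slidingWindowEstimate` are hypothetical. The usage replays the boundary burst described above and shows it is now rejected.

```javascript
// Sliding window log: keep a timestamp for every accepted request and count
// only those inside the rolling window (now - windowMs, now].
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.logs = new Map(); // key -> ascending array of accepted timestamps
  }

  allow(key, now = Date.now()) {
    let entries = this.logs.get(key);
    if (!entries) {
      entries = [];
      this.logs.set(key, entries);
    }
    // Evict timestamps that have slid out of the window.
    while (entries.length > 0 && entries[0] <= now - this.windowMs) entries.shift();
    if (entries.length >= this.limit) return false;
    entries.push(now);
    return true;
  }
}

// Sliding window counter (approximation): weight the previous fixed window's
// count by the fraction of it that still overlaps the rolling window.
// elapsedMs is the time since the current fixed window started.
function slidingWindowEstimate(prevCount, currCount, elapsedMs, windowMs) {
  const overlap = (windowMs - elapsedMs) / windowMs;
  return prevCount * overlap + currCount;
}

const slidingLimiter = new SlidingWindowLog(100, 60_000);
for (let i = 0; i < 100; i++) slidingLimiter.allow('client-a', 59_000);
console.log(slidingLimiter.allow('client-a', 60_000));  // false: the burst is still in the window
console.log(slidingLimiter.allow('client-a', 119_000)); // true: it has slid out
```

With the approximation, a limiter would reject once the estimate reaches the limit. For example, 100 requests last window and 20 so far, halfway into this one, estimates to 100 × 0.5 + 20 = 70.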

Token Bucket

A bucket holds N tokens. Each request consumes one token. Tokens refill at a fixed rate. Clients can burst up to bucket size, but can't sustain above the refill rate.

Bucket capacity: 100
Refill rate: 10 tokens/second
Client bursts 100 requests → all allowed (bucket empties)
Next request within 0.1s → rejected (no tokens)
After 10s → 100 tokens refilled

Token bucket is great for allowing short bursts while enforcing average rate limits.
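A minimal single-process sketch of the token bucket, assuming the caller passes timestamps in milliseconds. `TokenBucket` and `tryConsume` are hypothetical names. Tokens are refilled lazily on each call from the elapsed time, which avoids running a timer per client.

```javascript
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `refillPerSecond`. Each request spends one token.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.refillPerMs = refillPerSecond / 1000;
    this.tokens = capacity; // start full so a new client can burst immediately
    this.lastRefill = null; // timestamp of the last refill, set on first use
  }

  tryConsume(now = Date.now()) {
    if (this.lastRefill === null) this.lastRefill = now;
    // Credit tokens for the time since the last call, never above capacity.
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens < 1) return false; // bucket empty: reject with 429
    this.tokens -= 1;
    return true;
  }
}

// The scenario above: capacity 100, refill 10 tokens/second.
const bucket = new TokenBucket(100, 10);
let accepted = 0;
for (let i = 0; i < 100; i++) if (bucket.tryConsume(0)) accepted++;
console.log(accepted);                  // 100: the whole burst fits
console.log(bucket.tryConsume(50));     // false: only 0.5 tokens after 50 ms
console.log(bucket.tryConsume(10_050)); // true: the bucket has refilled
```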

Leaky Bucket

Requests leave the bucket at a fixed rate no matter how bursty the arrivals are. Requests that arrive faster than that wait in a bounded queue and are dropped once it is full. The output is smooth, but bursty traffic pays for it in added latency.
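A deterministic sketch of the leaky bucket as a queue, assuming time is passed in as milliseconds. `LeakyBucketQueue` and `offer` are hypothetical names. Instead of actually delaying work, `offer` returns the time each request would be processed, which makes the added latency visible.

```javascript
// Leaky bucket as a queue: requests drain at one per `intervalMs`, and at most
// `capacity` requests may wait. Time is passed in explicitly (milliseconds).
class LeakyBucketQueue {
  constructor(capacity, intervalMs) {
    this.capacity = capacity;     // max requests waiting to be processed
    this.intervalMs = intervalMs; // fixed spacing between processed requests
    this.nextSlot = 0;            // earliest time the next request can run
    this.pending = [];            // scheduled times of requests not yet run
  }

  // Returns the time the request will be processed, or null if it is dropped.
  offer(now) {
    // Requests whose scheduled time has arrived have left the bucket.
    while (this.pending.length > 0 && this.pending[0] <= now) this.pending.shift();
    if (this.pending.length >= this.capacity) return null; // bucket overflows
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.intervalMs;
    this.pending.push(start);
    return start;
  }
}

// Five simultaneous requests, capacity 3, one request processed every 100 ms.
const queue = new LeakyBucketQueue(3, 100);
const times = [0, 0, 0, 0, 0].map((t) => queue.offer(t));
console.log(times); // one runs at 0, three wait (100, 200, 300), the fifth is dropped (null)
```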

Implementation: Express.js with express-rate-limit

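npm install express express-rate-limit

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();
// Behind a reverse proxy or load balancer, configure app.set('trust proxy', ...)
// so req.ip is the client's address rather than the proxy's.

// Global rate limiter: 500 requests per IP per 15 minutes
const globalLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 500,
  standardHeaders: true,  // Return RateLimit-* headers
  legacyHeaders: false,   // Disable the X-RateLimit-* headers
  message: { error: 'Too many requests, please try again later.' }
});

// Strict limiter for auth endpoints: 10 attempts per IP per 15 minutes.
// Both routes share this instance, so login and register attempts count together.
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 10,
  standardHeaders: true,
  legacyHeaders: false,
  message: { error: 'Too many authentication attempts.' }
});

// Requests to /api/auth/* pass through both limiters.
app.use('/api/', globalLimiter);
app.use('/api/auth/login', authLimiter);
app.use('/api/auth/register', authLimiter);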

Redis-Backed Rate Limiting (for Distributed Systems)

In-memory rate limiters break down when you run multiple server instances: each instance keeps its own count, so a client whose requests are spread across N instances effectively gets N times the limit. Use Redis to share state:

npm install rate-limit-redis ioredis

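const express = require('express');
const rateLimit = require('express-rate-limit');
const { RedisStore } = require('rate-limit-redis');
const Redis = require('ioredis');

const app = express();
const redis = new Redis(process.env.REDIS_URL);

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  // Counters live in Redis, so every instance sees the same totals.
  store: new RedisStore({
    sendCommand: (...args) => redis.call(...args)
  }),
  // Only key on x-api-key if unknown keys are rejected before this point;
  // otherwise a client can send a new random key per request for a fresh quota.
  keyGenerator: (req) => req.headers['x-api-key'] || req.ip
});

app.use('/api/', limiter);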

Rate Limit by API Key vs. IP

By IP — Simple but problematic for shared IPs (offices, NATs) and easy to rotate.
By API key — More accurate and ties limits to authenticated clients.
By user ID — Best for authenticated APIs.

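// Pass to rateLimit({ keyGenerator, ... }).
const keyGenerator = (req) => {
  // Prefix each key type so an API key, a user ID, and an IP never collide.
  // req.user is only set if your authentication middleware runs before the limiter.
  if (req.headers['x-api-key']) return `key:${req.headers['x-api-key']}`;
  if (req.user?.id) return `user:${req.user.id}`;
  return `ip:${req.ip}`;
};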

Rate Limit Response Headers

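Clients need to know their limit, their remaining quota, and when the quota resets. In the IETF RateLimit header draft these headers come from, RateLimit-Reset is the number of seconds until the window resets, not a Unix timestamp. A rejected request might return:

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 47
Retry-After: 47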

express-rate-limit with standardHeaders: true sends the RateLimit-* headers automatically, and adds Retry-After on 429 responses. Your API clients should read Retry-After and wait that long before retrying, instead of hammering the API with immediate retries.
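On the client side, backing off can be sketched as follows, assuming Node 18+ (or a browser) for the global `fetch`. `retryAfterMs` and `fetchWithBackoff` are hypothetical helpers, and the fallback of exponential backoff starting at 1 second is an arbitrary choice for illustration.

```javascript
// Convert a Retry-After value to milliseconds. The header holds either a
// number of seconds ("47") or an HTTP-date. Returns null if absent or unparseable.
function retryAfterMs(headerValue, now = Date.now()) {
  if (headerValue == null) return null;
  const value = String(headerValue).trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

// Retry on 429, waiting for Retry-After when present and falling back to
// exponential backoff (1 s, 2 s, 4 s, ...) when it is not.
async function fetchWithBackoff(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await res.text(); // drain the 429 body so the connection can be reused
    const waitMs = retryAfterMs(res.headers.get('retry-after')) ?? 1000 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

After `maxRetries` attempts the last 429 response is returned as-is, so the caller can still surface the error instead of looping forever.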

Tiered Rate Limits

Different plans get different limits:

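const getLimitForPlan = (plan) => {
  const limits = { free: 100, pro: 1000, enterprise: 10000 };
  return limits[plan] ?? limits.free;
};

// Create the limiter once at startup, not inside a request handler.
// When max is a function, express-rate-limit calls it on every request.
const tieredLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: (req) => getLimitForPlan(req.user?.plan || 'free'),
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({
    prefix: 'rl:tier:', // keep these counters separate from other limiters
    sendCommand: (...args) => redis.call(...args)
  }),
  // Unauthenticated requests have no req.user, so fall back to the IP.
  keyGenerator: (req) => (req.user ? `user:${req.user.id}` : `ip:${req.ip}`)
});

app.use('/api/', tieredLimiter);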

Rate Limiting at the Infrastructure Layer

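At high traffic volumes, also rate limit at the edge, so abusive requests are rejected before they reach your application servers:

Cloudflare Rate Limiting: rule-based limits enforced at Cloudflare's edge
AWS API Gateway: built-in throttling, plus per-API-key throttling and quotas through usage plans
Nginx: the limit_req module (limit_req_zone and limit_req directives), which uses the leaky bucket method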

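http {
    # Track clients by binary IP in a 10 MB shared zone; allow 10 requests/second.
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            # Forward bursts of up to 20 extra requests without delaying them;
            # requests beyond the burst are rejected.
            limit_req zone=api burst=20 nodelay;
            # Nginx rejects with 503 by default; return 429 instead.
            limit_req_status 429;
            proxy_pass http://localhost:3000;
        }
    }
}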

Conclusion

Rate limiting is a critical layer of API security and reliability. Start with per-IP limits, move to per-API-key or per-user limits as you add authentication, use Redis for multi-instance deployments, and send the RateLimit-* and Retry-After headers so clients can respect limits gracefully. Deploy your rate-limited API on PandaStack using Docker containers at [dashboard.pandastack.io](https://dashboard.pandastack.io). See [docs.pandastack.io](https://docs.pandastack.io) for more.
