Guide | 7 min read | 2026-05-01

API Response Time Optimization: How to Make Your API Faster

Practical techniques to reduce API response times through query optimization, caching, payload reduction, and smarter architecture patterns.

API response time is one of the most direct measures of backend performance. Latency is felt by the user and compounds across every service that calls your API. This guide covers the most effective techniques for reducing response time, from quick wins you can ship today to architectural changes that pay off long-term.

Measure First: Know Your Baseline

Before optimizing, establish your baseline metrics. You need p50, p95, and p99 response times, not just averages. An average of 150ms can hide 5% of requests taking 2 seconds while the other 95% finish in about 50ms.
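To make the gap between averages and percentiles concrete, here is a minimal sketch that summarizes a list of request durations using the nearest-rank method. The function names and the sample data are made up for illustration; a monitoring system would normally compute these figures for you.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. `p` is an integer from 1 to 100.
function percentile(sortedMs, p) {
  if (sortedMs.length === 0) return NaN;
  const rank = Math.ceil((p * sortedMs.length) / 100);
  return sortedMs[Math.max(rank, 1) - 1];
}

function summarize(durationsMs) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, ms) => sum + ms, 0) / sorted.length;
  return {
    mean,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// Hypothetical sample: 90 fast requests (40ms) and 10 slow ones (1000ms).
const sample = [...Array(90).fill(40), ...Array(10).fill(1000)];
console.log(summarize(sample));
// { mean: 136, p50: 40, p95: 1000, p99: 1000 }
```

The mean of 136ms looks healthy, while p95 shows that at least one request in twenty takes a full second.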

// Simple Express middleware to log response times
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    console.log(`${req.method} ${req.path} ${res.statusCode} ${duration}ms`);
  });
  next();
});

Feed these logs into your monitoring system and track trends over time. [PandaStack](https://dashboard.pandastack.io) includes built-in monitoring and alerting for deployed containers, so you can set up latency threshold alerts without separate tooling.

Fix Database Queries First

Database calls are often the largest contributor to API latency. Audit your slowest queries first.

Enable slow query logging (PostgreSQL):

-- Log queries taking more than 100ms
ALTER SYSTEM SET log_min_duration_statement = 100;
SELECT pg_reload_conf();

Use EXPLAIN ANALYZE to understand query execution:

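-- EXPLAIN ANALYZE executes the query, so use a literal in place of the $1
-- placeholder your application binds (42 is a placeholder value here)
EXPLAIN ANALYZE SELECT * FROM projects
WHERE organization_id = 42 AND status = 'RUNNING'
ORDER BY created_at DESC;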

If you see "Seq Scan" on large tables, you need an index:

CREATE INDEX CONCURRENTLY idx_projects_org_status
ON projects(organization_id, status);

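The CONCURRENTLY keyword builds the index without blocking writes to the table, which makes it the safer choice in production. The build takes longer, cannot run inside a transaction block, and leaves an invalid index behind if it fails; drop that index and retry.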
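If you review many plans, the "Seq Scan" check can be scripted. The sketch below assumes text-format EXPLAIN output and a hypothetical helper name, `findSeqScans`; the plan text is hand-written for the example rather than captured from a real database. A sequential scan on a small table is usually fine, so treat the result as a list of candidates to inspect, not a list of problems.

```javascript
// Return the names of tables that appear under a sequential scan node in
// text-format EXPLAIN output. Matches "Seq Scan" and "Parallel Seq Scan".
function findSeqScans(planText) {
  const tables = new Set();
  for (const match of planText.matchAll(/Seq Scan on (\w+)/g)) {
    tables.add(match[1]);
  }
  return [...tables];
}

// Illustrative plan text (hand-written, costs elided).
const plan = [
  'Sort  (cost=... rows=...)',
  '  ->  Seq Scan on projects  (cost=... rows=...)',
].join('\n');

console.log(findSeqScans(plan)); // [ 'projects' ]
```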

Fetch Only What You Need

Select specific columns — avoid SELECT * in production queries:

// Bad: fetches all columns including large blobs
const { rows: allColumns } = await pool.query(
  'SELECT * FROM projects WHERE org_id = $1',
  [orgId]
);

// Good: fetch only what the API response needs
const { rows: projects } = await pool.query(
  'SELECT id, name, status, created_at FROM projects WHERE org_id = $1',
  [orgId]
);

Paginate large collections — never return unbounded lists:

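app.get('/api/projects', async (req, res) => {
  // Non-numeric input falls back to the default; out-of-range input is clamped
  const limit = Math.min(Math.max(parseInt(req.query.limit, 10) || 20, 1), 100);
  const offset = Math.max(parseInt(req.query.offset, 10) || 0, 0);

  // A stable ORDER BY keeps pages consistent between requests
  const { rows } = await pool.query(
    `SELECT id, name, status FROM projects
     WHERE org_id = $1
     ORDER BY created_at DESC, id DESC
     LIMIT $2 OFFSET $3`,
    [req.user.orgId, limit, offset]
  );
  res.json({ data: rows, limit, offset });
});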
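OFFSET pagination still makes the database walk past every skipped row, so deep pages get slower as the offset grows. A common alternative is keyset (cursor) pagination, which filters on the last row the client saw. The sketch below is one way to build such a query; the helper name and cursor shape are hypothetical, and it assumes `projects` has `created_at` and `id` columns, with `id` as a unique tiebreaker.

```javascript
// Build a parameterized keyset query. `cursor` is the (created_at, id) pair
// of the last row on the previous page, or null for the first page.
function buildProjectsPageQuery(orgId, limit, cursor) {
  if (!cursor) {
    return {
      text:
        'SELECT id, name, status, created_at FROM projects ' +
        'WHERE org_id = $1 ' +
        'ORDER BY created_at DESC, id DESC LIMIT $2',
      values: [orgId, limit],
    };
  }
  return {
    // Row comparison picks up strictly after the cursor in sort order.
    text:
      'SELECT id, name, status, created_at FROM projects ' +
      'WHERE org_id = $1 AND (created_at, id) < ($3, $4) ' +
      'ORDER BY created_at DESC, id DESC LIMIT $2',
    values: [orgId, limit, cursor.createdAt, cursor.id],
  };
}
```

The returned object can be passed straight to `pool.query(...)`, which accepts a `{ text, values }` config. The response would then include the `created_at` and `id` of the last row so the client can request the next page.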

Run Independent Operations in Parallel

Sequential await chains are one of the most common API performance killers:

// Slow: 3 sequential queries = sum of all three latencies
const user = await getUser(userId);
const projects = await getProjects(userId);
const billing = await getBillingInfo(userId);

// Fast: 3 parallel queries = latency of the slowest one
const [user, projects, billing] = await Promise.all([
  getUser(userId),
  getProjects(userId),
  getBillingInfo(userId),
]);

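For an endpoint that aggregates three sources of similar latency, this change alone can cut response time by up to two-thirds, because total time drops from the sum of the calls to the slowest one. Only parallelize calls that do not depend on each other's results, and note that Promise.all rejects as soon as any one call fails.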
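You can see the effect without a database. The following sketch uses a made-up `fakeQuery` helper that resolves after a fixed delay, standing in for the three calls above, and times both approaches.

```javascript
// Stand-in for a database or HTTP call that takes `ms` milliseconds.
const fakeQuery = (name, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(name), ms));

// Measure how long an async function takes, in milliseconds.
async function timed(fn) {
  const start = process.hrtime.bigint();
  await fn();
  return Number(process.hrtime.bigint() - start) / 1e6;
}

async function compare() {
  const sequentialMs = await timed(async () => {
    await fakeQuery('user', 50);
    await fakeQuery('projects', 50);
    await fakeQuery('billing', 50);
  });
  const parallelMs = await timed(() =>
    Promise.all([
      fakeQuery('user', 50),
      fakeQuery('projects', 50),
      fakeQuery('billing', 50),
    ])
  );
  return { sequentialMs, parallelMs };
}

compare().then(({ sequentialMs, parallelMs }) => {
  console.log(
    `sequential: ${sequentialMs.toFixed(0)}ms, parallel: ${parallelMs.toFixed(0)}ms`
  );
});
```

With three 50ms calls, the sequential version should take roughly 150ms and the parallel version roughly 50ms. Real queries share a connection pool, so gains can be smaller when the pool is saturated.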

Cache Expensive Responses

Use Redis to cache responses that are expensive to generate but do not change frequently:

async function getOrgDashboard(orgId) {
  const cacheKey = `org:dashboard:${orgId}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached); // ~0.5ms

  // Without cache: ~150ms
  const [projects, metrics, billing] = await Promise.all([
    db.getProjects(orgId),
    db.getMetrics(orgId),
    db.getBilling(orgId),
  ]);
  const dashboard = { projects, metrics, billing };

  await redis.setex(cacheKey, 30, JSON.stringify(dashboard)); // 30s TTL
  return dashboard;
}
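One gap in a plain cache-aside function is that when a key expires, every concurrent request misses at once and runs the expensive load. A possible mitigation is to coalesce in-flight loads so that concurrent callers share one promise. This is a sketch of that idea, not part of the function above; `singleFlight` and `slowLoad` are hypothetical names, and the coalescing only applies within a single Node.js process.

```javascript
// Map of cache key -> promise for a load that is currently in progress.
const inFlight = new Map();

// Run `loader` at most once per key at a time; concurrent callers share
// the same promise instead of each hitting the database.
function singleFlight(key, loader) {
  if (inFlight.has(key)) return inFlight.get(key);
  const promise = Promise.resolve()
    .then(loader)
    .finally(() => inFlight.delete(key)); // allow a fresh load next time
  inFlight.set(key, promise);
  return promise;
}

// Demo: two concurrent requests for the same key trigger one load.
let loads = 0;
const slowLoad = () =>
  new Promise((resolve) => setTimeout(() => resolve(++loads), 20));

Promise.all([
  singleFlight('org:dashboard:1', slowLoad),
  singleFlight('org:dashboard:1', slowLoad),
]).then((results) => console.log(results, loads)); // [ 1, 1 ] 1
```

In a function like getOrgDashboard, the cache-miss branch (load, then write to Redis) would be the `loader` passed to `singleFlight(cacheKey, ...)`.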

Compress API Responses

Enable gzip or Brotli compression on your API server. JSON compresses well: reductions of 60–80% are common for repetitive payloads.

const compression = require('compression');
app.use(compression());

For large JSON responses this can meaningfully reduce transfer time, especially for mobile clients.
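Before relying on a rule-of-thumb ratio, it is worth measuring your own payloads. This sketch uses Node's built-in `zlib` module to compare raw, gzip, and Brotli sizes for a generated payload. The payload shape is invented for the example, so substitute a real response body from your API.

```javascript
const zlib = require('node:zlib');

// Hypothetical payload: 500 project records with repetitive keys, which is
// typical of list endpoints and tends to compress well.
const payload = JSON.stringify(
  Array.from({ length: 500 }, (_, i) => ({
    id: i,
    name: `project-${i}`,
    status: i % 3 === 0 ? 'RUNNING' : 'STOPPED',
    created_at: new Date(Date.UTC(2026, 0, 1 + (i % 28))).toISOString(),
  }))
);

const raw = Buffer.byteLength(payload);
const gzipped = zlib.gzipSync(payload).length;
const brotli = zlib.brotliCompressSync(payload).length;

const saved = (n) => `${(100 * (1 - n / raw)).toFixed(1)}% smaller`;
console.log(`raw:    ${raw} bytes`);
console.log(`gzip:   ${gzipped} bytes (${saved(gzipped)})`);
console.log(`brotli: ${brotli} bytes (${saved(brotli)})`);
```

Compression costs CPU on every response, so for small payloads the savings may not justify it; measure both transfer size and server time.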

Use HTTP/2

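HTTP/2 multiplexes multiple requests over a single connection, which removes HTTP/1.1's request-level head-of-line blocking and reduces connection overhead (TCP-level head-of-line blocking under packet loss remains). Most reverse proxies support it with minimal configuration: Caddy enables it by default for HTTPS sites, and in Nginx it is one config line: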

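server {
    listen 443 ssl http2;   # nginx before 1.25.1
    # nginx 1.25.1+: use "listen 443 ssl;" plus "http2 on;" (the listen parameter is deprecated)
    # ...
}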

Add Proper Indexes for Sort and Filter Columns

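A column used in WHERE, ORDER BY, or JOIN clauses without a supporting index can force a full table scan or an extra sort step. Index the combinations your queries actually use, and keep in mind that every index adds write overhead: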

-- Index for filtering + sorting (common pattern)
CREATE INDEX idx_cronjob_executions_lookup
ON cronjob_executions(cronjob_id, created_at DESC);

-- Partial index for active records only
CREATE INDEX idx_projects_running
ON projects(org_id)
WHERE status = 'RUNNING';

Response Optimization Checklist

  • [ ] Identify p95/p99 slow endpoints (not just averages)
  • [ ] Enable slow query logging and fix with indexes
  • [ ] Replace sequential awaits with Promise.all
  • [ ] Select only required columns, paginate all lists
  • [ ] Cache expensive aggregations in Redis (30–300s TTL)
  • [ ] Enable compression middleware
  • [ ] Enable HTTP/2 at the reverse proxy
  • [ ] Set up latency alerts so regressions are caught early

Consistently fast APIs require ongoing attention. Profile regularly, add indexes as data grows, and alert on latency regressions before users notice them.
