Scheduling Patterns That Scale

Not all background jobs are the same. A job that sends a single daily digest email is architecturally different from one that processes millions of user events, and both differ from a multi-step data pipeline with sequential dependencies.

Choosing the right scheduling pattern for each workload makes your application more reliable, easier to operate, and faster to debug. This guide covers the most common patterns with practical examples.

Pattern 1: Simple Scheduled Job

The simplest pattern: a single container runs on a cron schedule, does its work, and exits. No queues, no dependencies, no concurrency.

Use this for: nightly backups, weekly reports, hourly data sync, monthly invoice generation.

# Cron expressions for common simple schedules
"0 2 * * *"       # Daily at 2 AM
"0 9 * * 1"       # Weekly on Monday at 9 AM
"0 0 1 * *"       # Monthly on the 1st at midnight
"*/30 * * * *"    # Every 30 minutes

# Deploy a simple scheduled job on PandaStack
npm install -g @pandastack/cli

panda cronjob create \
  --name weekly-digest \
  --image your-registry/digest-mailer:latest \
  --schedule "0 8 * * 1"
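
Because the container simply does its work and exits, the process exit code is the main signal of success or failure for a run. The following is a minimal sketch of an entry point that makes that explicit; `run_job` and `send_weekly_digest` are hypothetical names used only for illustration.

```python
import traceback
from typing import Callable


def run_job(work: Callable[[], None]) -> int:
    """Run one unit of work and translate the outcome into an exit code."""
    try:
        work()
    except Exception:
        # Print the traceback so it shows up in the run's logs
        traceback.print_exc()
        return 1  # non-zero exit code marks the run as failed
    return 0


def send_weekly_digest() -> None:
    # Hypothetical placeholder for the job's real work
    print("Digest sent")
```

The container's entry point would then end with something like `sys.exit(run_job(send_weekly_digest))`, so an unhandled error produces a non-zero exit code instead of a silent success.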

Best for: Predictable, low-volume work with no concurrency requirements.

Pattern 2: Fan-Out Batch Processing

One scheduler job discovers the work to be done and distributes it across many workers. The scheduler (the "fan-out" job) runs on a cron schedule and pushes one task per item onto a queue; worker processes pull tasks from that queue and handle them in parallel.

Use this for: sending emails to a large user base, processing a backlog of records, running analytics across many tenants.

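# Scheduler job (runs on cron schedule)
# Finds all users needing a digest email and enqueues one task per user

import json
import os

import psycopg2
import redis

# Connection strings come from the environment (variable names are assumed)
DATABASE_URL = os.environ["DATABASE_URL"]
REDIS_URL = os.environ["REDIS_URL"]

def schedule_digest_emails():
    db = psycopg2.connect(DATABASE_URL)
    r = redis.Redis.from_url(REDIS_URL)

    try:
        cur = db.cursor()
        # The cutoff is 6 days, not 7: a digest sent a few minutes after last
        # week's run is slightly less than 7 days old when this run starts.
        # Assumes a NULL last_digest_at means no digest was ever sent.
        cur.execute("""
            SELECT id, email FROM users
            WHERE digest_enabled = true
            AND (last_digest_at IS NULL
                 OR last_digest_at < NOW() - INTERVAL '6 days')
        """)

        count = 0
        for user_id, email in cur.fetchall():
            task = json.dumps({"user_id": user_id, "email": email})
            r.lpush("email_queue", task)
            count += 1
    finally:
        db.close()

    print(f"Enqueued {count} digest emails")

schedule_digest_emails()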

# Schedule the fan-out coordinator every Monday at 7 AM
panda cronjob create \
  --name digest-scheduler \
  --image your-registry/digest-scheduler:latest \
  --schedule "0 7 * * 1"
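
The scheduler above only fills the queue; the workers that drain it are not shown. The following is a rough sketch of what one worker loop could look like. The function name `drain_queue`, the `max_tasks` cap, and the `handle` callback are assumptions for illustration, and `queue` is any object with an `rpop(name)` method that returns `None` when the list is empty, which is how `redis.Redis.rpop` behaves.

```python
import json
from typing import Any, Callable, Dict, Tuple


def drain_queue(
    queue: Any,
    handle: Callable[[Dict[str, Any]], None],
    queue_name: str = "email_queue",
    max_tasks: int = 1000,
) -> Tuple[int, int]:
    """Pop up to max_tasks tasks and pass each one to handle().

    Returns (succeeded, failed).
    """
    succeeded = failed = 0
    for _ in range(max_tasks):
        raw = queue.rpop(queue_name)  # scheduler uses lpush, so rpop gives FIFO order
        if raw is None:
            break  # queue is empty
        try:
            handle(json.loads(raw))
            succeeded += 1
        except Exception as exc:
            # One bad task must not stop the rest of the batch
            failed += 1
            print(f"Task failed: {exc!r}")
    return succeeded, failed
```

Several copies of this loop can run at the same time because each `rpop` hands a task to exactly one caller. In this sketch a failed task is only counted and logged; a production worker would more likely push it back onto the queue or onto a separate failure list.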

Best for: High-volume batch work where individual items can be processed independently and in parallel.

Pattern 3: Chained Pipeline Jobs

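Multiple steps run in sequence, each depending on the previous step's output. Step A completes and writes its output, and only then does Step B start. In the simplest form, shown here, a single scheduled container runs the steps in order and stops at the first failure.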

Use this for: ETL pipelines, multi-stage report generation, data transformation workflows.

#!/bin/bash
# pipeline.sh: run inside the scheduler container

# Exit on the first failed command, on unset variables, and on failures inside pipes
set -euo pipefail

echo "Step 1: Extract data from source"
python extract.py --output /data/raw.json

echo "Step 2: Transform data"
python transform.py --input /data/raw.json --output /data/cleaned.json

echo "Step 3: Load data to warehouse"
python load.py --input /data/cleaned.json

echo "Step 4: Generate summary report"
python report.py --source warehouse

echo "Pipeline complete"

# Schedule the pipeline to run nightly at 1 AM
panda cronjob create \
  --name etl-pipeline \
  --image your-registry/etl:latest \
  --schedule "0 1 * * *"
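
The same stop-at-first-failure behaviour can be sketched in Python when the stages are functions rather than separate scripts. This is an illustrative outline, not part of any PandaStack API; `run_pipeline` and the stage names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_pipeline(stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return the name of the failed stage, or None."""
    for name, stage in stages:
        print(f"Starting stage: {name}")
        try:
            stage()
        except Exception as exc:
            # Later stages depend on this one, so stop here
            print(f"Stage {name!r} failed: {exc!r}; skipping remaining stages")
            return name
    print("Pipeline complete")
    return None
```

Returning the failed stage's name makes it straightforward to log where the run stopped and to exit with a non-zero code when the result is not `None`.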

Best for: Sequential data workflows where each stage depends on the previous one completing successfully.

Pattern 4: Rolling Window Jobs

The job processes a sliding window of data — "everything from the last N hours" — rather than processing all data or tracking exactly what was last processed.

Use this for: metrics aggregation, alert evaluation, rolling analytics.

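import datetime

# `db` and `compute_metrics` are application-specific helpers defined elsewhere

def aggregate_hourly_metrics():
    now = datetime.datetime.utcnow()
    # Align the window to the previous full hour so that the bounds and the
    # upsert key are the same no matter when within the hour the job starts
    window_end = now.replace(minute=0, second=0, microsecond=0)
    window_start = window_end - datetime.timedelta(hours=1)

    events = db.query(
        "SELECT * FROM events WHERE created_at >= %s AND created_at < %s",
        [window_start, window_end]
    )

    metrics = compute_metrics(events)
    db.upsert("hourly_metrics", {"hour": window_start, **metrics})
    print(f"Aggregated {len(events)} events for hour {window_start.strftime('%H:00')}")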

# Run every hour at minute 5 (allows for late events)
panda cronjob create \
  --name metrics-aggregator \
  --image your-registry/metrics:latest \
  --schedule "5 * * * *"
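
A rolling-window job is safe to rerun only if writing the same window twice overwrites the earlier row instead of adding a second one. The sketch below illustrates that idea with the standard-library `sqlite3` module; the `hourly_metrics` schema, the `event_count` column, and the function name are assumptions made for the example, and the `ON CONFLICT` clause requires SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timezone


def upsert_hourly_count(conn: sqlite3.Connection, hour: datetime, event_count: int) -> None:
    """Insert the row for `hour`, or overwrite it if that hour already exists."""
    conn.execute(
        """
        INSERT INTO hourly_metrics (hour, event_count) VALUES (?, ?)
        ON CONFLICT(hour) DO UPDATE SET event_count = excluded.event_count
        """,
        (hour.isoformat(), event_count),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hourly_metrics (hour TEXT PRIMARY KEY, event_count INTEGER)")

hour = datetime(2024, 1, 1, 13, tzinfo=timezone.utc)
upsert_hourly_count(conn, hour, 120)
upsert_hourly_count(conn, hour, 125)  # rerun of the same window with late events included

rows = conn.execute("SELECT hour, event_count FROM hourly_metrics").fetchall()
print(rows)  # a single row for the hour, holding the latest count
```

The key point is that the window start is the primary key, so it has to be identical on every run for the same hour.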

Best for: Metrics and analytics that can tolerate slight imprecision at window boundaries, such as an event that arrives after its window has been aggregated.

Pattern 5: Catch-Up Jobs

When a job misses its scheduled run (due to a deployment, outage, or configuration error), the next run should backfill the missed interval rather than skip it. The job stores the timestamp of the last data it processed and always resumes from there.

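import datetime

# `db` and `process_records` are application-specific helpers defined elsewhere

def process_with_catchup():
    last_processed = db.get_last_processed_timestamp()
    now = datetime.datetime.utcnow()

    # More than two hours since the last run means at least one run was missed
    if (now - last_processed).total_seconds() > 3600 * 2:
        print(f"Catch-up mode: processing from {last_processed} to {now}")
    else:
        print("Normal run: processing last hour")

    # Query from the stored timestamp, not a fixed window, so gaps are backfilled
    records = db.query(
        "SELECT * FROM events WHERE created_at > %s AND created_at <= %s ORDER BY created_at",
        [last_processed, now]
    )
    process_records(records)
    # Advance the timestamp only after processing succeeds
    db.set_last_processed_timestamp(now)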
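
After a long outage, a single query covering the whole gap can return far more rows than a normal run. One possible refinement, sketched here as an assumption rather than something the pattern requires, is to split the gap into bounded windows and process them one at a time; `catchup_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def catchup_windows(
    last_processed: datetime,
    now: datetime,
    step: timedelta = timedelta(hours=1),
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (start, end) windows covering last_processed..now."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    start = last_processed
    while start < now:
        end = min(start + step, now)  # the final window may be shorter than step
        yield start, end
        start = end


# Example: a 2.5-hour gap is split into three windows
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
for start, end in catchup_windows(t0, t0 + timedelta(hours=2, minutes=30)):
    print(start.time(), "->", end.time())
```

Processing each window and storing its end as the new last-processed timestamp means a failure partway through a backfill resumes from the last completed window instead of starting over.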

Best for: Jobs where missed runs would leave gaps in processed data that must be recovered.

Viewing Execution History Across Patterns

Regardless of which pattern you use, execution visibility is critical. PandaStack records start time, duration, exit code, and full logs for every run:

# View recent executions for any job
panda cronjob executions etl-pipeline

# Stream logs from the latest run
panda cronjob logs etl-pipeline --latest

# Manually trigger a job run (useful for catch-up testing)
panda cronjob run etl-pipeline

The dashboard at [dashboard.pandastack.io](https://dashboard.pandastack.io) gives you a unified view of all your scheduled jobs and their execution history.

Choosing the Right Pattern

Pattern	Trigger	Volume	Dependencies
Simple scheduled	Time	Low	None
Fan-out batch	Time + queue	High	Parallel
Chained pipeline	Time	Medium	Sequential
Rolling window	Time	Medium	Stateless
Catch-up	Time + state	Variable	Previous run

Start with the simplest pattern that fits your workload. Introduce fan-out and chaining only when a single container can't handle the volume or when stages have meaningful separation of concerns. Visit [docs.pandastack.io](https://docs.pandastack.io) to learn how PandaStack supports scheduled Docker containers for all these patterns.
