The Problem with Bad Alerts

Alerting done poorly is almost as bad as no alerting at all. Too many alerts and your team starts ignoring them — a phenomenon called alert fatigue. Too few and you miss real incidents. Alerts that fire at 3am for a non-critical issue train people to turn off their phones.

Good alerting is a discipline. It requires intentional design: what to alert on, when to alert, who to notify, and through which channel. This guide walks through the principles that make alerting actually useful.

Alert on Symptoms, Not Causes

One of the most important principles in alerting: alert on user-facing symptoms, not internal causes.

Cause-based alert: "CPU usage is above 80%"

Symptom-based alert: "Response time is above 2 seconds"

High CPU might not affect users at all. Slow response times definitely do. By alerting on symptoms, you ensure every alert represents an actual user experience problem, not a technical curiosity.

This doesn't mean you never track CPU or memory — those metrics are useful for diagnosis. But they shouldn't be your primary alert triggers.
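A symptom-based rule can be sketched in a few lines of Python. This is a minimal sketch that assumes response times are collected in seconds. It uses the 95th percentile rather than the average, which is a common choice because averages hide slow tails but is an assumption here. The 2-second threshold comes from the example above, and the function names are illustrative. CPU is carried along only as diagnostic context and never triggers the alert.

```python
from statistics import quantiles
from typing import Optional

# Threshold taken from the "response time is above 2 seconds" example.
LATENCY_THRESHOLD_S = 2.0


def p95(samples: list[float]) -> float:
    """95th percentile of response times in seconds (needs at least 2 samples)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def evaluate(response_times: list[float], cpu_percent: float) -> Optional[dict]:
    """Return an alert when users see slow responses, otherwise None.

    CPU never triggers the alert; it is attached only to help diagnosis.
    """
    latency = p95(response_times)
    if latency <= LATENCY_THRESHOLD_S:
        return None
    return {
        "summary": f"p95 response time {latency:.2f}s exceeds {LATENCY_THRESHOLD_S}s",
        "context": {"cpu_percent": cpu_percent},
    }
```

With this rule, `evaluate([0.2] * 50, cpu_percent=95.0)` returns `None` (busy CPU, fast responses), while `evaluate([3.0] * 50, cpu_percent=20.0)` returns an alert.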

Alert Severity Levels

Not all alerts require the same response. Define severity levels and make sure everyone on your team knows what each means:

Severity	Meaning	Response
Critical	Service is down or severely degraded	Immediate response, any hour
High	Significant degradation, users affected	Response within 30 minutes
Medium	Partial degradation, workaround exists	Response within business hours
Low	Minor issue, informational	Review at next opportunity

Map these severity levels to notification channels. Critical alerts should page someone. Low-severity alerts should go to a Slack channel for async review.
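The mapping from severity to channel can live in code or configuration. The following is a minimal sketch. It assumes that "paging" means sending a webhook to an incident management tool. The routes for High and Medium are illustrative choices, not PandaStack's configuration format.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Illustrative routing table mirroring the severity table above.
# "webhook" is assumed to point at a paging or incident management tool.
ROUTES: dict[Severity, list[str]] = {
    Severity.CRITICAL: ["webhook", "slack"],  # page someone, any hour
    Severity.HIGH: ["webhook", "slack"],      # page, respond within 30 minutes
    Severity.MEDIUM: ["slack", "email"],      # handled during business hours
    Severity.LOW: ["slack"],                  # async review channel
}


def channels_for(severity: Severity) -> list[str]:
    """Return the delivery channels for an alert of the given severity."""
    return ROUTES[severity]
```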

Choosing the Right Notification Channel

PandaStack supports three alert delivery channels: email, Slack, and webhooks. Here's how to use each effectively:

Email works well for:

Non-urgent alerts and daily digests
Alerts that need a paper trail for compliance
Reaching people who aren't in your Slack workspace

Slack works well for:

Team-visible alerts where anyone can pick up the incident
Medium-priority alerts during business hours
Alerts that benefit from threaded discussion

Webhooks work well for:

Integrating with incident management systems
Triggering automated remediation workflows
Custom routing logic based on alert content
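For custom routing, the receiving end of a webhook is often just a function that maps a payload to a destination. This minimal sketch assumes a JSON body with `severity` and `service` fields. That payload shape and the destination names are hypothetical, so check [docs.pandastack.io](https://docs.pandastack.io) for the actual webhook schema before relying on any field.

```python
import json

# Hypothetical payload shape; the real webhook schema may differ.
EXAMPLE_BODY = '{"alert": "api-latency", "severity": "critical", "service": "payments"}'


def route(raw_body: str) -> str:
    """Pick a destination for a webhook-delivered alert based on its content."""
    payload = json.loads(raw_body)
    severity = payload.get("severity", "low")
    service = payload.get("service", "")

    if severity == "critical":
        return "incident-tool"   # open an incident and page the on-call engineer
    if service == "payments":
        return "payments-team"   # non-critical, but owned by a specific team
    return "default-queue"       # everything else waits for async review
```

You would call `route()` from whatever HTTP handler receives the webhook. Keeping the decision in a pure function makes it easy to unit test.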

Reducing Alert Fatigue

Alert fatigue is the enemy of reliability. When engineers learn to ignore alerts, you lose your early warning system. Here's how to fight it:

Require consecutive failures. Don't alert on the first failed check — alert after two or three consecutive failures. This eliminates false positives from transient network blips.
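The consecutive-failure rule is a small counter. This minimal sketch assumes each health check reports a simple pass or fail. The class name and the default threshold of 3 are illustrative.

```python
class ConsecutiveFailureAlert:
    """Fire only after `threshold` failed checks in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, check_ok: bool) -> bool:
        """Record one check result; return True if the alert should fire now."""
        if check_ok:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        # Fire exactly once, on the check that reaches the threshold.
        return self.failures == self.threshold
```

Feeding it `[False, True, False, False, False, False]` fires only on the fifth check. The single blip at the start is ignored because the success that follows resets the streak.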

Set meaningful thresholds. "Alert if error rate > 0.1%" might be appropriate for a payment service. For a static marketing site, you might not care until it's completely down. Tune thresholds to your actual tolerance.
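Per-service thresholds can be as simple as a lookup table. In this minimal sketch the service names and numbers are hypothetical and echo the two examples above. The marketing site alerts only when every request fails.

```python
# Hypothetical per-service error-rate thresholds (fraction of failed requests).
ERROR_RATE_THRESHOLDS = {
    "payments": 0.001,      # 0.1%: even a few failed payments matter
    "marketing-site": 1.0,  # only care when the site is completely down
}


def should_alert(service: str, errors: int, requests: int) -> bool:
    """True when the service's error rate reaches its configured threshold."""
    if requests == 0:
        # No traffic means no error rate to judge; availability checks cover this case.
        return False
    return errors / requests >= ERROR_RATE_THRESHOLDS[service]
```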

Remove stale alerts. Regularly audit your alert rules. Alerts for services you've retired or thresholds that never fire (or always fire) should be updated or removed.
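An audit can start from a simple summary of how often each rule was evaluated and how often it was firing. This minimal sketch assumes you can export those counts for an audit window. The `RuleStats` fields are hypothetical. Anything the audit flags is a candidate for tuning or removal, not an automatic deletion.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    name: str
    evaluations: int  # times the rule was checked during the audit window
    firing: int       # evaluations during which the rule was firing


def audit(rules: list[RuleStats]) -> dict[str, str]:
    """Flag rules that are candidates for tuning or removal."""
    findings: dict[str, str] = {}
    for rule in rules:
        if rule.evaluations == 0:
            findings[rule.name] = "never evaluated"  # e.g. the service was retired
        elif rule.firing == 0:
            findings[rule.name] = "never fired"      # too loose, or never tested?
        elif rule.firing == rule.evaluations:
            findings[rule.name] = "always firing"    # too tight, or already ignored
    return findings
```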

Group related alerts. If five services all fail simultaneously because of a shared database outage, you should get one alert about the root cause, not five separate notifications.
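One simple way to group alerts is by shared dependency. If you know which component each service relies on, failures can be collapsed into one group per component. This minimal sketch assumes a hand-maintained dependency map, and the service names are hypothetical. Real incident tools often correlate alerts by time window and labels instead.

```python
from collections import defaultdict

# Hypothetical map from each service to the shared component it depends on.
DEPENDS_ON = {
    "api": "orders-db",
    "checkout": "orders-db",
    "search": "orders-db",
    "reports": "orders-db",
    "admin": "orders-db",
}


def group_alerts(failing_services: list[str]) -> dict[str, list[str]]:
    """Collapse failing services into one group per shared dependency."""
    groups: dict[str, list[str]] = defaultdict(list)
    for service in failing_services:
        # Services with no known dependency form their own group.
        groups[DEPENDS_ON.get(service, service)].append(service)
    return {root: sorted(members) for root, members in groups.items()}
```

Five failing services produce a single `orders-db` group. That allows one notification naming the likely root cause instead of five.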

Alerting for Different Deployment Types

Different types of applications have different alert priorities:

Static sites — Alert on availability (is the site returning 200?) and, if relevant, CDN cache hit rates.

Docker containers — Alert on response time, error rate, and restart frequency (a container that keeps restarting is likely stuck in a crash loop).
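Checking restart frequency comes down to counting restarts inside a sliding window. This minimal sketch assumes you have the timestamps of recent container restarts. The 15-minute window and the limit of 3 are illustrative defaults, not recommendations.

```python
from datetime import datetime, timedelta


def crash_looping(
    restart_times: list[datetime],
    now: datetime,
    window: timedelta = timedelta(minutes=15),
    max_restarts: int = 3,
) -> bool:
    """True if the container restarted more than `max_restarts` times within `window`."""
    recent = [t for t in restart_times if now - t <= window]
    return len(recent) > max_restarts
```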

Databases (PostgreSQL, MySQL, Redis, MongoDB) — Alert on connection availability and query latency. A database that goes down takes every service that depends on it down too.

Cronjobs — Alert on job failure and missed executions. If a job is scheduled to run every hour and it hasn't run in two hours, something is wrong.
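Both cronjob conditions can be checked from the last run's timestamp and result. This minimal sketch assumes you record when the job last ran and its exit code. Both inputs are hypothetical here. The two-interval cutoff mirrors the hourly example above.

```python
from datetime import datetime, timedelta
from typing import Optional


def check_cronjob(
    last_run: datetime, last_exit_code: int, interval: timedelta, now: datetime
) -> Optional[str]:
    """Return a reason to alert, or None if the job looks healthy."""
    if last_exit_code != 0:
        return "job failed"
    # An hourly job that has not run for two hours has missed an execution.
    if now - last_run >= 2 * interval:
        return "missed execution"
    return None
```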

Edge functions — Alert on invocation errors and timeout rates.

PandaStack supports monitoring and alerting across all these deployment types from a single dashboard at [dashboard.pandastack.io](https://dashboard.pandastack.io).

Runbooks: Making Alerts Actionable

Every alert should link to a runbook — a short document that explains:

What this alert means
How to diagnose the root cause
Steps to remediate
Who to escalate to if the standard fix doesn't work

An alert without a runbook asks the on-call engineer to figure out the diagnosis and fix under pressure. A runbook turns that into a procedure.
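One way to enforce this is to make the runbook link part of each alert rule and check that it is present. In this minimal sketch the `AlertRule` fields and the example.com URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    name: str
    severity: str
    runbook_url: Optional[str] = None


def rules_missing_runbooks(rules: list[AlertRule]) -> list[str]:
    """Names of rules that would page someone without telling them what to do."""
    return [rule.name for rule in rules if not rule.runbook_url]


def format_notification(rule: AlertRule, detail: str) -> str:
    """Put the runbook link directly in the message the on-call engineer sees."""
    return f"[{rule.severity.upper()}] {rule.name}: {detail}\nRunbook: {rule.runbook_url}"
```

A check like `rules_missing_runbooks` could run during the regular alert audit to catch rules that were added without a runbook.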

Testing Your Alerts

Alerts that have never fired might not work when you need them. Test your alert setup periodically:

Intentionally take down a non-production service to verify the alert fires
Confirm the notification reaches the right channel
Verify the alert resolves when the service recovers
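The three checks above can also be written as an automated drill. In this minimal sketch, `AlertMonitor` is a hypothetical stand-in for your alerting rule, and a plain list replaces the real notification channel. It only shows which assertions to make. A real drill should take down an actual non-production service and watch the real channel.

```python
from typing import Callable


class AlertMonitor:
    """Minimal alert state machine: fires after N failures, resolves on recovery."""

    def __init__(self, notify: Callable[[str], None], threshold: int = 2) -> None:
        self.notify = notify
        self.threshold = threshold
        self.failures = 0
        self.firing = False

    def observe(self, healthy: bool) -> None:
        if healthy:
            self.failures = 0
            if self.firing:
                self.firing = False
                self.notify("resolved")
        else:
            self.failures += 1
            if not self.firing and self.failures >= self.threshold:
                self.firing = True
                self.notify("firing")


def drill() -> list[str]:
    """Simulate an outage and recovery; return every notification sent."""
    sent: list[str] = []  # stands in for the real notification channel
    monitor = AlertMonitor(notify=sent.append, threshold=2)
    for healthy in [True, False, False, False, True]:
        monitor.observe(healthy)
    return sent
```

A passing drill yields exactly `["firing", "resolved"]`. That shows the alert fired once, reached the channel, and cleared when the service recovered.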

Conclusion

Smart alerting is a force multiplier for your engineering team. It means fewer incidents go undetected, and the incidents that do get caught are resolved faster. Start with symptom-based alerts, pick the right channels for each severity level, and keep your alert configuration clean. Configure your alerts at [dashboard.pandastack.io](https://dashboard.pandastack.io) and see [docs.pandastack.io](https://docs.pandastack.io) for detailed setup guidance.

Alerting Best Practices: How to Set Up Smart Alerts
