🚨 Alert Rules
Get notified when your services need attention. Set up intelligent alerts based on error rates, latency, throughput, and health scores.
Quick Start
1. Set up notification channels (Slack or Telegram)
2. Create an alert rule with conditions
3. Get notified when thresholds are breached
Alert Types
📈 Error Rate
Monitor the percentage of failed requests. Ideal for detecting when your service starts experiencing issues.
Example Use Case:
Alert when the authentication service's error rate exceeds 5% over 5 minutes
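As a rough illustration of how an error-rate condition like the one above can be evaluated, here is a minimal Python sketch. The `Request` record and the `error_rate` and `should_alert` helpers are hypothetical names made up for this example; they are not part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    timestamp: float  # seconds since epoch
    failed: bool


def error_rate(requests, now, window_seconds=300):
    """Percentage of failed requests in the window ending at `now`."""
    recent = [r for r in requests if now - window_seconds < r.timestamp <= now]
    if not recent:
        return 0.0  # no traffic in the window: nothing to measure
    failures = sum(r.failed for r in recent)
    return 100.0 * failures / len(recent)


def should_alert(requests, now, threshold_pct=5.0, window_seconds=300):
    """True when the windowed error rate is strictly above the threshold."""
    return error_rate(requests, now, window_seconds) > threshold_pct
```

Note that this sketch treats an empty window as 0% errors, so a service that has stopped receiving traffic would not trip this rule; that case is what throughput alerts are for.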
⏱️ Latency
Track response times and get alerted when requests are too slow. Choose from average, P50, P95, or P99 metrics.
Example Use Case:
Alert when P95 latency exceeds 1000ms (1 second) for API endpoints
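- Average: Good for overall trends, but easily skewed by a few very slow requests
- P50: The median; reflects what a typical request experiences
- P95: Recommended for user experience; 95% of requests complete at or below this value
- P99: Catches worst-case scenarios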
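To make the difference between the average and the percentile metrics concrete, the sketch below computes them with the nearest-rank method. This is one common percentile definition and an assumption here; the product may interpolate or use a different estimator. The function names are illustrative only.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    # Multiply before dividing to avoid floating-point drift in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms):
    """Average, P50, P95, and P99 for a list of latencies in milliseconds."""
    return {
        "average": statistics.mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# One slow outlier pulls the average well above the median.
summary = latency_summary([100, 200, 300, 400, 2000])
print(summary)
```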
🚀 Throughput
Monitor requests per minute. Perfect for detecting when services stop processing traffic or get overwhelmed.
Service Down Detection:
Alert when a service stops sending traces
Traffic Spike Detection:
Alert when traffic exceeds capacity
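Both throughput cases reduce to comparing requests per minute against a lower and an upper bound. The following sketch shows that idea; the `min_rpm` and `max_rpm` defaults are placeholder values chosen for illustration, not product defaults, and the function names are hypothetical.

```python
def requests_per_minute(timestamps, now, window_seconds=300):
    """Average requests per minute over the window ending at `now`."""
    count = sum(1 for t in timestamps if now - window_seconds < t <= now)
    return count / (window_seconds / 60)


def classify_throughput(rpm, min_rpm=1.0, max_rpm=1000.0):
    """Map a requests-per-minute value to one of three illustrative states."""
    if rpm < min_rpm:
        return "service_down"   # too little traffic: service may have stopped
    if rpm > max_rpm:
        return "traffic_spike"  # more traffic than the assumed capacity
    return "ok"
```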
💚 Health Score
A composite metric that combines error rate and latency into a single health score from 0 to 100. Higher is better.
Example Use Case:
Alert when overall service health drops below 70
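This page does not specify how the health score is calculated, so the sketch below is purely illustrative of what a composite score could look like. The equal weighting, the 10-points-per-percent error penalty, and the `latency_budget_ms` parameter are all assumptions made up for this example.

```python
def health_score(error_rate_pct, p95_latency_ms, latency_budget_ms=1000.0):
    """Hypothetical 0-100 health score; NOT the product's actual formula."""
    # Assumed: each 1% of errors costs 10 points, so 10% errors scores 0.
    error_score = max(0.0, 100.0 - error_rate_pct * 10.0)

    # Assumed: full marks within the latency budget, falling linearly
    # to 0 at twice the budget.
    over_budget_ms = max(0.0, p95_latency_ms - latency_budget_ms)
    latency_score = 100.0 * max(0.0, 1.0 - over_budget_ms / latency_budget_ms)

    # Assumed: both components weigh equally.
    return (error_score + latency_score) / 2.0


def health_alert(error_rate_pct, p95_latency_ms, threshold=70.0):
    """True when the illustrative score drops below the threshold."""
    return health_score(error_rate_pct, p95_latency_ms) < threshold
```

Under these assumptions, a service with 5% errors and a P95 of 1500 ms would score 50 and trip a "below 70" rule.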
Scope Types
Global
Monitor all services together. Good for overall system health.
Service
Monitor a specific service. Most common use case.
Endpoint
Monitor a specific endpoint like "POST /api/users".
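One way to picture the three scope types is as a filter deciding which requests count toward a rule. The `Scope` class and `matches` function below are hypothetical, and the sketch assumes an endpoint scope is also tied to a specific service, which this page does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """Hypothetical alert scope: 'global', 'service', or 'endpoint'."""
    kind: str
    service: Optional[str] = None
    endpoint: Optional[str] = None


def matches(scope, service, endpoint):
    """Return True if a request to `service`/`endpoint` falls inside the scope."""
    if scope.kind == "global":
        return True
    if scope.kind == "service":
        return scope.service == service
    if scope.kind == "endpoint":
        # Assumption: an endpoint scope names both the service and the endpoint.
        return scope.service == service and scope.endpoint == endpoint
    raise ValueError(f"unknown scope kind: {scope.kind!r}")
```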
Best Practices
⏰ Set Appropriate Time Windows
Short windows (1-5 min) detect issues quickly but may cause false positives. Longer windows (15-30 min) are more stable but slower to alert.
🔕 Use Cooldowns to Prevent Spam
Set a cooldown period (15-60 min) to avoid being flooded with notifications for the same issue. While the issue persists, you'll still be re-notified periodically until it is resolved.
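The cooldown behavior can be sketched as a small gate that suppresses repeat notifications until the cooldown has elapsed. `CooldownGate` is a hypothetical name, and resetting the cooldown once the condition clears is an assumption of this sketch rather than documented behavior.

```python
class CooldownGate:
    """Allow at most one notification per cooldown period while a breach lasts."""

    def __init__(self, cooldown_seconds=900):
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = None  # time of the last notification, if any

    def should_notify(self, breached, now):
        if not breached:
            # Assumed: once the issue resolves, the next breach notifies immediately.
            self._last_sent = None
            return False
        if self._last_sent is None or now - self._last_sent >= self.cooldown_seconds:
            self._last_sent = now
            return True
        return False  # still breached, but inside the cooldown


gate = CooldownGate(cooldown_seconds=900)
for t in (0, 60, 900):
    print(t, gate.should_notify(breached=True, now=t))
```

With a 15-minute cooldown, a rule evaluated every minute during an ongoing breach would notify on the first evaluation and then again every 15 minutes.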
🎨 Layer Your Alerts
Combine multiple alert types: Error rate alerts catch failures, latency alerts catch slowdowns, and throughput alerts catch outages.
📊 Start with Baselines
Monitor your services for a few days to understand normal behavior before setting alert thresholds. For latency alerts, use your observed P95 latency as a starting point.
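As one possible way to turn a few days of observations into a starting threshold, the sketch below takes the median of the daily P95 values and adds some headroom. The function name, the use of the median, and the 20% headroom are illustrative assumptions, not a recommendation from the product.

```python
import statistics


def suggest_latency_threshold(daily_p95_ms, headroom_pct=20):
    """Starting latency threshold: median daily P95 plus a headroom percentage."""
    if not daily_p95_ms:
        raise ValueError("need at least one day of P95 observations")
    baseline = statistics.median(daily_p95_ms)
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return baseline * (100 + headroom_pct) / 100


# Five days of observed P95 latency in milliseconds (made-up numbers).
print(suggest_latency_threshold([400, 420, 380, 450, 410]))
```

Treat the result as a first guess and adjust it after watching how often the alert fires.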
Next Steps
Ready to set up your first alert?