
On-Call

Manage on-call schedules, rotations, and alert escalation.

Overview

The on-call system ensures the right people are alerted when issues occur:

  • Schedules: Define who is on-call when
  • Rotations: Automatic rotation between team members
  • Escalation: Multi-level alert escalation
  • Alerts: Track and acknowledge incoming alerts

On-Call Dashboard

Navigate to Dashboard > On-Call to see:

  • Current on-call person(s)
  • Active alerts
  • Upcoming schedule changes
  • Recent alert history

Schedules

Creating a Schedule

  1. Navigate to On-Call > Schedules
  2. Click "New Schedule"
  3. Configure:

     Field        Description
     Name         Schedule identifier
     Timezone     Schedule timezone
     Start Date   When schedule begins

Rotation Types

Weekly

Each person is on-call for a full week:

Week 1: Alice
Week 2: Bob
Week 3: Charlie
Week 4: Alice (cycles)
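
The weekly cycle above is plain round-robin arithmetic. Count the full shifts that have elapsed since the schedule start, then take that count modulo the number of participants. This minimal sketch makes three assumptions: all shifts have the same length, the first participant takes the first shift, and all times are already in the schedule's timezone. `on_call_at` is a hypothetical helper for illustration, not part of the product's API.

```python
from datetime import datetime, timedelta


def on_call_at(participants: list[str], start: datetime,
               shift_length: timedelta, at: datetime) -> str:
    """Return who is on call at `at` in a simple round-robin rotation.

    Assumes equal-length shifts, with participants[0] taking the shift
    that begins at `start` (a hypothetical model of the schedule).
    """
    if not participants:
        raise ValueError("schedule has no participants")
    if at < start:
        raise ValueError("time is before the schedule starts")
    shifts_elapsed = (at - start) // shift_length  # number of completed shifts
    return participants[shifts_elapsed % len(participants)]


rotation = ["Alice", "Bob", "Charlie"]
start = datetime(2024, 1, 1, 9, 0)  # hypothetical start: Monday 9am
week = timedelta(weeks=1)

for n in range(4):
    print(f"Week {n + 1}:", on_call_at(rotation, start, week, start + n * week))
```

Passing `timedelta(days=1)` as the shift length models the daily rotation the same way.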

Daily

Each person is on-call for a day:

Monday: Alice
Tuesday: Bob
Wednesday: Charlie
...

Custom Hours

Define specific time slots:

Weekdays 9am-5pm: Team A
Weekdays 5pm-9am: Team B
Weekends: Team C
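
Custom-hour slots can be modeled as rules checked in priority order. The example slots leave one thing open: where the weekend begins and ends. This sketch assumes the weekend is the calendar days Saturday and Sunday. Under that assumption, Friday evening and early Monday morning count as weekday off-hours. It also assumes timestamps are already in the schedule's timezone. `team_for` is a hypothetical helper.

```python
from datetime import datetime


def team_for(at: datetime) -> str:
    """Map a timestamp to a team using the example custom-hour slots.

    Assumptions: `at` is in the schedule's timezone, and the weekend is
    all of Saturday and Sunday; every other off-hours time goes to Team B.
    """
    if at.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return "Team C"
    if 9 <= at.hour < 17:  # 9am up to, but not including, 5pm
        return "Team A"
    return "Team B"  # weekday 5pm-9am


print(team_for(datetime(2024, 1, 1, 10, 30)))  # Monday morning
print(team_for(datetime(2024, 1, 6, 12, 0)))   # Saturday noon
```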

Adding Participants

  1. Open the schedule
  2. Click "Add Participant"
  3. Select team member
  4. Set their rotation order
  5. Configure contact methods

Overrides

Create temporary overrides:

  1. Open the schedule
  2. Click "Add Override"
  3. Select:
    • User to replace
    • Replacement user
    • Start/end times
  4. Save

Use overrides for vacations, sick days, or planned absences.
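
Conceptually, an override sits on top of the regular rotation. If the scheduled person is the "user to replace" and the time falls inside the override window, the replacement is alerted instead. This sketch assumes the window includes its start time but not its end time. The `Override` class and `effective_on_call` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Override:
    user_to_replace: str
    replacement: str
    start: datetime
    end: datetime  # assumed exclusive


def effective_on_call(scheduled: str, at: datetime,
                      overrides: list[Override]) -> str:
    """Return who is actually alerted once overrides are applied.

    Assumption: an override covers start <= at < end and only applies
    when the replaced user is the one scheduled at that moment.
    """
    for o in overrides:
        if o.user_to_replace == scheduled and o.start <= at < o.end:
            return o.replacement
    return scheduled


vacation = Override("Alice", "Bob", datetime(2024, 7, 1), datetime(2024, 7, 8))
print(effective_on_call("Alice", datetime(2024, 7, 3, 14), [vacation]))  # Bob
print(effective_on_call("Alice", datetime(2024, 7, 8), [vacation]))      # Alice
```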

Escalation Policies

Define how alerts escalate if not acknowledged:

Creating a Policy

  1. Navigate to On-Call > Schedules
  2. Open a schedule
  3. Click "Escalation Policy"
  4. Add levels:

Level 1 (immediate):

  • Primary on-call
  • Wait 5 minutes

Level 2 (after 5 min):

  • Secondary on-call
  • Wait 10 minutes

Level 3 (after 15 min):

  • Team lead + manager
  • Final escalation

Escalation Rules

Setting     Description
Wait Time   Minutes before escalating
Notify      Who to alert at this level
Repeat      Repeat this level before escalating

Escalation Example

Alert triggered
  ↓
Level 1: Alert on-call engineer
  ↓ (5 min, not acknowledged)
Level 2: Alert on-call + backup
  ↓ (10 min, not acknowledged)
Level 3: Alert entire team
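
Each level's wait time adds to the previous ones. With the policy above, level 2 starts 5 minutes after the alert and level 3 starts 15 minutes after it (5 + 10). This sketch computes which level an unacknowledged alert has reached. It ignores the Repeat setting, and the `Level` class and `active_level` function are hypothetical names rather than the product's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    notify: list[str]            # who is alerted at this level
    wait_minutes: Optional[int]  # minutes before escalating; None = final level


def active_level(levels: list[Level], minutes_unacknowledged: float) -> int:
    """Return the 1-based level an unacknowledged alert has reached."""
    if not levels:
        raise ValueError("policy has no levels")
    deadline = 0
    for number, level in enumerate(levels, start=1):
        if level.wait_minutes is None:
            return number  # final escalation: stay here
        deadline += level.wait_minutes  # cumulative minute this level ends
        if minutes_unacknowledged < deadline:
            return number
    return len(levels)  # past the last level's timer


policy = [
    Level(["primary on-call"], 5),
    Level(["secondary on-call"], 10),
    Level(["team lead", "manager"], None),
]

for minutes in (0, 5, 15):
    level = active_level(policy, minutes)
    print(minutes, "min -> level", level, policy[level - 1].notify)
```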

Alerts

How Alerts are Triggered

Alerts are created when:

  • A monitor fails (ENDPOINT component)
  • An incident is created manually with on-call notification
  • An alert is created via the API

Alert States

State          Description
Triggered      Alert created, not acknowledged
Acknowledged   Someone is working on it
Resolved       Issue fixed
Snoozed        Temporarily silenced
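
The four states behave like a small state machine. The guide names the states but does not list every permitted move between them, so the transition table in this sketch is an assumption. It follows the text where possible: a snoozed alert re-triggers, and monitors or incidents can resolve an alert. `AlertState` and `transition` are hypothetical names.

```python
from enum import Enum


class AlertState(Enum):
    TRIGGERED = "triggered"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"
    SNOOZED = "snoozed"


# Assumed transition table; adjust to match the real product behavior.
ALLOWED = {
    AlertState.TRIGGERED: {AlertState.ACKNOWLEDGED, AlertState.SNOOZED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.SNOOZED: {AlertState.TRIGGERED, AlertState.RESOLVED},
    AlertState.RESOLVED: set(),  # terminal state
}


def transition(current: AlertState, new: AlertState) -> AlertState:
    """Return the new state, or raise if the move is not in the table."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


state = transition(AlertState.TRIGGERED, AlertState.ACKNOWLEDGED)
state = transition(state, AlertState.RESOLVED)
print(state.value)
```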

Acknowledging Alerts

From the dashboard:

  1. Click the alert
  2. Click "Acknowledge"

Via email/SMS:

  • Reply with "ack" or click the link

Via API:

curl -X POST http://localhost:3000/api/v1/oncall/alerts/{id}/acknowledge \
  -H "Authorization: Bearer sk_live_xxx"

Snoozing Alerts

Temporarily silence an alert:

  1. Open the alert
  2. Click "Snooze"
  3. Select duration (5m, 15m, 1h, etc.)

The alert will re-trigger after the snooze period if not resolved.
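
The re-trigger time is simply the snooze start plus the chosen duration. This sketch parses the short duration labels shown in the snooze menu ("5m", "15m", "1h"). The `d` (days) unit is an assumption covering the "etc.", and `parse_duration` and `retrigger_at` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta

# "m" and "h" appear in the snooze options; "d" is an assumed extra unit.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse a short duration label such as "5m", "15m", or "1h"."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def retrigger_at(snoozed_at: datetime, duration: str) -> datetime:
    """When a snoozed alert fires again if it is still unresolved."""
    return snoozed_at + parse_duration(duration)


print(retrigger_at(datetime(2024, 1, 1, 9, 0), "15m"))
```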

Resolving Alerts

Mark an alert as fixed:

  1. Open the alert
  2. Click "Resolve"

Alerts also resolve automatically when:

  • Monitor recovers
  • Linked incident is resolved

Contact Methods

Configure how team members receive alerts:

Setting Up Contacts

  1. Navigate to Settings > On-Call Contacts
  2. Click "Add Contact"
  3. Configure methods:

     Method   Description
     Email    Email alerts
     SMS      Text message alerts
     Phone    Voice call (if configured)
     Push     Mobile app notifications

Notification Order

Set the order for trying contact methods:

1. Push notification (immediate)
2. SMS (30 sec delay if no response)
3. Phone call (2 min delay if no response)
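
The order above can be read as a timeline of attempts that stops as soon as the user responds. It is ambiguous whether each delay is measured from the alert or from the previous attempt. This sketch assumes the latter, which places the phone call at 2.5 minutes. `Step`, `attempt_times`, and `methods_tried` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    method: str
    delay_seconds: int  # assumed: wait after the previous attempt


ORDER = [Step("push", 0), Step("sms", 30), Step("phone", 120)]


def attempt_times(order: list[Step]) -> list[tuple[int, str]]:
    """Seconds after the alert at which each method fires if nobody responds."""
    elapsed = 0
    plan = []
    for step in order:
        elapsed += step.delay_seconds
        plan.append((elapsed, step.method))
    return plan


def methods_tried(order: list[Step], responded_after: Optional[float] = None) -> list[str]:
    """Methods actually used when the user responds after `responded_after` seconds."""
    return [method for t, method in attempt_times(order)
            if responded_after is None or t < responded_after]


print(attempt_times(ORDER))  # [(0, 'push'), (30, 'sms'), (150, 'phone')]
```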

Quiet Hours

Define times when certain methods are disabled:

Quiet hours: 10pm - 8am
During quiet hours: Only phone calls
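
A 10pm-8am window crosses midnight, so a simple "start <= now < end" check is not enough. The time is quiet if it is at or after the start, or before the end. This sketch applies the example rule of phone calls only and assumes local wall-clock times. `in_quiet_hours` and `allowed_methods` are hypothetical helpers.

```python
from datetime import time


def in_quiet_hours(now: time, start: time = time(22, 0), end: time = time(8, 0)) -> bool:
    """True if `now` is inside the quiet window; handles windows crossing midnight."""
    if start <= end:
        return start <= now < end   # same-day window, e.g. 13:00-14:00
    return now >= start or now < end  # overnight window, e.g. 22:00-08:00


def allowed_methods(now: time, methods: list[str]) -> list[str]:
    """During quiet hours keep only phone calls, following the example rule."""
    if in_quiet_hours(now):
        return [m for m in methods if m == "phone"]
    return list(methods)


print(allowed_methods(time(23, 30), ["push", "sms", "phone"]))
```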

On-Call Contacts

Per-User Settings

Each user can configure:

  • Primary email/phone
  • Backup contact methods
  • Notification preferences
  • Quiet hours

Testing Contacts

  1. Open contact settings
  2. Click "Test" next to each method
  3. Verify receipt

Schedule Views

Calendar View

See schedule as a calendar:

  • Color-coded by person
  • Shows overrides
  • Click to edit

Timeline View

Linear timeline of coverage:

  • Who is on-call when
  • Gap detection
  • Coverage summary
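
Gap detection amounts to sorting shifts by start time and sweeping a cursor across the window. Any stretch the cursor skips over is uncovered. This sketch accepts overlapping shifts. The `(start, end)` tuple format and the `find_gaps` name are assumptions for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(shifts: list[tuple[datetime, datetime]],
              window_start: datetime,
              window_end: datetime) -> list[tuple[datetime, datetime]]:
    """Return (start, end) intervals in the window not covered by any shift."""
    gaps = []
    cursor = window_start  # everything before the cursor is covered or recorded
    for start, end in sorted(shifts):
        if end <= cursor:
            continue  # shift ends before the uncovered point
        if start > cursor:
            gaps.append((cursor, min(start, window_end)))
        cursor = max(cursor, end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


day = datetime(2024, 1, 1)
shifts = [
    (day, day.replace(hour=9)),
    (day.replace(hour=9), day.replace(hour=17)),
    (day.replace(hour=18), day + timedelta(days=1)),
]
print(find_gaps(shifts, day, day + timedelta(days=1)))  # one gap, 17:00-18:00
```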

List View

Table of upcoming shifts:

  • User, start, end
  • Sortable and searchable

API Access

Current On-Call

curl http://localhost:3000/api/v1/oncall/schedules/{id}/current \
  -H "Authorization: Bearer sk_live_xxx"

Create Alert

curl -X POST http://localhost:3000/api/v1/oncall/alerts \
  -H "Authorization: Bearer sk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "High CPU Usage",
    "description": "Server CPU at 95%",
    "scheduleId": "schedule-id",
    "severity": "critical"
  }'

List Alerts

curl http://localhost:3000/api/v1/oncall/alerts \
  -H "Authorization: Bearer sk_live_xxx"
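
The same endpoints can be called from a script. This sketch uses only the Python standard library and builds requests for the endpoints shown in the curl examples in this guide. Four things are assumptions: the `ONCALL_API_KEY` environment variable name, the helper function names, and that responses are JSON objects. Note that `send` needs a running server.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:3000/api/v1/oncall"
# Hypothetical variable name; falls back to the placeholder key from the docs.
API_KEY = os.environ.get("ONCALL_API_KEY", "sk_live_xxx")


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request; JSON-encodes `body` when given."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the response (JSON response shape assumed)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def current_on_call_request(schedule_id: str) -> urllib.request.Request:
    return build_request("GET", f"/schedules/{schedule_id}/current")


def create_alert_request(title: str, description: str,
                         schedule_id: str, severity: str) -> urllib.request.Request:
    return build_request("POST", "/alerts", {
        "title": title,
        "description": description,
        "scheduleId": schedule_id,
        "severity": severity,
    })


def list_alerts_request() -> urllib.request.Request:
    return build_request("GET", "/alerts")


def acknowledge_request(alert_id: str) -> urllib.request.Request:
    return build_request("POST", f"/alerts/{alert_id}/acknowledge")


# Example (requires a running server):
# send(create_alert_request("High CPU Usage", "Server CPU at 95%", "schedule-id", "critical"))
```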

Best Practices

Schedule Design

  • Ensure 24/7 coverage
  • Balance load fairly
  • Account for timezones
  • Plan for holidays

Escalation

  • Start with a small team at the first level
  • Add management at later levels
  • Keep escalation times reasonable
  • Test escalation periodically

Alert Fatigue Prevention

  • Tune monitor thresholds
  • Use appropriate failure thresholds
  • Review and eliminate noisy alerts
  • Group related alerts
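
One common way to group related alerts is to give each alert a key, such as the failing monitor. An alert joins the group for its key if it arrives within a time window of that group's first alert. This sketch assumes a 10-minute window. The grouping key, the window, and the `group_alerts` name are all hypothetical, not settings the product exposes.

```python
from datetime import datetime, timedelta


def group_alerts(alerts: list[tuple[str, datetime]],
                 window: timedelta = timedelta(minutes=10)) -> list[list[tuple[str, datetime]]]:
    """Group (key, timestamp) alerts sharing a key within `window` of the group's first alert."""
    groups = []
    open_groups = {}  # key -> most recent group for that key
    for key, ts in sorted(alerts, key=lambda a: a[1]):
        group = open_groups.get(key)
        if group is not None and ts - group[0][1] <= window:
            group.append((key, ts))  # related: fold into the existing group
        else:
            group = [(key, ts)]      # start a new group for this key
            groups.append(group)
            open_groups[key] = group
    return groups


t0 = datetime(2024, 1, 1, 12, 0)
alerts = [("db", t0), ("db", t0 + timedelta(minutes=2)),
          ("api", t0 + timedelta(minutes=3)), ("db", t0 + timedelta(minutes=15))]
print(len(group_alerts(alerts)))  # 3 groups instead of 4 separate alerts
```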

Documentation

  • Document escalation procedures
  • Create runbooks for common issues
  • Keep contact info updated
  • Review schedules regularly

Troubleshooting

Alerts Not Triggering

  1. Check that the schedule has participants
  2. Verify that an escalation policy exists
  3. Confirm that contact methods are configured
  4. Check the monitor's auto-incident settings

Wrong Person Alerted

  1. Check current schedule
  2. Look for overrides
  3. Verify timezone settings
  4. Check rotation order

Alerts Not Escalating

  1. Verify escalation policy
  2. Check wait times
  3. Confirm next level has contacts
  4. Review escalation logs

Related Documentation