Your First Incident

Learn how to create, update, and resolve incidents in ReliaPulse.

Overview

This tutorial covers the complete incident lifecycle:

Creating an incident
Adding updates
Resolving the incident
Writing a postmortem

The Incident Lifecycle

   ┌───────────────┐   ┌───────────────┐   ┌───────────────┐   ┌───────────────┐
   │ Investigating │ → │  Identified   │ → │  Monitoring   │ → │   Resolved    │
   └───────────────┘   └───────────────┘   └───────────────┘   └───────────────┘

     Report              Root cause          Fix applied         Issue
     issue               found               watching            fixed
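The diagram shows the usual order of statuses. The sketch below models that order so that an update can be checked against it. It is illustrative only. The enum and `is_forward_step` are hypothetical names, and the sketch makes two assumptions: an update either keeps the current status or moves it forward, and nothing follows Resolved. ReliaPulse itself may allow other transitions.

```python
from enum import IntEnum


class IncidentStatus(IntEnum):
    """The four statuses from the lifecycle diagram, in their usual order."""
    INVESTIGATING = 1
    IDENTIFIED = 2
    MONITORING = 3
    RESOLVED = 4


def is_forward_step(current: IncidentStatus, new: IncidentStatus) -> bool:
    """Return True if posting `new` keeps or advances the incident's status.

    Assumption (not ReliaPulse behavior): skipping a stage, such as going
    from Identified straight to Resolved, counts as forward, and no update
    is expected once the incident is Resolved.
    """
    return current != IncidentStatus.RESOLVED and new >= current


print(is_forward_step(IncidentStatus.INVESTIGATING, IncidentStatus.IDENTIFIED))  # True
print(is_forward_step(IncidentStatus.MONITORING, IncidentStatus.IDENTIFIED))     # False
```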

Create an Incident

When you discover an issue:

Navigate to Dashboard > Incidents
Click "New Incident"
Fill in the details:

Basic Information:
- Title: API Response Times Elevated
- Status: Investigating
- Impact: Choose the severity level
Affected Components:
- Select API (or your component)
- Set component status to Degraded Performance
Initial Message:
```
We are investigating reports of slow API response times.
Some users may experience delays when making requests.
We will provide updates as we learn more.
```
Click "Create Incident"

The incident immediately appears on your public status page, and subscribers receive notifications.
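To see what the "New Incident" form collects, the fields can be written as a small record with a pre-submit check. This is a hypothetical model and not ReliaPulse's schema. The class name, the field names, and the severity value `"minor"` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NewIncident:
    """Hypothetical model of the "New Incident" form (not ReliaPulse's schema)."""
    title: str
    impact: str                    # severity level chosen in the form (values assumed)
    message: str                   # initial message shown on the status page
    status: str = "Investigating"  # new incidents start in Investigating
    # component name -> component status, e.g. {"API": "Degraded Performance"}
    components: dict[str, str] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the form looks complete."""
        problems = []
        if not self.title.strip():
            problems.append("title is empty")
        if not self.message.strip():
            problems.append("initial message is empty")
        if not self.components:
            problems.append("no affected components selected")
        return problems


incident = NewIncident(
    title="API Response Times Elevated",
    impact="minor",  # hypothetical severity value
    message="We are investigating reports of slow API response times.",
    components={"API": "Degraded Performance"},
)
print(incident.validate())  # []
```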

Add an Update (Identified)

Once you've found the root cause:

Open the incident from the incidents list
Click "Add Update"

Fill in the update:

Status: Identified
Message:

We have identified the root cause as a database connection pool
exhaustion. Our team is working on increasing the pool size
and implementing additional connection management.

Optionally update the component status (in this example, leave it as Degraded Performance)
Click "Post Update"

Add an Update (Monitoring)

After applying a fix:

Click "Add Update" again

Fill in the update:

Status: Monitoring
Message:

A fix has been deployed to increase database connection pool
capacity. Response times are returning to normal levels.
We are monitoring the system to ensure stability.

Click "Post Update"

Resolve the Incident

Once the issue is fully resolved:

Click "Add Update"

Fill in the resolution:

Status: Resolved
Message:

This incident has been resolved. API response times have
returned to normal levels and have been stable for the
past 30 minutes.

We apologize for any inconvenience caused.

Important: Set the affected component's status back to Operational
Click "Post Update"

The incident is now marked as resolved and moves to the incident history.
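A common mistake is to post a Resolved update while a component still shows a degraded status. The following sketch shows the kind of check you can run in your head, or in your own tooling, before clicking "Post Update". The function name and the dictionary shape are hypothetical. The status strings are the ones used in this tutorial.

```python
OPERATIONAL = "Operational"


def resolve_blockers(status: str, components: dict[str, str]) -> list[str]:
    """List components that would still look broken after a Resolved update.

    `components` maps component name -> the status you are about to post.
    The check only applies when `status` is "Resolved".
    """
    if status != "Resolved":
        return []
    return sorted(name for name, state in components.items() if state != OPERATIONAL)


print(resolve_blockers("Resolved", {"API": "Degraded Performance"}))  # ['API']
print(resolve_blockers("Resolved", {"API": "Operational"}))           # []
```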

Write a Postmortem

For significant incidents, add a postmortem:

Open the resolved incident
Click "Add Postmortem"

Write a thorough analysis:

Summary:

On [date], users experienced elevated API response times for
approximately 45 minutes due to database connection pool exhaustion.

Impact:

- Duration: 45 minutes
- Requests affected: ~15% of API requests
- Services impacted: API, Web Application

Root Cause:

A recent deployment increased concurrent request handling without
proportionally increasing the database connection pool size.
During peak traffic, connections were exhausted, causing requests
to queue and time out.

Timeline:

14:23 - Monitoring alerts for elevated response times
14:25 - Engineering notified, investigation begins
14:35 - Root cause identified as connection pool exhaustion
14:45 - Pool size increase deployed to production
14:55 - Response times normalized
15:08 - Incident resolved after stability monitoring

Action Items:

- [ ] Add connection pool metrics to monitoring dashboard
- [ ] Create deployment checklist for resource requirements
- [ ] Implement connection pool auto-scaling

Toggle "Publish Postmortem" to show it on your public status page
Click "Save"
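Before publishing, check that the durations in the Summary and Impact sections agree with the Timeline. The sketch below does this arithmetic for the example timeline. `minutes_between` is a hypothetical helper, and it assumes that all times fall on the same day.

```python
from datetime import datetime

# Timeline from the example postmortem (HH:MM, same day).
timeline = {
    "detected": "14:23",
    "root_cause": "14:35",
    "fix_deployed": "14:45",
    "normalized": "14:55",
    "resolved": "15:08",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from `start` to `end`, both "HH:MM" on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


print(minutes_between(timeline["detected"], timeline["resolved"]))    # 45
print(minutes_between(timeline["detected"], timeline["root_cause"]))  # 12
print(minutes_between(timeline["detected"], timeline["normalized"]))  # 32
```

The total of 45 minutes matches "Duration: 45 minutes". The other two figures are useful time-to-identify and time-to-recover numbers for the postmortem.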

Best Practices

Communication Style

Do:

Be clear and concise
Use simple language and avoid jargon
Provide estimated times when possible
Update frequently during active incidents

Don't:

Make promises you can't keep
Blame individuals or teams
Use overly technical language
Leave users without updates for long periods

Update Frequency

| Incident Phase | Update Frequency |
|---|---|
| Investigating | Every 15-20 minutes |
| Identified | Every 20-30 minutes |
| Monitoring | Every 30-60 minutes |
| Resolved | Final update only |
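You can turn the update-frequency guidance into a simple reminder: take the upper end of each range as the latest time the next update should go out. This is a sketch, and the dictionary and function names are hypothetical. Treating the upper bound as a deadline is a choice made for this example and is not a ReliaPulse setting.

```python
from datetime import datetime, timedelta
from typing import Optional

# Upper bound of each suggested range, in minutes.
# Resolved has no entry because it only gets a final update.
MAX_GAP_MINUTES = {
    "Investigating": 20,
    "Identified": 30,
    "Monitoring": 60,
}


def next_update_due(status: str, last_update: datetime) -> Optional[datetime]:
    """Latest time the next public update should go out, or None if Resolved."""
    gap = MAX_GAP_MINUTES.get(status)
    if gap is None:
        return None
    return last_update + timedelta(minutes=gap)


last = datetime(2024, 1, 1, 14, 35)
print(next_update_due("Identified", last))  # 2024-01-01 15:05:00
```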

Incident Templates

Use templates for consistent messaging:

Navigate to Settings > Templates
Create templates for common incident types:
- Network issues
- Database problems
- Third-party outages
- Planned maintenance

Templates save time during high-pressure situations and ensure consistent communication.
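A template is fixed wording with a few blanks that you fill in when you use it. The sketch below uses Python's standard `string.Template` to illustrate the idea. The "Database problems" wording and the `$component` and `$symptom` placeholders are hypothetical, and ReliaPulse's own template syntax may differ.

```python
from string import Template

# Hypothetical "Database problems" template; $component and $symptom are
# filled in when the incident is created.
DATABASE_TEMPLATE = Template(
    "We are investigating $symptom affecting $component. "
    "We will provide updates as we learn more."
)

message = DATABASE_TEMPLATE.substitute(
    component="the API",
    symptom="elevated response times",
)
print(message)
```

`substitute` raises a `KeyError` if a placeholder is left unfilled, so an incomplete message never goes out by accident. `safe_substitute` would leave the placeholder in the text instead.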

Automatic Incidents

ENDPOINT components can automatically create incidents when health checks fail:

Edit your ENDPOINT component
Enable "Auto Create Incident"
Set the failure threshold (e.g., 3 consecutive failures)
Configure auto-resolve behavior

When the monitor detects failures:

An incident is created with Investigating status
The affected component is set to Major Outage
When the endpoint recovers, the incident is resolved automatically, according to the auto-resolve behavior you configured
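The consecutive-failure threshold works like a counter that resets on any healthy check. The sketch below models one endpoint. It is hypothetical and not ReliaPulse's implementation. It assumes that the incident auto-resolves on the first healthy check after it opens, which is only one possible auto-resolve setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoIncidentMonitor:
    """Sketch of threshold-based incident creation for one endpoint.

    Assumptions: an incident opens after `failure_threshold` consecutive failed
    checks and resolves on the first healthy check afterwards.
    """
    failure_threshold: int = 3
    consecutive_failures: int = 0
    open_incident: bool = False

    def record_check(self, healthy: bool) -> Optional[str]:
        """Feed one health-check result; return the action taken, or None."""
        if healthy:
            self.consecutive_failures = 0  # any success resets the streak
            if self.open_incident:
                self.open_incident = False
                return "resolve incident"
            return None
        self.consecutive_failures += 1
        if not self.open_incident and self.consecutive_failures >= self.failure_threshold:
            self.open_incident = True
            return "create incident (Investigating); set component to Major Outage"
        return None


monitor = AutoIncidentMonitor(failure_threshold=3)
results = [False, False, True, False, False, False, False, True]
actions = [monitor.record_check(ok) for ok in results]
print(actions)
```

The two failures at the start do not open an incident, because the healthy third check resets the count. The incident opens on the third failure in the next run and resolves at the final healthy check.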

Next Steps

Learn about monitors - Automate incident creation
Set up notifications - Alert your team
Configure on-call - Escalate to the right people
