Glossary

Monitoring & reliability glossary.

Plain-English definitions of the terms behind uptime, SLAs, and incident response.

Downtime
Downtime is any period during which a service is unavailable or not responding correctly to users.
Error budget
An error budget is the amount of unreliability an SLO allows — the gap between your target and 100%.
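To make the arithmetic concrete, here is a minimal sketch that turns an availability target into an allowed-downtime budget and reports how much is left after observed downtime. It assumes the SLO is given as a fraction (0.999 for 99.9%) and that budget is tracked as wall-clock downtime. The helper names `error_budget` and `budget_remaining` are hypothetical.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Downtime allowed by an availability SLO over a window.

    `slo` is a fraction, e.g. 0.999 for a 99.9% target (an assumption of this sketch).
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    # The budget is the slice of the window that the target does not cover.
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Budget left after observed downtime; a negative result means it is overspent."""
    return error_budget(slo, window) - downtime


if __name__ == "__main__":
    month = timedelta(days=30)
    print(error_budget(0.999, month))                             # 0:43:12
    print(budget_remaining(0.999, month, timedelta(minutes=10)))  # 0:33:12
```

A 99.9% target over 30 days therefore leaves about 43 minutes of budget; once that is spent, many teams slow down risky changes until the window rolls forward.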
False positive (monitoring)
A false positive is an alert for an outage that isn't real — often a transient network blip from one location.
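One common way to cut false positives is to require a failure to be seen from more than one location before alerting. The sketch below assumes each probe reports a pass/fail result tagged with its location; `CheckResult`, `confirmed_down`, and the default threshold of two failing locations are hypothetical choices, not any particular tool's API.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    location: str  # hypothetical probe location label, e.g. "eu-west"
    ok: bool       # True if the check from this location succeeded


def confirmed_down(results: list[CheckResult], min_failing_locations: int = 2) -> bool:
    """Report an outage only when enough distinct locations see a failure.

    A failure seen from a single location is often a local network blip,
    so one failing probe on its own does not trigger an alert.
    """
    failing = {r.location for r in results if not r.ok}
    return len(failing) >= min_failing_locations


if __name__ == "__main__":
    one_blip = [CheckResult("eu-west", False), CheckResult("us-east", True), CheckResult("ap-south", True)]
    real_outage = [CheckResult("eu-west", False), CheckResult("us-east", False), CheckResult("ap-south", True)]
    print(confirmed_down(one_blip))     # False
    print(confirmed_down(real_outage))  # True
```

Counting distinct locations (a set) rather than raw failures means repeated retries from the same probe cannot confirm an outage on their own.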
Heartbeat monitoring
Heartbeat monitoring expects a scheduled job to check in, and alerts you when the expected ping doesn't arrive.
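The core of a heartbeat check is a deadline comparison: the last ping plus the expected interval, plus some slack for normal jitter. A minimal sketch, assuming timezone-aware timestamps and a fixed schedule; `heartbeat_overdue` and the five-minute grace period are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(last_ping: datetime, interval: timedelta,
                      grace: timedelta, now: datetime) -> bool:
    """True if the next expected check-in is later than `grace` allows.

    The job is expected to ping every `interval`; `grace` absorbs normal
    variation in start time and run duration before an alert fires.
    """
    deadline = last_ping + interval + grace
    return now > deadline


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    hourly, grace = timedelta(hours=1), timedelta(minutes=5)
    print(heartbeat_overdue(last, hourly, grace, last + timedelta(minutes=63)))  # False
    print(heartbeat_overdue(last, hourly, grace, last + timedelta(minutes=66)))  # True
```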
MTBF (Mean Time Between Failures)
MTBF is the average amount of uptime between incidents — a measure of how often things break.
MTTD (Mean Time To Detect)
MTTD is the average time from when an incident begins to when your monitoring notices it.
MTTR (Mean Time To Recovery)
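MTTR is the average time it takes to restore service after an incident. Teams differ on when the clock starts: some measure from the moment the incident begins, others from when it is detected. It is one of the most widely tracked reliability metrics.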
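MTBF, MTTD, and MTTR can all be derived from the same incident log if each incident records when it started, when it was detected, and when service was restored. The sketch below assumes exactly those three timestamps; the `Incident` type and helper names are hypothetical. It measures MTTR from detection to recovery; to start the clock when the incident began, swap `i.detected` for `i.started`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the failure actually began
    detected: datetime  # when monitoring first noticed it
    resolved: datetime  # when service was restored


def _mean(deltas: list[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("need at least one interval to average")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Average gap between an incident starting and being detected."""
    return _mean([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Average gap between detection and recovery (detection-based convention)."""
    return _mean([i.resolved - i.detected for i in incidents])


def mtbf(incidents: list[Incident]) -> timedelta:
    """Average uptime between one recovery and the next failure; needs two or more incidents."""
    ordered = sorted(incidents, key=lambda i: i.started)
    return _mean([nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])])


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    log = [
        Incident(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=32)),
        Incident(t0 + timedelta(days=10), t0 + timedelta(days=10, minutes=4),
                 t0 + timedelta(days=10, minutes=14)),
    ]
    print(mttd(log))  # 0:03:00
    print(mttr(log))  # 0:20:00
    print(mtbf(log))  # 9 days, 23:28:00
```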
On-call
On-call is a rotation that designates who is responsible for responding to incidents at any given time.
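Many rotations are a simple round-robin with fixed-length shifts. The sketch below assumes exactly that, with no overrides, swaps, or follow-the-sun handoffs, which real scheduling tools usually support; `on_call` and the engineer names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def on_call(engineers: list[str], rotation_start: datetime,
            shift: timedelta, at: datetime) -> str:
    """Who is on call at time `at` in a fixed-length round-robin rotation."""
    if not engineers:
        raise ValueError("rotation is empty")
    if at < rotation_start:
        raise ValueError("time is before the rotation started")
    # Count whole shifts since the rotation began, then wrap around the list.
    shifts_elapsed = (at - rotation_start) // shift
    return engineers[shifts_elapsed % len(engineers)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    team = ["ana", "ben", "chen"]
    weekly = timedelta(weeks=1)
    print(on_call(team, start, weekly, start + timedelta(days=8)))   # ben
    print(on_call(team, start, weekly, start + timedelta(weeks=3)))  # ana
```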
SLA (Service Level Agreement)
An SLA is a contractual promise about service reliability, usually with penalties if the target is missed.
SLI (Service Level Indicator)
An SLI is a measurement of some aspect of service quality, such as the percentage of successful requests.
SLO (Service Level Objective)
An SLO is the internal target a team holds a service to, such as 99.9% availability over 30 days.
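An SLI and an SLO fit together as a measurement and a threshold. A minimal sketch, assuming the SLI is the share of requests that did not fail with a server error (5xx); which responses count as "good" is a team decision, and the 5xx rule here is an assumption. `availability_sli` and `meets_slo` are hypothetical names.

```python
def availability_sli(status_codes: list[int]) -> float:
    """Share of requests in the window that did not return a 5xx status."""
    if not status_codes:
        raise ValueError("no requests in the window")
    good = sum(1 for code in status_codes if code < 500)
    return good / len(status_codes)


def meets_slo(sli: float, target: float) -> bool:
    """The SLO is met when the measured SLI is at or above the target."""
    return sli >= target


if __name__ == "__main__":
    window = [200] * 997 + [503] * 3  # 1,000 requests, 3 server errors
    sli = availability_sli(window)
    print(sli)                    # 0.997
    print(meets_slo(sli, 0.999))  # False
```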
Status page
A status page is a public page that shows a service's current operational status and incident history.
Synthetic monitoring
Synthetic monitoring simulates user actions — like a login or checkout flow — to test multi-step journeys.
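A synthetic check is essentially a scripted journey run on a schedule: execute each step in order, time it, and stop at the first failure so the alert can name the broken step. In this sketch the steps are plain Python callables standing in for real browser or HTTP actions; `run_journey` and the step names are hypothetical, not a specific tool's API.

```python
import time
from typing import Callable

# A step is a human-readable name plus an action that raises on failure.
Step = tuple[str, Callable[[], None]]


def run_journey(steps: list[Step]) -> dict:
    """Run steps in order, timing each one and stopping at the first failure."""
    timings: dict[str, float] = {}
    for name, action in steps:
        started = time.monotonic()
        try:
            action()
        except Exception as exc:  # any exception fails the whole journey
            timings[name] = time.monotonic() - started
            return {"ok": False, "failed_step": name, "error": str(exc), "timings": timings}
        timings[name] = time.monotonic() - started
    return {"ok": True, "failed_step": None, "error": None, "timings": timings}


if __name__ == "__main__":
    def submit_login() -> None:
        # Stand-in for a real HTTP or browser action that fails.
        raise RuntimeError("login returned HTTP 500")

    result = run_journey([
        ("open login page", lambda: None),
        ("submit credentials", submit_login),
        ("open dashboard", lambda: None),
    ])
    print(result["ok"], result["failed_step"], result["error"])
    # False submit credentials login returned HTTP 500
```

Stopping at the first failing step keeps later steps from producing misleading follow-on errors, and the per-step timings show where a slow journey spends its time.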
Three nines (99.9%)
“Three nines” means 99.9% uptime, which allows about 8 hours 45 minutes of downtime per year.
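The figure comes from multiplying the allowed failure fraction by the length of the period. Assuming a 365-day year (a leap year allows about 1.4 minutes more):

```latex
\begin{aligned}
\text{downtime per year} &= (1 - 0.999) \times 365 \times 24\ \text{h} \\
  &= 0.001 \times 8760\ \text{h} = 8.76\ \text{h} \approx 8\ \text{h}\ 45.6\ \text{min} \\
\text{downtime per 30 days} &= 0.001 \times 30 \times 24 \times 60\ \text{min} = 43.2\ \text{min}
\end{aligned}
```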
Uptime
Uptime is the percentage of time a service is available and responding correctly over a given period.

