1. Uptime (50% weight) — was the service available?
The base rate. Uptime % = 100 × (1 − (critical_min + 0.3 × major_min) ÷ window_min). That's the same formula Atlassian Statuspage uses on Claude's, OpenAI's, and DeepSeek's own status pages: full weight for outages a user sees as “down”, 30% weight for partial outages a user sees as “degraded”. Minor incidents and scheduled maintenance don't count. Overlapping incidents are merged so they don't double-bill the same minute.
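As a minimal sketch of the calculation above: the `Incident` type and the `uptime_percent` helper are hypothetical names, and the rule that an overlapping minute is charged at the heaviest active impact is an assumption, since the text only says overlapping incidents are merged.

```python
from dataclasses import dataclass

# Downtime weight per impact tier, from the formula above.
# Minor incidents and maintenance carry no weight.
WEIGHTS = {"critical": 1.0, "major": 0.3}


@dataclass
class Incident:
    start: float  # minutes since the start of the window
    end: float    # minutes since the start of the window
    impact: str   # "critical", "major", "minor", or "maintenance"


def uptime_percent(incidents, window_min):
    """Weighted uptime % over a window, counting each minute at most once."""
    # Keep only incidents that count, clipped to the window.
    spans = [
        (max(0.0, i.start), min(float(window_min), i.end), WEIGHTS[i.impact])
        for i in incidents
        if i.impact in WEIGHTS
    ]
    spans = [s for s in spans if s[1] > s[0]]
    # Split the timeline at every incident boundary, then charge each
    # slice once, at the heaviest weight active during it (assumption).
    cuts = sorted({t for s, e, _ in spans for t in (s, e)})
    weighted_down = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        active = [w for s, e, w in spans if s <= lo and e >= hi]
        if active:
            weighted_down += (hi - lo) * max(active)
    return 100.0 * (1.0 - weighted_down / window_min)


# A critical outage (0-60 min) overlapping a major one (30-130 min),
# plus a minor incident that is ignored, over a 30-day window.
example = [
    Incident(0, 60, "critical"),
    Incident(30, 130, "major"),
    Incident(500, 800, "minor"),
]
print(round(uptime_percent(example, 30 * 24 * 60), 4))
```

Here the 30 overlapping minutes are billed once as critical, so the weighted downtime is 60 + 0.3 × 70 = 81 minutes.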
The factor itself isn't the raw percent though: it's the percent graded on a nines-of-reliability curve. 99.9% (three nines) and 99.99% (four nines) sound nearly identical in plain numbers, but the second means 10× less downtime per year (~8.8 hours vs. ~53 minutes). A linear percent scale would squash that gap; the nines curve separates providers the way users actually feel the difference.
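The curve can be sketched in a few lines, using the square-root mapping and the 5 to 100 clamp from the formula summary further down. The function name `uptime_factor` and the handling of a window with zero recorded downtime (treated as a perfect 100) are assumptions.

```python
import math


def uptime_factor(uptime_pct):
    """Map an uptime percentage to a 5..100 factor on the nines curve."""
    downtime_fraction = 1.0 - uptime_pct / 100.0
    if downtime_fraction <= 0.0:
        return 100.0  # assumption: no recorded downtime scores the maximum
    nines = -math.log10(downtime_fraction)
    if nines <= 1.0:
        return 5.0  # 90% uptime or worse sits on the floor
    raw = math.sqrt((nines - 1.0) / 3.0) * 100.0
    return max(5.0, min(100.0, raw))


for pct in (90.0, 99.0, 99.9, 99.99):
    print(pct, round(uptime_factor(pct), 1))
```

Two nines land near 58, three nines near 82, and four nines reach the ceiling, so each extra nine still moves the factor visibly.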
2. Frequency (20% weight) — how often does it break?
The same uptime can come from very different incident shapes. One four-hour outage and twenty 12-minute blips both add up to 240 minutes, about 0.6% of a 30-day month, but the second pattern feels much worse. The frequency factor counts incidents and decays exponentially: 100 × exp(−count ÷ 100) for the 30-day window. 100 incidents lands around 37/100; 200 lands near 14. The factor never reaches zero, so even very noisy windows still differentiate providers.
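A small sketch of the decay, using the per-window e-fold constants listed in the formula summary further down. The `frequency_factor` name and the string window keys are illustrative choices.

```python
import math

# Incident count at which the factor drops to 1/e, per window.
E_FOLD = {"30d": 100, "7d": 40, "1h": 6}


def frequency_factor(incident_count, window="30d"):
    """Exponential decay in the number of incidents; never reaches zero."""
    return 100.0 * math.exp(-incident_count / E_FOLD[window])


for count in (0, 20, 100, 200):
    print(count, round(frequency_factor(count), 1))
```

Twenty blips in a month cost about 18 points on this factor, while a single outage costs about one.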
3. Severity (20% weight) — how bad was each one?
Frequency treats every incident as one. Severity weights each by duration × impact and decays the sum exponentially. We use the provider's own classification verbatim:
- Minor ×0.5 — “elevated errors”, most users unaffected.
- Major ×2 — partial outage, real user impact.
- Critical ×6 — service effectively unavailable.
Each incident's contribution is duration-capped per impact tier (minor 6 h, major 12 h, critical 48 h), so a stuck-open minor tag can't torpedo the score, but a multi-day critical still registers as catastrophic in the math.
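To make the capping concrete, here is a sketch that uses the caps, multipliers, and scale constants from the formula summary further down. Representing an incident as a `(duration_minutes, impact)` pair is an assumption made for brevity.

```python
import math

CAP_MIN = {"minor": 6 * 60, "major": 12 * 60, "critical": 48 * 60}
MULT = {"minor": 0.5, "major": 2.0, "critical": 6.0}
SCALE = {"30d": 3000, "7d": 1500}


def severity_minutes(incidents):
    """Sum of capped duration × impact multiplier over all incidents."""
    return sum(min(d, CAP_MIN[imp]) * MULT[imp] for d, imp in incidents)


def severity_factor(incidents, window="30d"):
    """Exponential decay in the weighted severity minutes."""
    return 100.0 * math.exp(-severity_minutes(incidents) / SCALE[window])


# A minor tag left open for 10 days is capped at 6 hours.
stuck_minor = [(10 * 24 * 60, "minor")]
# A 3-day critical outage is capped at 48 hours but weighted ×6.
long_critical = [(3 * 24 * 60, "critical")]

print(round(severity_factor(stuck_minor), 1))
print(round(severity_factor(long_critical), 1))
```

The stuck minor contributes only 180 weighted minutes and leaves the factor in the mid-90s, while the capped critical contributes 17,280 and pushes it below 1.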
4. Time to recover (10% weight) — how fast was the fix?
Same uptime, same incident count, same severity mix — but one provider resolves in 12 minutes and the other limps for 90. MTTR captures the gap. We use the median incident duration (not the mean) so one anomalous all-day outage doesn't dominate — that's already what severity is measuring. A 120-minute median zeroes the factor; 30 minutes lands at 75. Ongoing incidents count toward the median using their elapsed time, so an unresolved 7-hour outage doesn't get a free pass while we wait for it to close.
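A sketch of the median-based factor, with ongoing incidents contributing their elapsed time. The `mttr_factor` name, its two-argument shape, and the choice to return 100 when there are no incidents at all are assumptions.

```python
from statistics import median


def mttr_factor(resolved_minutes, ongoing_elapsed_minutes=()):
    """Linear penalty on the median incident duration; zero at 120 minutes."""
    durations = list(resolved_minutes) + list(ongoing_elapsed_minutes)
    if not durations:
        return 100.0  # assumption: no incidents means a perfect factor
    return max(0.0, 100.0 - median(durations) / 1.2)


# One all-day outage barely moves the median of otherwise quick fixes.
print(mttr_factor([10, 12, 15, 600]))
# An unresolved 7-hour incident counts with its elapsed 420 minutes.
print(mttr_factor([20], ongoing_elapsed_minutes=[420]))
```

In the first call the median is 13.5 minutes, so the 600-minute outlier has almost no effect; in the second the ongoing incident pulls the median to 220 minutes and the factor bottoms out at zero.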
A provider spent 30 days at 99.9% uptime, had 15 incidents (mostly minor, one 3-hour major), and resolved the median incident in 45 minutes.
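Plugging this provider into the formulas summarized below gives a rough end-to-end check. The text does not state how long the 14 minor incidents lasted, so the sketch assumes 45 minutes each, which is consistent with the stated median; the severity factor and the final score depend on that assumption.

```python
import math
from statistics import median

# Inputs stated in the example.
uptime_pct = 99.9
incident_count = 15
major_minutes = 180

# Assumption: durations of the 14 minor incidents are not given;
# 45 minutes each matches the stated 45-minute median.
minor_minutes = [45] * 14

# Uptime: nines curve, clamped to 5..100.
nines = -math.log10(1 - uptime_pct / 100)
uptime_f = max(5.0, min(100.0, math.sqrt(max(0.0, nines - 1) / 3) * 100))

# Frequency: 30-day e-fold of 100 incidents.
frequency_f = 100 * math.exp(-incident_count / 100)

# Severity: capped duration × multiplier, 30-day scale of 3000.
sev_min = sum(min(m, 6 * 60) * 0.5 for m in minor_minutes)
sev_min += min(major_minutes, 12 * 60) * 2
severity_f = 100 * math.exp(-sev_min / 3000)

# MTTR: median duration, zero at 120 minutes.
mttr_f = max(0.0, 100 - median(minor_minutes + [major_minutes]) / 1.2)

score = 0.5 * uptime_f + 0.2 * frequency_f + 0.2 * severity_f + 0.1 * mttr_f
for name, value in [("uptime", uptime_f), ("frequency", frequency_f),
                    ("severity", severity_f), ("mttr", mttr_f), ("score", score)]:
    print(f"{name:>9}: {value:.1f}")
```

Under these assumptions the four factors come out near 82, 86, 80, and 62.5, for a composite score of about 80.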
score = 0.5 × uptime_factor
      + 0.2 × frequency_factor
      + 0.2 × severity_factor
      + 0.1 × mttr_factor
uptime_factor = clamp(5, 100, sqrt((nines − 1) ÷ 3) × 100)
    nines = −log10(1 − uptime% ÷ 100)
    (below 90% → floor 5; 4 nines → 100)
    uptime% = 100 − (critical_min + 0.3 × major_min) ÷ window_min × 100
    (Statuspage formula: minors + maintenance excluded)
frequency_factor = 100 × exp(−incident_count ÷ e_fold)
    e_fold[30d] = 100, e_fold[7d] = 40, e_fold[1h] = 6
severity_minutes = Σ min(duration, cap[impact]) × impact_mult
    cap: minor 6h · major 12h · critical 48h
    mult: minor ×0.5 · major ×2 · critical ×6
severity_factor = 100 × exp(−severity_minutes ÷ scale)
    scale[30d] = 3000, scale[7d] = 1500
mttr_factor = max(0, 100 − median_resolve_minutes ÷ 1.2)
What we don't do:
- Ping provider APIs ourselves. If a provider's status page is wrong, so are we.
- User accounts or alerts.
- Latency or answer-quality benchmarks. Reliability only.