BrokenAI

An independent reliability ledger for the major model providers. Built because their status pages weren't enough.

© 2026 BrokenAI · Independent · Not affiliated with any provider · See a number that looks wrong? @threatner · v2.1.0
Scoring methodology

How the score works.

Every provider gets a single number from 0 to 100 for the window you pick. Higher is more reliable. The score is derived entirely from each provider's own public status feed (Statuspage or an equivalent); we don't run probes.

What the score feels like

Range     Label       Means
90–100    Excellent   Near-perfect reliability. Brief, infrequent blips at most.
75–90     Good        Normal for a busy production service. Occasional minor incidents.
50–75     Noticeable  Regular disruption. Real user impact on a weekly cadence.
25–50     Rough       Multiple serious incidents or a sustained major outage.
0–25      Broken      Effectively unreliable. Day-long criticals or a flood of failures.

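If you consume the score programmatically, the bands above reduce to a threshold lookup. A minimal Python sketch: `score_label` is a hypothetical helper, and treating each band's lower bound as inclusive (so exactly 90 is Excellent) is an assumption, since the published ranges share their endpoints.

```python
# Hypothetical helper: map a 0-100 composite score to the label bands.
# Assumption: each band's lower bound is inclusive, so 90 -> "Excellent"
# and 75 -> "Good" (the published ranges share their endpoints).
BANDS = [
    (90.0, "Excellent"),
    (75.0, "Good"),
    (50.0, "Noticeable"),
    (25.0, "Rough"),
    (0.0, "Broken"),
]


def score_label(score: float) -> str:
    """Return the label for a score in [0, 100]."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise AssertionError("unreachable: the 0.0 band matches every valid score")
```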
The four things we measure
Reliability comes down to a handful of distinct questions: was the service up, how often did it break, how bad was each break, and how fast was it fixed? The score answers each question on a 0–100 scale and combines the four answers into one weighted composite.

1. Uptime (50% weight) — was the service available?

The base rate. Uptime % = 100 × (1 − (critical_min + 0.3 × major_min) ÷ window_min). That's the same formula Atlassian Statuspage uses on Claude's, OpenAI's, and DeepSeek's own status pages: full weight for outages a user sees as “down”, 30% of the weight for partial outages a user sees as “degraded”. Minor incidents and scheduled maintenance don't count. When two incidents overlap, they are merged so the same minute isn't billed twice.
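A minimal Python sketch of the uptime formula, including the merge step. The `(start_min, end_min, impact)` tuple layout and the rule that a minute covered by several incidents is billed once at the heaviest active weight are assumptions made for this sketch; the page only says simultaneous incidents are merged.

```python
# Sketch of the uptime formula, with overlapping incidents merged.
# Assumptions (not from the page): incidents are (start_min, end_min, impact)
# tuples measured from the window start, and a minute covered by several
# incidents is billed once, at the heaviest weight active in that minute.
WEIGHT = {"critical": 1.0, "major": 0.3, "minor": 0.0, "maintenance": 0.0}


def uptime_percent(incidents, window_min):
    clipped = []
    points = {0, window_min}
    for start, end, impact in incidents:
        s, e = max(0, start), min(window_min, end)  # clip to the window
        if e > s:
            clipped.append((s, e, WEIGHT[impact]))
            points.update((s, e))
    bounds = sorted(points)
    downtime = 0.0
    # Walk each elementary segment between boundaries and bill it once.
    for a, b in zip(bounds, bounds[1:]):
        active = [w for s, e, w in clipped if s <= a and b <= e]
        downtime += (b - a) * max(active, default=0.0)
    return 100.0 * (1.0 - downtime / window_min)


# One 3-hour major in a 30-day window: 0.3 x 180 = 54 weighted minutes.
print(uptime_percent([(0, 180, "major")], 30 * 24 * 60))
```

Without the merge, a critical from minute 0 to 60 overlapping a major from 30 to 90 would bill 60 + 18 = 78 weighted minutes; this version bills 69, because minutes 30 to 60 are charged only at the critical weight.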

The factor itself isn't the raw percentage: it's the percentage graded on a “nines of reliability” curve. 99.9% (three nines) and 99.99% (four nines) look nearly identical as plain numbers, but the second allows 10× less downtime per year (~8.8 hours vs. ~53 minutes). A linear percentage scale would squash that gap; the nines curve separates providers the way users actually feel them:

Uptime     Nines        Factor
99.99%+    four nines   100
99.9%      three nines  82
99.5%      ~2.3 nines   66
99.0%      two nines    58
95.0%      ~1.3 nines   32
< 90.0%    < 1 nine     5 (floor)
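The curve is easy to check by hand. A minimal Python sketch of the uptime factor as written in the formula section of this page; `nines` and `uptime_factor` are illustrative names, and treating exactly 100% uptime as a full 100 is an assumption, since the page doesn't say how zero downtime is handled.

```python
import math


def nines(uptime_pct: float) -> float:
    """Count of nines: 99.9% -> 3.0, 99.99% -> 4.0."""
    downtime_frac = 1.0 - uptime_pct / 100.0
    if downtime_frac <= 0.0:
        return math.inf  # 100% uptime: treated as unbounded nines (assumption)
    return -math.log10(downtime_frac)


def uptime_factor(uptime_pct: float) -> float:
    """sqrt((nines - 1) / 3) x 100, clamped to [5, 100]."""
    n = nines(uptime_pct)
    if n <= 1.0:  # 90% or worse: the square root would be of a value <= 0
        return 5.0
    return max(5.0, min(100.0, math.sqrt((n - 1.0) / 3.0) * 100.0))


for pct in (99.99, 99.9, 99.5, 99.0, 95.0, 85.0):
    print(f"{pct}% -> {uptime_factor(pct):.0f}")
```

The square root is what spreads out the top of the range: each extra nine is worth progressively fewer points, but the step from two to three nines still moves the factor by about 24.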

2. Frequency (20% weight) — how often does it break?

The same uptime can come from very different incident shapes. One four-hour outage and twenty 12-minute blips each burn about 0.55% of a 30-day month, but the second pattern feels much worse. The frequency factor counts incidents and decays exponentially: 100 × exp(−count ÷ 100) at the 30-day window. 100 incidents lands around 37/100; 200 lands near 14. The factor never reaches zero, so even chatty weeks still differentiate.
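A minimal Python sketch of the frequency factor, using the per-window e-fold constants listed in the formula section of this page. The function and dictionary names are illustrative.

```python
import math

# e-fold constants per scoring window, as listed in the formula section.
E_FOLD = {"30d": 100.0, "7d": 40.0, "1h": 6.0}


def frequency_factor(incident_count: int, window: str = "30d") -> float:
    """100 x exp(-count / e_fold): decays smoothly and never reaches zero."""
    return 100.0 * math.exp(-incident_count / E_FOLD[window])


# One long outage vs. twenty short blips, same 30-day window.
print(round(frequency_factor(1), 1), round(frequency_factor(20), 1))  # 99.0 81.9
```

The single outage barely moves this factor; the twenty blips cost about 18 points here even though uptime treats both shapes the same.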

3. Severity (20% weight) — how bad was each one?

Frequency counts every incident the same. Severity weights each one by duration × impact multiplier, sums the results, and decays the sum exponentially. We use the provider's own classification verbatim:

  • Minor ×0.5 — “elevated errors”, most users unaffected.
  • Major ×2 — partial outage, real user impact.
  • Critical ×6 — service effectively unavailable.

Each incident's duration is capped per impact tier, so a minor tag left open by mistake can't torpedo the score, while a multi-day critical still comes out catastrophic in the math:

Impact    Cap   Why
Minor     6h    Often left open after the problem is actually gone.
Major     12h   Usually fixed within a business day.
Critical  48h   A 2-day critical should look catastrophic in the score, and with a 48h cap it does.
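A minimal Python sketch of the severity factor, taking caps, multipliers, and per-window scales from this page. Representing incidents as `(duration_min, impact)` pairs is an assumption for illustration; the page lists no 1h severity scale, so that window is deliberately absent.

```python
import math

CAP_MIN = {"minor": 6 * 60, "major": 12 * 60, "critical": 48 * 60}
MULT = {"minor": 0.5, "major": 2.0, "critical": 6.0}
SCALE = {"30d": 3000.0, "7d": 1500.0}  # the page gives no 1h severity scale


def severity_minutes(incidents):
    """incidents: iterable of (duration_min, impact) pairs."""
    return sum(min(d, CAP_MIN[imp]) * MULT[imp] for d, imp in incidents)


def severity_factor(incidents, window="30d"):
    return 100.0 * math.exp(-severity_minutes(incidents) / SCALE[window])


# A minor tag left open for ten days is capped at 6h -> 180 severity-minutes.
print(severity_minutes([(10 * 24 * 60, "minor")]))
# A two-day critical: 2,880 min x 6 = 17,280 severity-minutes.
print(severity_factor([(48 * 60, "critical")]))
```

The cap is applied before the multiplier, so the worst a single minor can contribute is 180 severity-minutes, while one capped critical contributes 17,280 and drives the 30-day factor below 1.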

4. Time to recover (10% weight) — how fast was the fix?

Same uptime, same incident count, same severity mix, yet one provider resolves in 12 minutes and the other limps along for 90. The MTTR (time to recover) factor captures that gap. We use the median incident duration rather than the mean, so one anomalous all-day outage doesn't dominate; that is already what severity measures. The factor is 100 − median ÷ 1.2: a 120-minute median zeroes it, and a 30-minute median lands at 75. Ongoing incidents count toward the median at their elapsed time, so an unresolved 7-hour outage doesn't get a free pass while we wait for it to close.
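A minimal Python sketch of the time-to-recover factor, including open incidents at their elapsed time. The `(start_min, end_min)` pair layout with `None` for an unresolved incident, the `now_min` parameter, and returning 100 when there are no incidents are assumptions for this sketch.

```python
import statistics
from typing import List, Optional, Tuple


def mttr_factor(incidents: List[Tuple[float, Optional[float]]], now_min: float) -> float:
    """incidents: (start_min, end_min) pairs; end_min is None while still open."""
    if not incidents:
        return 100.0  # assumption: nothing to recover from scores full marks
    # Open incidents contribute their elapsed time so far.
    durations = [(end if end is not None else now_min) - start for start, end in incidents]
    median = statistics.median(durations)
    return max(0.0, 100.0 - median / 1.2)


# One 30-minute fix plus two outages still open at minute 540.
print(mttr_factor([(0, 30), (60, None), (120, None)], now_min=540))  # 0.0
```

If the two open outages were skipped, the median would be 30 minutes and the factor 75; counting their elapsed time pushes the median to 420 minutes and the factor to zero.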

Worked example

A provider spent 30 days at 99.9% uptime, had 15 incidents (mostly minor, one 3-hour major), and resolved the median incident in 45 minutes.

Factor            Calc                                              Score   × weight
Uptime (nines)    99.9% = 3 nines, √(2/3) × 100                     82      = 41.0
Frequency         100 × exp(−15/100)                                86      = 17.2
Severity          one 3h major ≈ 360 sev-min, 100 × exp(−360/3000)  89      = 17.8
Time to recover   45-min median, 100 − 45 ÷ 1.2                     63      = 6.3
Composite                                                                   82.3 / 100

The formula
score = 0.5 × uptime_factor
      + 0.2 × frequency_factor
      + 0.2 × severity_factor
      + 0.1 × mttr_factor

uptime_factor    = clamp(5, 100, sqrt((nines − 1) ÷ 3) × 100)
                   nines = −log10(1 − uptime% / 100)
                   (below 90% → floor 5; 4 nines → 100)

uptime%          = 100 − (critical_min + 0.3 × major_min) / window_min × 100
                   (Statuspage formula: minors + maintenance excluded)

frequency_factor = 100 × exp(−incident_count ÷ e_fold)
                   e_fold[30d] = 100, e_fold[7d] = 40, e_fold[1h] = 6

severity_minutes = Σ min(duration, cap[impact]) × impact_mult
                   cap: minor 6h · major 12h · critical 48h
                   mult: minor ×0.5 · major ×2 · critical ×6

severity_factor  = 100 × exp(−severity_minutes ÷ scale)
                   scale[30d] = 3000, scale[7d] = 1500

mttr_factor      = max(0, 100 − median_resolve_minutes ÷ 1.2)
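The whole formula fits in a few lines of Python. This is a sketch, not the site's code: `composite_score` and the dictionary names are illustrative, and it takes precomputed inputs (uptime %, incident count, severity-minutes, median resolve minutes) rather than raw incidents. The window lookups mirror the constants above; because no 1h severity scale is listed, asking for the 1h window raises a `KeyError` instead of guessing.

```python
import math

WEIGHTS = {"uptime": 0.5, "frequency": 0.2, "severity": 0.2, "mttr": 0.1}
E_FOLD = {"30d": 100.0, "7d": 40.0, "1h": 6.0}
SCALE = {"30d": 3000.0, "7d": 1500.0}  # no 1h severity scale is published


def uptime_factor(uptime_pct):
    downtime = 1.0 - uptime_pct / 100.0
    if downtime <= 0.0:
        return 100.0  # assumption: perfect uptime gets the full factor
    nines = -math.log10(downtime)
    if nines <= 1.0:  # below 90%: floor
        return 5.0
    return max(5.0, min(100.0, math.sqrt((nines - 1.0) / 3.0) * 100.0))


def composite_score(uptime_pct, incident_count, severity_minutes,
                    median_resolve_min, window="30d"):
    factors = {
        "uptime": uptime_factor(uptime_pct),
        "frequency": 100.0 * math.exp(-incident_count / E_FOLD[window]),
        "severity": 100.0 * math.exp(-severity_minutes / SCALE[window]),
        "mttr": max(0.0, 100.0 - median_resolve_min / 1.2),
    }
    return sum(WEIGHTS[k] * v for k, v in factors.items())


# Worked-example inputs: 99.9% uptime, 15 incidents, ~360 sev-min, 45-min median.
print(round(composite_score(99.9, 15, 360, 45), 1))
```

For the worked-example inputs this evaluates to about 82.0 before any per-factor rounding.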
Severity is per-provider
We read each provider's severity classification verbatim and never escalate or de-escalate it.
Provider          Source severity tier                                Mapped to
Claude            Statuspage minor                                    minor
Claude            Statuspage major                                    major
Claude            Statuspage critical                                 critical
OpenAI            Statuspage minor                                    minor
OpenAI            Statuspage major                                    major
OpenAI            Statuspage critical                                 critical
Mistral           MINOR                                               minor
Mistral           MEDIUM (degraded)                                   minor
Mistral           MAJOR                                               major
Mistral           CRITICAL                                            critical
Grok (xAI)        (no severity field; title keywords)                 inferred
DeepSeek          Statuspage minor                                    minor
DeepSeek          Statuspage major                                    major
DeepSeek          Statuspage critical                                 critical
Kimi (Moonshot)   Statuspage critical (only tier the operator uses)   critical
Cohere            incident.io minor                                   minor
Cohere            incident.io major                                   major
Cohere            incident.io critical                                critical
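The mapping table above is a straight lookup for every provider except Grok. A minimal Python sketch; the provider keys and `map_severity` are hypothetical, and the Grok keyword list is a placeholder, because the page doesn't publish which title keywords it uses or what the fallback is.

```python
from typing import Optional

# Raw tier -> internal tier, per the mapping table. Claude, OpenAI and
# DeepSeek (Statuspage) and Cohere (incident.io) share the same labels.
STATUSPAGE_STYLE = {"minor": "minor", "major": "major", "critical": "critical"}
MAPPING = {
    "claude": STATUSPAGE_STYLE,
    "openai": STATUSPAGE_STYLE,
    "deepseek": STATUSPAGE_STYLE,
    "cohere": STATUSPAGE_STYLE,
    "mistral": {"MINOR": "minor", "MEDIUM": "minor", "MAJOR": "major", "CRITICAL": "critical"},
    "kimi": {"critical": "critical"},  # the only tier the operator uses
}

# HYPOTHETICAL keywords: not published by the page. Checked in order, so
# "partial outage" matches "partial" before "outage".
GROK_KEYWORDS = [
    ("partial", "major"),
    ("outage", "critical"),
    ("unavailable", "critical"),
    ("degraded", "minor"),
    ("elevated", "minor"),
]


def map_severity(provider: str, raw_tier: Optional[str], title: str = "") -> str:
    if provider == "grok":  # no severity field: infer from the incident title
        lowered = title.lower()
        for keyword, tier in GROK_KEYWORDS:
            if keyword in lowered:
                return tier
        return "minor"  # assumption: default when no keyword matches
    if raw_tier is None:
        raise ValueError(f"{provider} incidents should carry a severity tier")
    # An unexpected tier raises KeyError rather than being guessed at.
    return MAPPING[provider][raw_tier]
```

Failing loudly on an unknown tier fits the "never escalate or de-escalate" rule: a new label from a provider should be added to the table deliberately, not silently mapped.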
Data sources
One feed per provider, fetched on page visit (cached for 60s). No background crawler, no independent probing.
Provider              Endpoint                                            Format
Claude (Anthropic)    https://status.claude.com/api/v2/summary.json       Statuspage v2 JSON
OpenAI                https://status.openai.com/api/v2/summary.json       Statuspage v2 JSON
Mistral (Mistral AI)  https://status.mistral.ai/_payload.json             Checkly SSR JSON
Grok (xAI)            https://status.x.ai/feed.xml                        Instatus RSS
DeepSeek              https://status.deepseek.com/api/v2/summary.json     Statuspage v2 JSON
Kimi (Moonshot AI)    https://status.moonshot.cn/api/v2/summary.json      Statuspage v2 JSON
Cohere                https://status.cohere.com/api/v2/summary.json       incident.io (Statuspage-compatible)

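A minimal Python sketch of the fetch-on-visit behavior: a per-URL cache that reuses a response for 60 seconds and otherwise fetches again. `FeedCache`, `http_fetch`, and the 10-second timeout are hypothetical; the page only says responses are cached for 60s. Parsing the different formats (Statuspage JSON, Checkly payload, Instatus RSS) is out of scope here.

```python
import time
import urllib.request
from typing import Callable, Dict, Tuple

FEEDS = {
    "claude": "https://status.claude.com/api/v2/summary.json",
    "openai": "https://status.openai.com/api/v2/summary.json",
    "mistral": "https://status.mistral.ai/_payload.json",
    "grok": "https://status.x.ai/feed.xml",
    "deepseek": "https://status.deepseek.com/api/v2/summary.json",
    "kimi": "https://status.moonshot.cn/api/v2/summary.json",
    "cohere": "https://status.cohere.com/api/v2/summary.json",
}


def http_fetch(url: str) -> bytes:
    """Plain GET; parsing the JSON or RSS body is left to the caller."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


class FeedCache:
    """Fetch on demand; reuse a response for `ttl` seconds, then refetch."""

    def __init__(self, fetch: Callable[[str], bytes] = http_fetch,
                 ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no network call
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body


# Usage (hits the network): body = FeedCache().get(FEEDS["claude"])
```

There is no background refresh: a stale entry is replaced only when a visit arrives after it has expired, which matches "no background crawler". Injecting `fetch` and `clock` keeps the class testable without network access.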
What we don't do
  • Ping provider APIs ourselves — if a provider's status page is wrong, so are we.
  • User accounts or alerts.
  • Latency or answer-quality benchmarks. Reliability only.