What AI Can Do Now

AI bug-fix rates plateau near 90%

AI just jumped from fixing about half of real code problems to almost all of them on one public test. The next question is whether the test is now too easy.

→ Plateauing · 72% confidence · horizon Dec 2027 · data as of June 2026

[Chart: confidence-weighted forecast, history → projection. Y-axis: real bugs solved (%). Series: observed and projection; shaded band = confidence cone.]

Confidence: 72% · Horizon: 2027 · First called: Jun 2026 · Status: Tracking

The call

It used to be a party trick. Now the machine quietly closes tickets a junior engineer would sweat over.

The number to notice

From about 1 in 25 real bugs solved to more than 9 in 10 in roughly two years.

What’s driving this

The best AI coding systems now try fixes, run checks, and keep going until the code works.
The public test uses real problems from open-source software, so the jump still matters.
But the score is now so high that the test may be running out of room to show the next leap.
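
The try-a-fix, run-the-checks, repeat pattern described above can be sketched in a few lines. This is a minimal illustration, not how any particular coding system is built: `repair_loop`, `toy_propose_fix`, and `toy_run_checks` are hypothetical names, and the "model" here is a stand-in that swaps one operator once it sees failure feedback.

```python
from typing import Callable, Optional, Tuple


def repair_loop(
    propose_fix: Callable[[str, str], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    source: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes the checks, or None if none does."""
    feedback = ""
    candidate = source
    for _ in range(max_attempts):
        # Ask for a new candidate, passing along what failed last time.
        candidate = propose_fix(candidate, feedback)
        passed, feedback = run_checks(candidate)
        if passed:
            return candidate
    return None


def toy_propose_fix(code: str, feedback: str) -> str:
    # Hypothetical stand-in for a model call: only changes the code
    # after it has received failure feedback.
    return code.replace("a - b", "a + b") if feedback else code


def toy_run_checks(code: str) -> Tuple[bool, str]:
    # Hypothetical stand-in for a test suite: load the code and check one case.
    namespace: dict = {}
    exec(code, namespace)
    ok = namespace["add"](2, 3) == 5
    return ok, ("" if ok else "add(2, 3) should be 5")


buggy = "def add(a, b):\n    return a - b\n"
fixed = repair_loop(toy_propose_fix, toy_run_checks, buggy)
print(fixed)
```

In this toy run the first attempt fails the check, the failure message is fed back, and the second attempt passes. Capping the loop with `max_attempts` is one design choice among several; with a budget of a single attempt the same bug goes unfixed.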

The track record

Sep 2025 · 65%
First called: rising, 65% confidence — the curve had clearly turned.
Mar 2026 · 74%
Raised to accelerating, 74% — gains widened faster than expected.
Jun 2026 · 80%
Held at 80% — tooling, not just models, now carries the climb.
Jun 2026 · 72%
Changed to plateauing — June 2026 trackers show the top system at 93.9%, near the ceiling of the public real-bug test.

Sources · check us

SWE-bench Verified · SWE-bench · 2026 · dataset
SWE-bench Verified Benchmark 2026 · BenchLM · 2026-06-02 · index
SWE-bench Verified Leaderboard · Steel.dev · 2026-05-28 · index
What's in a Benchmark? The Case of SWE-Bench in Automated Program Repair · arXiv · 2026-02-04 · study

For you

If you build anything, the boring half of your job is the part getting automated first.

What would change our mind

If newer private or harder tests show AI still fails many everyday software fixes, treat this as a test getting maxed out, not proof that AI can fix almost any real bug.

Behind the numbers

Direction inferred from year-over-year results on a public benchmark of real GitHub issues (4.4% → 71.7%, 2024→2025). Projection assumes continued model + tooling gains; a plateau in frontier models would flatten it.
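
The arithmetic behind "running out of room" can be checked with the figures quoted in this piece (4.4%, 71.7%, and the 93.9% top score). This is a rough sketch that assumes the test's practical ceiling is 100%, which may overstate it if some tasks are flawed or unsolvable; the function names are illustrative.

```python
def points_gained(start_pct: float, end_pct: float) -> float:
    """Percentage points gained between two scores."""
    return end_pct - start_pct


def headroom(score_pct: float, ceiling_pct: float = 100.0) -> float:
    """Percentage points left before the score reaches the assumed ceiling."""
    return ceiling_pct - score_pct


# Figures quoted in this article; the 100% ceiling is an assumption.
gain_2024_2025 = points_gained(4.4, 71.7)
room_left = headroom(93.9)

print(round(gain_2024_2025, 1))  # 67.3
print(round(room_left, 1))       # 6.1
```

Under that assumption, the single 2024 to 2025 jump was about 67 points, while only about 6 points remain above the current top score, so another leap of similar size cannot show up on this test.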