April 2026 update. Updated monthly.

Agent leaderboard

A monthly-updated dashboard version of the overnight agents analysis, backed by persisted leaderboard snapshots. This view is the April 2026 update.

superlatives, April 2026

lowest revert rate: Codex, 1.19 / 1k merged PRs
lowest code churn: Codex, 5.7% seven-day churn
fewest P0s generated: Devin, 0.038 / 10k merged LOC
fewest PR revisions: Devin, 2.11 cycles to merge

[ FIG. 01 / FULL AI AUTHORSHIP SHARE ]

Agent-written PR share

Monthly share of opened PRs with evidence of full end-to-end AI authorship.

Labeled values: 0.86% and 27.6%. X-axis: month, Feb 2025 through Apr 2026. Series: monthly share, monthly opened PRs.

[ FIG. 02 / REVERTS PER 1,000 MERGED PRs ]

Revert-rate leaderboard

Lower is better. The dashed line is the human baseline for the same merge window.

Author       n          Reverts per 1,000 merged PRs
Codex        25,948     1.19
Claude       198,969    1.80
human        674,880    2.72 (human baseline)
Cursor BG    7,335      3.41
Devin        11,131     3.50

Axis: reverts per 1,000 merged PRs, 0 to 4. Legend: below human, human baseline, above human.

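As a reading aid, the per-1,000 normalization can be inverted to estimate how many reverts sit behind each bar. The sketch below assumes that each `n` is the number of merged PRs for that author, which the page does not state explicitly. The helper names `reverts_per_thousand` and `implied_reverts` are hypothetical, and the results are approximate because the published rates are rounded.

```python
# Published figures from FIG. 02: (reverts per 1,000 merged PRs, n).
# Assumption: n is the number of merged PRs in the merge window.
LEADERBOARD = {
    "Codex": (1.19, 25_948),
    "Claude": (1.80, 198_969),
    "human": (2.72, 674_880),
    "Cursor BG": (3.41, 7_335),
    "Devin": (3.50, 11_131),
}


def reverts_per_thousand(reverts, merged_prs):
    """Normalize a raw revert count to a rate per 1,000 merged PRs."""
    if merged_prs <= 0:
        raise ValueError("merged_prs must be positive")
    return reverts / merged_prs * 1000


def implied_reverts(rate_per_thousand, merged_prs):
    """Invert the normalization to get an approximate revert count."""
    return rate_per_thousand * merged_prs / 1000


for author, (rate, n) in LEADERBOARD.items():
    estimate = implied_reverts(rate, n)
    print(f"{author}: about {estimate:.0f} reverts across {n:,} merged PRs")
```

Under that assumption, the smaller samples (Codex, Cursor BG, Devin) each correspond to a few dozen reverts, so small gaps between their rates may be within noise.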
[ FIG. 03 / MEDIAN LOC, SAME DEVELOPERS ]

PR size, same developers

Median lines changed for developers who opened both AI-authored and human-written PRs.

Author         n          Median LOC per PR
AI-assisted    205,774    171
human          210,600    143

Axis: median LOC per PR, 0 to 200. Same developers, April 2026.

[ FIG. 04 / REVERT RATE BY PR SIZE ]

Reverts by PR size

Same-developer revert rates grouped by number of files changed.

Files changed    AI      human    AI vs. human
1 file           3.36    2.99     +12% (AI worse)
2-3 files        3.02    2.99     tied
4-10 files       2.38    2.88     -17% (AI better)
10+ files        2.57    3.57     -28% (AI better)

Same developers, April 2026.

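The percentage labels in this figure are consistent with a plain relative difference between the AI and human revert rates in each bucket. The following is a minimal sketch of that arithmetic, not a confirmed description of how the dashboard computes its labels. It assumes the first value of each pair is the AI rate and the second is the human rate; `relative_difference_pct` is a hypothetical helper name.

```python
# Revert rates from FIG. 04 as (AI, human) pairs per bucket.
# Assumption: the first number of each pair in the chart is the AI rate.
BUCKETS = {
    "1 file": (3.36, 2.99),
    "2-3 files": (3.02, 2.99),
    "4-10 files": (2.38, 2.88),
    "10+ files": (2.57, 3.57),
}


def relative_difference_pct(ai_rate, human_rate):
    """Percent by which the AI rate exceeds (+) or trails (-) the human rate."""
    return (ai_rate - human_rate) / human_rate * 100


for bucket, (ai_rate, human_rate) in BUCKETS.items():
    diff = relative_difference_pct(ai_rate, human_rate)
    print(f"{bucket}: {diff:+.0f}%")
```

The 2-3 files bucket comes out at about +1%, which the chart labels as tied.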
[ FIG. 05 / FILE CHURN BY AUTHOR AND PR SIZE ]

Seven-day file churn

Share of touched files that were edited again within a week, bucketed by PR size.

Author       1 file    2-3 files    4-10 files    10+ files    mean
Codex        5.6%      6.0%         5.1%          5.9%         5.7%
Claude       8.2%      8.2%         8.3%          7.7%         8.1%
Cursor BG    10.3%     6.7%         8.5%          9.6%         8.8%
human        9.9%      9.8%         10.5%         9.8%         10.0%
Devin        14.0%     13.6%        12.9%         13.3%        13.5%

Columns are files changed. Legend: below human mean, above human mean. Human mean: 10.0%.

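The page does not say how the mean column is weighted. A quick check, sketched below, compares each published mean with the unweighted average of its four size buckets. The tolerance and the helper names are assumptions chosen for illustration, with the tolerance sized to allow for one-decimal rounding of the published values.

```python
# Seven-day churn from FIG. 05: ([1 file, 2-3, 4-10, 10+], published mean), in percent.
CHURN = {
    "Codex": ([5.6, 6.0, 5.1, 5.9], 5.7),
    "Claude": ([8.2, 8.2, 8.3, 7.7], 8.1),
    "Cursor BG": ([10.3, 6.7, 8.5, 9.6], 8.8),
    "human": ([9.9, 9.8, 10.5, 9.8], 10.0),
    "Devin": ([14.0, 13.6, 12.9, 13.3], 13.5),
}

# Assumed tolerance: published values are rounded to one decimal place.
TOLERANCE = 0.06


def unweighted_mean(values):
    """Average that gives every size bucket the same weight."""
    return sum(values) / len(values)


def matches_published(buckets, published_mean):
    """True if the bucket average agrees with the published mean within rounding."""
    return abs(unweighted_mean(buckets) - published_mean) <= TOLERANCE


for author, (buckets, published_mean) in CHURN.items():
    status = "consistent" if matches_published(buckets, published_mean) else "differs"
    print(f"{author}: bucket average {unweighted_mean(buckets):.2f}% vs {published_mean}% ({status})")
```

All five rows agree within rounding, which suggests, but does not confirm, that the mean weights each size bucket equally rather than by PR volume.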
[ FIG. 06 / GREPTILE FLAGS PER 10K MERGED LOC ]

Review findings by severity

Greptile issue rates normalized by merged lines of code.

Author       P0 (critical)    P1 (real bug)    P2 (style / nit)
Devin        0.038            1.47             2.64
Codex        0.041            1.31             2.81
Claude       0.078            2.34             3.75
human        0.099            1.94             2.88
Cursor BG    0.145            2.83             4.80

Rates per 10k merged LOC. The human row is the baseline.

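To make the unit concrete, here is a small sketch of the per-10k-LOC normalization and its inverse. The raw flag count in the first example is hypothetical, since the page publishes only rates. The function names are also assumptions, not part of any Greptile API.

```python
def flags_per_10k_loc(flag_count, merged_loc):
    """Normalize a raw count of review flags to a rate per 10,000 merged LOC."""
    if merged_loc <= 0:
        raise ValueError("merged_loc must be positive")
    return flag_count / merged_loc * 10_000


def expected_flags(rate_per_10k, merged_loc):
    """Invert the normalization: flags expected for a given volume of merged code."""
    return rate_per_10k * merged_loc / 10_000


# Hypothetical raw numbers: 3 critical flags found in 400,000 merged LOC.
print(f"{flags_per_10k_loc(3, 400_000):.3f} P0 flags per 10k LOC")

# Published human P1 rate from FIG. 06 applied to 50,000 merged LOC.
print(f"{expected_flags(1.94, 50_000):.1f} expected P1 flags")
```

Read this way, the published human P1 rate of 1.94 corresponds to roughly ten real-bug flags per 50,000 merged lines.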
[ FIG. 07 / REVIEW CYCLES BY PR SIZE ]

Review cycles by LOC

Mean number of review cycles to merge, grouped by pull-request size.

LOC in PR    Mean review cycles to merge
< 10         1.27
10-49        1.51
50-199       1.91
200-499      2.38
500-999      2.81
1000+        3.54

Cross-population, all authors.

[ FIG. 08 / MEAN REVIEW CYCLES BY AUTHOR ]

Cycles by author

A tight axis makes the spread in review cycles visible without exaggerating the table values.

Author       n          Mean review cycles to merge
Devin        6,159      2.11
Claude       156,219    2.19
human        433,854    2.21
Cursor BG    5,691      2.46
Codex        20,455     2.46

Axis: mean review cycles to merge, 2.0 to 2.6. April 2026.

[ FIG. 09 / FAILURE-PATTERN HEATMAP ]

Failure fingerprints

Agent issue rates divided by the human rate, normalized per LOC. Values above 1.0x mean that class of issue appears more often than in human PRs.

Issue class                       Claude    Codex    Devin    Cursor BG

security
  sql injection                   1.50x     1.25x    0.70x    1.70x
  xss                             1.57x     0.86x    0.86x    1.43x
  auth bypass                     1.50x     1.00x    0.50x    1.67x
  IDOR / missing tenant check     1.75x     0.88x    0.69x    1.31x
  secret in logs                  1.34x     1.34x    0.94x    1.65x

correctness
  n+1 query                       1.27x     0.64x    0.45x    3.45x
  regression / breaks existing    1.25x     1.34x    0.89x    2.37x
  off-by-one                      1.64x     0.55x    0.64x    2.27x
  timezone / date bug             1.48x     0.90x    0.66x    2.09x
  env var / config bug            1.45x     1.35x    1.35x    0.95x

housekeeping
  test missing                    0.96x     1.13x    0.93x    2.37x
  dead code                       1.14x     0.99x    0.78x    2.05x
  stale comment / wrong doc       1.69x     0.38x    0.88x    0.69x

Legend: below human rate, above human rate. 1.0x = human rate.

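One way to read the heatmap is to count, per agent, how many issue classes sit above or below the 1.0x human rate. The sketch below transcribes the published ratios and tallies them. The `tally` helper is hypothetical, and the tally is only a reading aid: it treats every issue class as equally important, which the source does not claim.

```python
# Ratios from FIG. 09 (agent issue rate / human issue rate, per LOC).
# Column order within each row follows AGENTS.
AGENTS = ["Claude", "Codex", "Devin", "Cursor BG"]
RATIOS = {
    "sql injection": [1.50, 1.25, 0.70, 1.70],
    "xss": [1.57, 0.86, 0.86, 1.43],
    "auth bypass": [1.50, 1.00, 0.50, 1.67],
    "IDOR / missing tenant check": [1.75, 0.88, 0.69, 1.31],
    "secret in logs": [1.34, 1.34, 0.94, 1.65],
    "n+1 query": [1.27, 0.64, 0.45, 3.45],
    "regression / breaks existing": [1.25, 1.34, 0.89, 2.37],
    "off-by-one": [1.64, 0.55, 0.64, 2.27],
    "timezone / date bug": [1.48, 0.90, 0.66, 2.09],
    "env var / config bug": [1.45, 1.35, 1.35, 0.95],
    "test missing": [0.96, 1.13, 0.93, 2.37],
    "dead code": [1.14, 0.99, 0.78, 2.05],
    "stale comment / wrong doc": [1.69, 0.38, 0.88, 0.69],
}


def tally(agent):
    """Return (classes above the human rate, classes below it) for one agent."""
    col = AGENTS.index(agent)
    above = sum(1 for row in RATIOS.values() if row[col] > 1.0)
    below = sum(1 for row in RATIOS.values() if row[col] < 1.0)
    return above, below


for agent in AGENTS:
    above, below = tally(agent)
    print(f"{agent}: {above} classes above the human rate, {below} below")
```

A ratio of exactly 1.00x (Codex on auth bypass) counts as neither above nor below, so Codex's two counts sum to 12 rather than 13.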
companion analysis

This leaderboard is the monthly refreshed companion to the overnight agents analysis.
