April 2026 update. Updated monthly.

Agent leaderboard

A monthly-updated dashboard version of the overnight agents analysis, backed by persisted leaderboard snapshots. This view is the April 2026 update.

superlatives, April 2026

lowest revert rate: Codex, 1.19 reverts per 1k merged PRs
lowest code churn: Codex, 5.7% seven-day churn
fewest P0s generated: Devin, 0.038 per 10k merged LOC
fewest PR revisions: Devin, 2.11 mean review cycles to merge

[ FIG. 01 / FULL AI AUTHORSHIP SHARE ]

Agent-written PR share (% of opened PRs)

Monthly share of opened PRs with evidence of full end-to-end AI authorship, February 2025 to April 2026. The share went from 0.86% in February 2025 to 27.6% in April 2026.
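The share in Fig. 01 is a per-month ratio: opened PRs flagged as fully AI-authored divided by all opened PRs. A minimal Python sketch, assuming each PR record carries a month key and a boolean authorship flag; the `OpenedPR` type and its field names are hypothetical, and the source does not describe how the authorship evidence itself is detected.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OpenedPR:
    # Hypothetical record shape; the dashboard's real schema is not described.
    month: str               # e.g. "2026-04"
    fully_ai_authored: bool  # True if there is evidence of end-to-end AI authorship


def monthly_ai_share(prs):
    """Return {month: % of opened PRs flagged as fully AI-authored}, months sorted."""
    opened = defaultdict(int)
    ai = defaultdict(int)
    for pr in prs:
        opened[pr.month] += 1
        if pr.fully_ai_authored:
            ai[pr.month] += 1
    return {m: 100.0 * ai[m] / opened[m] for m in sorted(opened)}


prs = [OpenedPR("2025-02", False), OpenedPR("2025-02", False),
       OpenedPR("2026-04", True), OpenedPR("2026-04", False),
       OpenedPR("2026-04", False), OpenedPR("2026-04", False)]
print(monthly_ai_share(prs))  # {'2025-02': 0.0, '2026-04': 25.0}
```

The denominator is every opened PR in the month, not only merged ones, which matches the "% of opened PRs" framing of the figure.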

[ FIG. 02 / REVERTS PER 1,000 MERGED PRs ]

Revert-rate leaderboard

Reverts per 1,000 merged PRs, by author. Lower is better. The human rate is the baseline for the same merge window.

Codex (n=25,948): 1.19
Claude (n=198,969): 1.80
human (n=674,880): 2.72 (human baseline)
Cursor BG (n=7,335): 3.41
Devin (n=11,131): 3.50

Codex and Claude are below the human baseline; Cursor BG and Devin are above it.
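The leaderboard metric is a simple rate: reverted PRs divided by merged PRs, scaled to 1,000, then compared with the human rate for the same window. A minimal sketch; the function names are hypothetical, and the source does not specify how a revert is attributed to the PR it undoes.

```python
def reverts_per_1k(reverted_prs: int, merged_prs: int) -> float:
    """Reverts per 1,000 merged PRs."""
    if merged_prs <= 0:
        raise ValueError("merged_prs must be positive")
    return 1000.0 * reverted_prs / merged_prs


def vs_human(rate: float, human_rate: float) -> str:
    """Place a rate relative to the human baseline; lower is better."""
    if rate < human_rate:
        return "below human"
    if rate > human_rate:
        return "above human"
    return "at human baseline"


# Published April 2026 rates from the leaderboard above.
HUMAN = 2.72
for author, rate in [("Codex", 1.19), ("Claude", 1.80), ("Cursor BG", 3.41), ("Devin", 3.50)]:
    print(author, vs_human(rate, HUMAN))
```

Normalizing per 1,000 merged PRs is what makes Codex (n=25,948) comparable with human (n=674,880) despite the very different sample sizes.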

[ FIG. 03 / MEDIAN LOC, SAME DEVELOPERS ]

PR size, same developers

Median lines changed per PR for developers who opened both AI-authored and human-written PRs, April 2026.

AI-assisted (n=205,774): 171 LOC
human (n=210,600): 143 LOC
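Fig. 03 restricts the comparison to developers who opened both kinds of PR, so the same people appear on both sides. A minimal sketch of that filter-then-median step, assuming each PR is a hypothetical `(developer, is_ai, loc)` tuple; the source does not say how the AI label is assigned.

```python
from statistics import median


def same_developer_medians(prs):
    """prs: list of (developer, is_ai, loc) tuples.

    Keep only developers who opened at least one AI and one human PR,
    then return (median AI LOC, median human LOC) over that subset.
    """
    kinds = {}
    for dev, is_ai, _ in prs:
        kinds.setdefault(dev, set()).add(is_ai)
    both = {dev for dev, seen in kinds.items() if seen == {True, False}}
    ai_loc = [loc for dev, is_ai, loc in prs if dev in both and is_ai]
    human_loc = [loc for dev, is_ai, loc in prs if dev in both and not is_ai]
    return median(ai_loc), median(human_loc)


prs = [("ana", True, 200), ("ana", False, 100),
       ("bo", True, 60), ("bo", False, 150),
       ("cy", True, 9999)]  # cy never opened a human PR, so is excluded
print(same_developer_medians(prs))  # (130.0, 125.0)
```

The median rather than the mean keeps a few very large PRs (like the excluded 9,999-line one here) from dominating the comparison.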

[ FIG. 04 / REVERT RATE BY PR SIZE ]

Reverts by PR size

Same-developer revert rates (reverts per 1,000 merged PRs), grouped by number of files changed, April 2026. Each percentage is the AI rate's difference relative to the human rate.

1 file: AI 3.36, human 2.99 (+12%, AI worse)
2-3 files: AI 3.02, human 2.99 (tied)
4-10 files: AI 2.38, human 2.88 (-17%, AI better)
10+ files: AI 2.57, human 3.57 (-28%, AI better)
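The percentages in Fig. 04 match the relative difference (AI - human) / human in each bucket; for single-file PRs, (3.36 - 2.99) / 2.99 ≈ +12%. A minimal sketch of that calculation; the dashboard does not state the threshold it uses to print "tied", so the 2% band below is an assumption.

```python
def relative_diff_pct(ai_rate: float, human_rate: float) -> float:
    """AI rate relative to the human rate, in percent (positive = AI reverts more)."""
    return 100.0 * (ai_rate - human_rate) / human_rate


def verdict(ai_rate: float, human_rate: float, tie_band_pct: float = 2.0) -> str:
    """Classify a bucket; tie_band_pct is an assumed threshold, not the dashboard's."""
    d = relative_diff_pct(ai_rate, human_rate)
    if abs(d) < tie_band_pct:
        return "tied"
    return "AI worse" if d > 0 else "AI better"


# (bucket, AI rate, human rate) as published for April 2026.
rows = [("1 file", 3.36, 2.99), ("2-3 files", 3.02, 2.99),
        ("4-10 files", 2.38, 2.88), ("10+ files", 2.57, 3.57)]
for bucket, ai, human in rows:
    print(f"{bucket}: {relative_diff_pct(ai, human):+.0f}% {verdict(ai, human)}")
```

Because the comparison is relative, the 10+ files bucket shows the largest gap even though its AI rate (2.57) is not the lowest.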

[ FIG. 05 / FILE CHURN BY AUTHOR AND PR SIZE ]

Seven-day file churn

Share of touched files that were edited again within a week, bucketed by PR size (files changed).

Columns: 1 file, 2-3 files, 4-10 files, 10+ files, mean
Codex: 5.6%, 6.0%, 5.1%, 5.9%, 5.7%
Claude: 8.2%, 8.3%, 7.7%, 8.1%
Cursor BG: 10.3%, 6.7%, 8.5%, 9.6%, 8.8%
human: 9.9%, 9.8%, 10.5%, 9.8%, 10.0%
Devin: 14.0%, 13.6%, 12.9%, 13.3%, 13.5%

Human mean: 10.0%. Codex, Claude, and Cursor BG are below the human mean; Devin is above it.
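Seven-day churn can be read as a per-file check: for each file a PR touched, did the same path change again within seven days? A minimal sketch under two assumptions the source does not confirm: the clock starts at merge time, and "edited again" means any later change to the same path. The function and argument names are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)


def seven_day_churn(touched, later_edits) -> float:
    """Fraction of touched files that were edited again within WINDOW.

    touched:     list of (merged_at, path), one entry per file a scored PR changed.
    later_edits: list of (edited_at, path) for subsequent edits to the repository.
    """
    if not touched:
        return 0.0
    churned = 0
    for merged_at, path in touched:
        # Churned if the same path changes again in (merged_at, merged_at + 7 days].
        if any(p == path and merged_at < t <= merged_at + WINDOW
               for t, p in later_edits):
            churned += 1
    return churned / len(touched)


t0 = datetime(2026, 4, 1)
touched = [(t0, "api.py"), (t0, "db.py"), (t0, "ui.tsx"), (t0, "README.md")]
later = [(t0 + timedelta(days=2), "api.py"),   # within a week: churned
         (t0 + timedelta(days=9), "db.py")]    # after a week: not churned
print(f"{seven_day_churn(touched, later):.1%}")  # 25.0%
```

Bucketing by PR size, as in the figure, would mean running the same check separately over the touched files of 1-file, 2-3-file, 4-10-file, and 10+-file PRs.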

[ FIG. 06 / GREPTILE FLAGS PER 10K MERGED LOC ]

Review findings by severity

Greptile issue rates normalized by merged lines of code (flags per 10k merged LOC). The human row is the baseline.

Columns: P0 critical, P1 real bug, P2 style or nit
Devin: 0.038, 1.47, 2.64
Codex: 0.041, 1.31, 2.81
Claude: 0.078, 2.34, 3.75
human: 0.099, 1.94, 2.88
Cursor BG: 0.145, 2.83, 4.80
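Each value in Fig. 06 is a count of review findings at one severity, scaled to 10,000 merged lines. A minimal sketch, assuming findings arrive as plain severity labels; the function names are hypothetical and this is not Greptile's API.

```python
from collections import Counter

SEVERITIES = ("P0", "P1", "P2")


def per_10k_loc(count: int, merged_loc: int) -> float:
    """Scale a raw finding count to findings per 10,000 merged lines of code."""
    if merged_loc <= 0:
        raise ValueError("merged_loc must be positive")
    return 10_000.0 * count / merged_loc


def severity_rates(finding_severities, merged_loc: int) -> dict:
    """finding_severities: iterable of labels such as "P0", "P1", "P2"."""
    counts = Counter(finding_severities)
    return {sev: per_10k_loc(counts[sev], merged_loc) for sev in SEVERITIES}


findings = ["P0"] + ["P1"] * 15 + ["P2"] * 28
print(severity_rates(findings, merged_loc=100_000))  # {'P0': 0.1, 'P1': 1.5, 'P2': 2.8}
```

Normalizing by merged LOC rather than by PR count means an author who writes larger PRs is not penalized simply for producing more code.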

[ FIG. 07 / REVIEW CYCLES BY PR SIZE ]

Review cycles by LOC

Mean number of review cycles to merge, grouped by pull-request size (LOC in PR). Cross-population, all authors.

< 10 LOC: 1.27
10-49 LOC: 1.51
50-199 LOC: 1.91
200-499 LOC: 2.38
500-999 LOC: 2.81
1000+ LOC: 3.54
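The grouping in Fig. 07 amounts to mapping each PR's line count to a bucket and averaging review cycles within it. A minimal sketch using the figure's bucket edges; the `(loc, review_cycles)` input shape is an assumption, and the source does not define exactly what counts as one review cycle.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Bucket edges mirroring the Fig. 07 labels.
EDGES = [10, 50, 200, 500, 1000]
LABELS = ["< 10", "10-49", "50-199", "200-499", "500-999", "1000+"]


def loc_bucket(loc: int) -> str:
    """Map a PR's lines changed to its Fig. 07 bucket label."""
    return LABELS[bisect_right(EDGES, loc)]


def mean_cycles_by_bucket(prs) -> dict:
    """prs: iterable of (loc, review_cycles) for merged PRs."""
    groups = defaultdict(list)
    for loc, cycles in prs:
        groups[loc_bucket(loc)].append(cycles)
    # Keep the bucket order of LABELS and skip empty buckets.
    return {label: mean(groups[label]) for label in LABELS if label in groups}


print(mean_cycles_by_bucket([(5, 1), (8, 2), (120, 2), (1500, 4)]))
```

`bisect_right` places a value equal to an edge in the higher bucket, so a 10-line PR lands in "10-49" and a 1,000-line PR in "1000+", matching the labels.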

[ FIG. 08 / MEAN REVIEW CYCLES BY AUTHOR ]

Cycles by author

Mean review cycles to merge, April 2026. The figure uses a tight axis (2.0 to 2.6) to make the narrow spread between authors visible; the values themselves are not rescaled.

Devin (n=6,159): 2.11
Claude (n=156,219): 2.19
human (n=433,854): 2.21
Cursor BG (n=5,691): 2.46
Codex (n=20,455): 2.46

[ FIG. 09 / FAILURE-PATTERN HEATMAP ]

Failure fingerprints

Each cell is an agent's issue rate per LOC divided by the human rate per LOC. Values above 1.0x mean that class of issue appears more often than in human PRs, values below 1.0x mean it appears less often, and 1.0x equals the human rate.

Columns: Claude, Codex, Devin, Cursor BG

security
sql injection: 1.50x, 1.25x, 0.70x, 1.70x
xss: 1.57x, 0.86x, 1.43x
auth bypass: 1.50x, 1.00x, 0.50x, 1.67x
IDOR / missing tenant check: 1.75x, 0.88x, 0.69x, 1.31x
secret in logs: 1.34x, 0.94x, 1.65x

correctness
n+1 query: 1.27x, 0.64x, 0.45x, 3.45x
regression / breaks existing: 1.25x, 1.34x, 0.89x, 2.37x
off-by-one: 1.64x, 0.55x, 0.64x, 2.27x
timezone / date bug: 1.48x, 0.90x, 0.66x, 2.09x
env var / config bug: 1.45x, 1.35x, 0.95x

housekeeping
test missing: 0.96x, 1.13x, 0.93x, 2.37x
dead code: 1.14x, 0.99x, 0.78x, 2.05x
stale comment / wrong doc: 1.69x, 0.38x, 0.88x, 0.69x
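Each heatmap cell is a ratio of two per-LOC rates, so the LOC normalization happens before dividing; an agent with fewer raw findings can still sit above 1.0x if it also merged fewer lines. A minimal sketch with hypothetical counts; the function names are illustrative, not the dashboard's code.

```python
def rate_ratio(agent_issues: int, agent_loc: int,
               human_issues: int, human_loc: int) -> float:
    """Agent issues per LOC divided by human issues per LOC (1.0 = human rate)."""
    agent_rate = agent_issues / agent_loc
    human_rate = human_issues / human_loc
    if human_rate == 0:
        raise ValueError("human rate is zero; ratio is undefined")
    return agent_rate / human_rate


def describe(ratio: float) -> str:
    """Format a ratio the way the heatmap labels it, plus its direction."""
    if ratio > 1.0:
        side = "above human rate"
    elif ratio < 1.0:
        side = "below human rate"
    else:
        side = "at human rate"
    return f"{ratio:.2f}x ({side})"


# Hypothetical counts: 15 findings in 50k agent LOC vs 20 in 100k human LOC.
print(describe(rate_ratio(15, 50_000, 20, 100_000)))
```

Here the agent has fewer raw findings (15 vs 20) but half the code, so its per-LOC rate is 1.5 times the human rate.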

companion analysis

This leaderboard is the companion to the overnight agents analysis, refreshed monthly.
