Round 3

finished at Apr 24, 2026

corpus v3

models

prompts

samples

62/62

errors

Avg Δ

+47.3

Round ranking

sorted by delta ↓

#	model	lab	before	after	Δ	latency	coverage
1	GPT-5.5 (OpenAI) gpt-5.5	OpenAI (USA), paid	43.4	90.8	+47.3	7.4s	62/62

Calls that failed, usually because of transient API instability or quota exhaustion. These are recoverable via retry + merge-retry.

No errors in this round.