Round 3

finished at Apr 24, 2026

corpus v3

models: 1
prompts: 62
samples: 62/62
errors: 0
Avg Δ: +47.3

Round ranking

sorted by delta ↓
#  model                       lab                 before  after  Δ      latency  coverage
1  GPT-5.5 (OpenAI), gpt-5.5   OpenAI (USA), paid  43.4    90.8   +47.3  7.4s     62/62

Prompts used

Prompts tested in this round.

Errors in this round

Calls that failed, usually because of transient API instability or quota exhaustion. These are recoverable via retry + merge-retry.

No errors in this round.
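
For rounds that do report failed calls, the retry-then-merge recovery mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the function name `retry_failed`, the per-sample record shape (`output` / `error`), and the `call` callback are hypothetical and are not taken from the benchmark's actual tooling.

```python
import time


def retry_failed(samples, call, max_attempts=3, base_delay=0.0):
    """Re-run only the failed samples and merge the new results back in.

    samples: dict mapping prompt id -> {"output": ..., "error": ...}
             (hypothetical record shape, assumed for this sketch).
    call:    function taking a prompt id and returning the model output,
             raising an exception on failure (hypothetical callback).
    """
    merged = dict(samples)  # successful samples are carried over unchanged
    for prompt_id, result in samples.items():
        if result.get("error") is None:
            continue  # nothing to retry for this prompt
        for attempt in range(max_attempts):
            try:
                merged[prompt_id] = {"output": call(prompt_id), "error": None}
                break
            except Exception as exc:
                # Record the latest failure, then back off exponentially.
                merged[prompt_id] = {"output": None, "error": str(exc)}
                time.sleep(base_delay * (2 ** attempt))
    return merged
```

Retrying only the failed prompt ids keeps the samples that already succeeded untouched, so coverage can only go up after the merge.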