corpus v3
| # | model | lab | before | after | Δ | latency | coverage |
|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 (OpenAI) gpt-5.5 | OpenAI (EUA)paid | 43.4 | 90.8 | +47.3 | 7.4s | 62/62 |
Prompts tested in this round. Click each card to expand the prompt text and see each provider's response.
Calls that failed — usually transient API instability or quota exhaustion. Recoverable via retry + merge-retry.
No errors in this round.