corpus v99
| # | model | lab | before | after | Δ | latency | coverage |
|---|---|---|---|---|---|---|---|
| 1 | Grok 4.20 Reasoning (xAI) grok-4.20-reasoning | xAI (USA)data-sharing | 43.4 | 92.0 | +48.6 | 17.2s | 62/62 |
Prompts tested in this round. Click each card to expand the prompt text and see each provider's response.
Calls that failed — usually transient API instability or quota exhaustion. Recoverable via retry + merge-retry.
No errors in this round.