Round 1 (historical aggregate)

28 fragmented executions between Apr 12, 2026 and Apr 19, 2026

corpus v3

Round 1 aggregates the work done before the method was formalized (blind sub-agent, 4-profile × 3-language composition). The prompts were crafted over several weeks through partial executions that used varying methods. The round is kept in the history as a reference: for each prompt × provider pair, the results shown are the most recent available.

Models: 12
Prompts: 50
Samples: 600/600
Errors: 0
Avg Δ: +43.9

Round ranking

Sorted by Δ, descending.

| # | Model | Model ID | Lab | Tier | Before | After | Δ | Latency | Coverage |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Jamba Large 1.7 (AI21) | jamba-large-1.7 | AI21 Labs (Israel) | trial | 44.5 | 94.3 | +49.8 | 4.3s | 50/50 |
| 2 | Claude Opus (via CLI) | claude-opus-4-7 (via CLI) | Anthropic (USA) | paid | 44.5 | 94.3 | +49.7 | 12.9s | 50/50 |
| 3 | Claude Sonnet (via CLI) | claude-sonnet-4-6 (via CLI) | Anthropic (USA) | paid | 44.5 | 93.6 | +49.0 | 18.6s | 50/50 |
| 4 | GPT-5.4 (OpenAI) | gpt-5.4 | OpenAI (USA) | paid | 44.5 | 91.0 | +46.4 | 4.0s | 50/50 |
| 5 | Mistral Small | mistral-small-latest | Mistral AI (France) | free | 44.5 | 89.5 | +45.0 | 2.3s | 50/50 |
| 6 | Llama 3.3 70B (Groq) | llama-3.3-70b-versatile | Meta (USA) via Groq | free | 44.5 | 89.4 | +44.8 | 3.3s | 50/50 |
| 7 | DeepSeek R1 | deepseek-reasoner | DeepSeek (China) | free | 44.5 | 89.3 | +44.7 | 40.2s | 50/50 |
| 8 | DeepSeek V3 | deepseek-chat | DeepSeek (China) | free | 44.5 | 89.2 | +44.6 | 7.7s | 50/50 |
| 9 | Command A (Cohere) | command-a-03-2025 | Cohere (Canada) | trial | 44.5 | 85.5 | +41.0 | 10.1s | 50/50 |
| 10 | Gemini 2.5 Flash | gemini-2.5-flash | Google (USA) | free | 44.5 | 84.5 | +40.0 | 8.9s | 50/50 |
| 11 | GPT-5 nano (OpenAI) | gpt-5-nano | OpenAI (USA) | paid | 44.5 | 81.6 | +37.0 | 5.6s | 50/50 |
| 12 | GPT-4o mini (OpenAI) | gpt-4o-mini | OpenAI (USA) | paid | 44.5 | 78.7 | +34.2 | 3.1s | 50/50 |
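As a consistency check on the headline numbers, the sketch below recomputes them from the ranking table. It assumes that "Avg Δ" is the unweighted mean of the twelve per-model Δ values and that every model answered all 50 prompts, as the 50/50 coverage column indicates. The names `ROWS`, `PROMPTS_PER_MODEL` and `summarize` are hypothetical and are not part of the benchmark tooling. The sketch uses `Decimal` so that one-decimal rounding is exact rather than subject to floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

# (model id, before, after, displayed Δ), copied from the Round 1 ranking table.
ROWS = [
    ("jamba-large-1.7", "44.5", "94.3", "49.8"),
    ("claude-opus-4-7", "44.5", "94.3", "49.7"),
    ("claude-sonnet-4-6", "44.5", "93.6", "49.0"),
    ("gpt-5.4", "44.5", "91.0", "46.4"),
    ("mistral-small-latest", "44.5", "89.5", "45.0"),
    ("llama-3.3-70b-versatile", "44.5", "89.4", "44.8"),
    ("deepseek-reasoner", "44.5", "89.3", "44.7"),
    ("deepseek-chat", "44.5", "89.2", "44.6"),
    ("command-a-03-2025", "44.5", "85.5", "41.0"),
    ("gemini-2.5-flash", "44.5", "84.5", "40.0"),
    ("gpt-5-nano", "44.5", "81.6", "37.0"),
    ("gpt-4o-mini", "44.5", "78.7", "34.2"),
]

PROMPTS_PER_MODEL = 50  # every row shows 50/50 coverage


def summarize(rows):
    """Recompute the round's headline numbers from the per-model rows.

    Assumption (hypothetical): "Avg Δ" is the unweighted mean of per-model Δ.
    """
    deltas = [Decimal(d) for _, _, _, d in rows]
    mean = sum(deltas) / len(deltas)
    # Largest gap between the displayed Δ and (after - before). Each column is
    # rounded to one decimal on its own, so small gaps are expected.
    max_gap = max(
        abs(Decimal(after) - Decimal(before) - Decimal(delta))
        for _, before, after, delta in rows
    )
    return {
        "samples": len(rows) * PROMPTS_PER_MODEL,
        "avg_delta_exact": mean,
        "avg_delta": mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP),
        "max_rounding_gap": max_gap,
        # The table claims to be sorted by Δ, descending.
        "sorted_by_delta": deltas == sorted(deltas, reverse=True),
    }


if __name__ == "__main__":
    for key, value in summarize(ROWS).items():
        print(f"{key}: {value}")
```

With the displayed values, the mean Δ is 43.85, which rounds to the published +43.9, and 12 models × 50 prompts gives the 600 samples. Recomputing after − before from the rounded columns differs from the displayed Δ by at most 0.1. That gap is consistent with each column being rounded independently from unrounded scores.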

Prompts used

Prompts tested in this round, with each provider's response.

Errors in this round

Calls that failed, usually because of transient API instability or exhausted quota. These failures can be recovered via retry + merge-retry.

No errors in this round.