Round 1 (historical aggregate)

28 fragmented executions between Apr 12, 2026 and Apr 19, 2026

corpus v3

Round 1 aggregates the work done before the method was formalized (blind sub-agent, 4-profile × 3-language composition). The prompts were crafted over several weeks through partial executions with varying methods. The round is kept in the history as a reference; the result shown for each prompt × provider pair is the most recent one available for that pair.
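The "most recent result per pair" rule can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the `Result` record, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Result:
    """One scored response (hypothetical record layout)."""
    prompt_id: str
    provider: str
    finished_at: datetime
    score: float


def latest_per_pair(results):
    """Keep only the most recent result for each (prompt, provider) pair."""
    latest = {}
    for r in results:
        key = (r.prompt_id, r.provider)
        if key not in latest or r.finished_at > latest[key].finished_at:
            latest[key] = r
    return latest


# Made-up records for illustration only, not data from this round.
runs = [
    Result("p01", "provider-a", datetime(2026, 4, 12, 10, 0), 80.0),
    Result("p01", "provider-a", datetime(2026, 4, 18, 9, 30), 90.0),
    Result("p01", "provider-b", datetime(2026, 4, 15, 14, 0), 85.0),
]
merged = latest_per_pair(runs)
for (prompt_id, provider), r in sorted(merged.items()):
    print(prompt_id, provider, r.score)
```

Keying on the (prompt, provider) pair means a later fragmented execution simply overwrites an earlier one for the same pair, which is one way to combine many partial runs into a single aggregate.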

models: 12

prompts: 50

samples: 600/600

errors: 0

Avg Δ: +43.9

Round ranking

sorted by delta ↓

#	model	lab	tier	before	after	Δ	latency	coverage
1	Jamba Large 1.7 (AI21) jamba-large-1.7	AI21 Labs (Israel)	trial	44.5	94.3	+49.8	4.3s	50/50
2	Claude Opus (via CLI) claude-opus-4-7 (via CLI)	Anthropic (USA)	paid	44.5	94.3	+49.7	12.9s	50/50
3	Claude Sonnet (via CLI) claude-sonnet-4-6 (via CLI)	Anthropic (USA)	paid	44.5	93.6	+49.0	18.6s	50/50
4	GPT-5.4 (OpenAI) gpt-5.4	OpenAI (USA)	paid	44.5	91.0	+46.4	4.0s	50/50
5	Mistral Small mistral-small-latest	Mistral AI (France)	free	44.5	89.5	+45.0	2.3s	50/50
6	Llama 3.3 70B (Groq) llama-3.3-70b-versatile	Meta (USA) via Groq	free	44.5	89.4	+44.8	3.3s	50/50
7	DeepSeek R1 deepseek-reasoner	DeepSeek (China)	free	44.5	89.3	+44.7	40.2s	50/50
8	DeepSeek V3 deepseek-chat	DeepSeek (China)	free	44.5	89.2	+44.6	7.7s	50/50
9	Command A (Cohere) command-a-03-2025	Cohere (Canada)	trial	44.5	85.5	+41.0	10.1s	50/50
10	Gemini 2.5 Flash gemini-2.5-flash	Google (USA)	free	44.5	84.5	+40.0	8.9s	50/50
11	GPT-5 nano (OpenAI) gpt-5-nano	OpenAI (USA)	paid	44.5	81.6	+37.0	5.6s	50/50
12	GPT-4o mini (OpenAI) gpt-4o-mini	OpenAI (USA)	paid	44.5	78.7	+34.2	3.1s	50/50
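As a quick consistency check on the ranking, the sketch below recomputes the average Δ from the published per-model values, confirms the descending sort, and compares each Δ with after minus before. The scores are copied from the table; the variable names are illustrative, and the assumption that the site computes Δ from unrounded scores is ours, not stated on the page.

```python
from statistics import mean

# Every row in the ranking shares the same "before" score.
BEFORE = 44.5

# (model id, after score, published delta), copied from the ranking table.
ROWS = [
    ("jamba-large-1.7", 94.3, 49.8),
    ("claude-opus-4-7", 94.3, 49.7),
    ("claude-sonnet-4-6", 93.6, 49.0),
    ("gpt-5.4", 91.0, 46.4),
    ("mistral-small-latest", 89.5, 45.0),
    ("llama-3.3-70b-versatile", 89.4, 44.8),
    ("deepseek-reasoner", 89.3, 44.7),
    ("deepseek-chat", 89.2, 44.6),
    ("command-a-03-2025", 85.5, 41.0),
    ("gemini-2.5-flash", 84.5, 40.0),
    ("gpt-5-nano", 81.6, 37.0),
    ("gpt-4o-mini", 78.7, 34.2),
]

deltas = [delta for _, _, delta in ROWS]

# Mean of the published (already rounded) deltas.
avg_delta = mean(deltas)

# The table claims to be sorted by delta, descending.
is_sorted_desc = all(a >= b for a, b in zip(deltas, deltas[1:]))

# Largest disagreement between a published delta and (after - before).
max_gap = max(abs((after - BEFORE) - delta) for _, after, delta in ROWS)

print(f"average of published deltas: {avg_delta:.2f}")
print(f"sorted by delta, descending: {is_sorted_desc}")
print(f"largest |(after - before) - delta|: {max_gap:.2f}")
```

The mean of the rounded deltas lands within rounding distance of the reported +43.9, and several rows differ from after minus before by about 0.1, which is consistent with the deltas having been computed before the scores were rounded for display.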

Prompts used

Prompts tested in this round.

Errors in this round

Calls that failed, usually because of transient API instability or quota exhaustion. These are recoverable via retry + merge-retry.

No errors in this round.