Single LLM calls solve simple problems. Multi-agent workflows solve complex ones — by decomposing tasks, running agents in parallel, specializing each agent for a subtask, and having agents check each other’s work.
In 2025, multi-agent workflows became practical for production use. The tooling matured (Claude’s tool use, MCP, extended context), the failure modes are now well-understood, and the cost curve makes multi-step workflows economically viable for a wider range of applications.
This post covers the architecture patterns I’ve found most useful, with complete implementations.
When Multi-Agent Beats Single Agent
Use a single agent when:
- The task fits in one context window
- The task is sequential with no parallelism opportunities
- Latency is critical (each additional agent adds ~2-30s)
Use a multi-agent workflow when:
- Subtasks can execute in parallel (often roughly 10x throughput from fan-out)
- Different subtasks need different specializations or context
- You want independent verification (agent A produces, agent B reviews)
- The total work exceeds a single context window
- You need to process many independent items (fan-out pattern)
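The throughput figure above is simple arithmetic, assuming uniform per-call latency: N items at t seconds each take N·t sequentially, but roughly ceil(N/c)·t with c calls in flight. A back-of-envelope sketch (illustrative numbers, not benchmarks):

```typescript
// Back-of-envelope latency model for fan-out, assuming every call
// takes about the same time t (real latencies vary widely).
function sequentialSeconds(items: number, secondsPerCall: number): number {
  return items * secondsPerCall;
}

function parallelSeconds(
  items: number,
  secondsPerCall: number,
  concurrency: number
): number {
  // Each "wave" of `concurrency` calls completes together
  return Math.ceil(items / concurrency) * secondsPerCall;
}

// 100 items at ~3s each: 300s sequentially vs ~30s with 10 in flight
const speedup =
  sequentialSeconds(100, 3) / parallelSeconds(100, 3, 10); // → 10
```

In practice the speedup saturates at your rate limit, not at the item count, which is why the patterns below cap concurrency explicitly.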
Pattern 1: Parallel Fan-Out
Process N independent items simultaneously:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface TransactionSummary {
  transaction_id: string;
  risk_assessment: string;
  risk_level: 'low' | 'medium' | 'high';
  flags: string[];
}

async function analyzeTransaction(
  transaction: Record<string, unknown>
): Promise<TransactionSummary> {
  const response = await client.messages.create({
    model: 'claude-haiku-4-5', // Fast + cheap for individual items
    max_tokens: 512,
    system: `You are a financial fraud analyst. Analyze transactions for risk. Always respond with valid JSON only.`,
    messages: [
      {
        role: 'user',
        content: `Analyze this transaction for fraud risk:
${JSON.stringify(transaction, null, 2)}

Respond with JSON: { "risk_assessment": "...", "risk_level": "low|medium|high", "flags": [] }`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '{}';
  const parsed = JSON.parse(text);

  return {
    transaction_id: transaction.id as string,
    ...parsed,
  };
}

// Fan-out: process all transactions in parallel
async function analyzeBatch(
  transactions: Record<string, unknown>[],
  concurrency = 10 // Respect rate limits
): Promise<TransactionSummary[]> {
  // Process in chunks to avoid rate limits
  const results: TransactionSummary[] = [];

  for (let i = 0; i < transactions.length; i += concurrency) {
    const chunk = transactions.slice(i, i + concurrency);
    const chunkResults = await Promise.all(
      chunk.map(t => analyzeTransaction(t))
    );
    results.push(...chunkResults);
  }

  return results;
}
```

Key decision: use `claude-haiku-4-5` for individual items (~15x cheaper and significantly faster than Opus) and `claude-opus-4-6` for the final synthesis. Right-size each agent for its task.
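One refinement worth knowing: the chunked loop waits for each chunk's slowest call before starting the next, so a single slow item stalls its whole chunk. A sliding-window helper keeps up to `limit` calls in flight instead. This is a sketch with no SDK dependency; `mapWithConcurrency` is a name introduced here, not part of the Anthropic SDK:

```typescript
// Sliding-window parallel map: starts a new call as soon as any
// in-flight call finishes, preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next index and processes it.
  // Claiming is safe: JS is single-threaded between awaits.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker())
  );
  return results;
}
```

With this helper, the body of `analyzeBatch` reduces to `mapWithConcurrency(transactions, concurrency, analyzeTransaction)`.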
Pattern 2: Hierarchical Orchestration
An orchestrator agent decomposes the task, spawns worker agents, collects and synthesizes results:
```typescript
interface SubTask {
  id: string;
  description: string;
  context: string;
}

interface SubTaskResult {
  task_id: string;
  result: string;
  confidence: number;
}

// Worker agent: executes a specific subtask
async function executeSubTask(task: SubTask): Promise<SubTaskResult> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 2048,
    messages: [
      {
        role: 'user',
        content: `Task: ${task.description}

Context:
${task.context}

Complete this task thoroughly. At the end, rate your confidence (0-1) in your answer.`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '';

  // Extract confidence score from response
  const confidenceMatch = text.match(/confidence[:\s]+([0-9.]+)/i);
  const confidence = confidenceMatch ? parseFloat(confidenceMatch[1]) : 0.8;

  return {
    task_id: task.id,
    result: text,
    confidence,
  };
}

// Orchestrator: decomposes task and synthesizes results
async function runOrchestrated(
  mainTask: string,
  context: string
): Promise<string> {
  // Step 1: Orchestrator decomposes the task
  const decompositionResponse = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 1024,
    messages: [
      {
        role: 'user',
        content: `You are a task orchestrator. Decompose this complex task into 3-5 independent subtasks that can be executed in parallel.

Main task: ${mainTask}

Context: ${context}

Respond with JSON: { "subtasks": [{ "id": "1", "description": "...", "context": "..." }] }`,
      },
    ],
  });

  const decompositionText =
    decompositionResponse.content[0].type === 'text'
      ? decompositionResponse.content[0].text
      : '{"subtasks": []}';

  const { subtasks } = JSON.parse(decompositionText) as { subtasks: SubTask[] };

  // Step 2: Execute all subtasks in parallel
  console.log(`Executing ${subtasks.length} subtasks in parallel...`);
  const subResults = await Promise.all(
    subtasks.map(task => executeSubTask(task))
  );

  // Step 3: Orchestrator synthesizes results
  const synthesisResponse = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 4096,
    messages: [
      {
        role: 'user',
        content: `You are a synthesizer. Combine these parallel research results into a coherent final answer.

Original task: ${mainTask}

Subtask results:
${subResults.map(r => `Task ${r.task_id} (confidence: ${r.confidence}):
${r.result}`).join('\n---\n')}

Synthesize a comprehensive final answer, weighing results by confidence.`,
      },
    ],
  });

  return synthesisResponse.content[0].type === 'text'
    ? synthesisResponse.content[0].text
    : 'Synthesis failed';
}
```

Pattern 3: Checker / Verifier Agent
Agent A produces. Agent B independently verifies. Especially useful for code generation and financial calculations where errors are costly:
````typescript
interface GeneratedCode {
  code: string;
  language: string;
  description: string;
}

interface VerificationResult {
  passed: boolean;
  issues: string[];
  suggested_fixes: string[];
  confidence: number;
}

// Generator agent: produces code
async function generateCode(requirement: string): Promise<GeneratedCode> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 2048,
    system: 'You are an expert software engineer. Generate clean, production-ready code.',
    messages: [
      {
        role: 'user',
        content: `Generate TypeScript code for: ${requirement}

Include error handling, type annotations, and brief inline comments.`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '';

  // Extract code block
  const codeMatch = text.match(/```(?:typescript)?\n([\s\S]*?)```/);

  return {
    code: codeMatch?.[1] ?? text,
    language: 'typescript',
    description: requirement,
  };
}

// Verifier agent: checks the generated code
async function verifyCode(
  generated: GeneratedCode,
  requirement: string
): Promise<VerificationResult> {
  const response = await client.messages.create({
    model: 'claude-opus-4-6', // Use the best model for critical verification
    max_tokens: 2048,
    system: `You are a strict code reviewer. Your job is to find bugs, security issues, and logic errors. Be thorough. Do not approve code unless you are certain it's correct.`,
    messages: [
      {
        role: 'user',
        content: `Review this ${generated.language} code for the requirement: "${requirement}"

Code:
\`\`\`${generated.language}
${generated.code}
\`\`\`

Check for:
1. Logic errors or off-by-one errors
2. Security vulnerabilities
3. Missing error handling
4. Type safety issues
5. Whether it actually fulfills the requirement

Respond with JSON:
{
  "passed": boolean,
  "issues": ["list of issues found"],
  "suggested_fixes": ["specific fix suggestions"],
  "confidence": 0-1
}`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '{}';
  const jsonMatch = text.match(/\{[\s\S]*\}/);

  if (!jsonMatch) {
    return {
      passed: false,
      issues: ['Verification failed to parse'],
      suggested_fixes: [],
      confidence: 0,
    };
  }

  return JSON.parse(jsonMatch[0]) as VerificationResult;
}

// Generate → Verify → Regenerate loop
async function generateVerifiedCode(
  requirement: string,
  maxAttempts = 3
): Promise<{ code: string; verified: boolean }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    console.log(`Attempt ${attempt}/${maxAttempts}`);

    const generated = await generateCode(requirement);
    const verification = await verifyCode(generated, requirement);

    if (verification.passed && verification.confidence > 0.8) {
      console.log(`✓ Verification passed (confidence: ${verification.confidence})`);
      return { code: generated.code, verified: true };
    }

    console.log(`✗ Issues found: ${verification.issues.join(', ')}`);

    if (attempt < maxAttempts) {
      // Feed verification feedback back into next generation
      requirement = `${requirement}

Previous attempt had these issues:
${verification.issues.join('\n')}

Fix suggestions:
${verification.suggested_fixes.join('\n')}`;
    }
  }

  return { code: 'Generation failed after max attempts', verified: false };
}
````

Pattern 4: Agent-to-Agent Communication via Shared Context
For complex workflows where agents need to build on each other’s work sequentially:
```typescript
interface AgentContext {
  task: string;
  history: Array<{ agent: string; output: string; timestamp: Date }>;
  state: Record<string, unknown>;
}

class SequentialAgentPipeline {
  private context: AgentContext;
  private agents: Array<{ name: string; role: string; prompt: string }>;

  constructor(
    task: string,
    agents: Array<{ name: string; role: string; prompt: string }>
  ) {
    this.context = { task, history: [], state: {} };
    this.agents = agents;
  }

  async run(): Promise<AgentContext> {
    for (const agent of this.agents) {
      console.log(`Running agent: ${agent.name}`);

      const historyText = this.context.history
        .map(h => `[${h.agent}]: ${h.output}`)
        .join('\n\n');

      const response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 2048,
        system: agent.prompt,
        messages: [
          {
            role: 'user',
            content: `Original task: ${this.context.task}

Previous agent outputs:
${historyText || '(none — you are the first agent)'}

Current state: ${JSON.stringify(this.context.state, null, 2)}

Your role: ${agent.role}
Please complete your part of the task, building on previous outputs.`,
          },
        ],
      });

      const output =
        response.content[0].type === 'text' ? response.content[0].text : '';

      this.context.history.push({
        agent: agent.name,
        output,
        timestamp: new Date(),
      });
    }

    return this.context;
  }
}
```
```typescript
// Example: Financial report generation pipeline
const reportPipeline = new SequentialAgentPipeline(
  'Generate Q3 2025 financial performance summary for stakeholders',
  [
    {
      name: 'DataAnalyst',
      role: 'Analyze raw financial data and extract key metrics',
      prompt: 'You are a financial data analyst. Extract and compute key metrics from financial data.',
    },
    {
      name: 'Interpreter',
      role: 'Interpret the metrics and identify trends and insights',
      prompt: 'You are a financial interpreter. Turn raw metrics into meaningful business insights.',
    },
    {
      name: 'Writer',
      role: 'Write a clear, professional summary for executive stakeholders',
      prompt: 'You are a financial writer. Create clear, concise executive summaries.',
    },
    {
      name: 'Reviewer',
      role: 'Review the summary for accuracy, clarity, and completeness',
      prompt: 'You are a senior editor. Ensure financial reports are accurate and appropriately scoped.',
    },
  ]
);

// Run the pipeline; the last history entry is the reviewed summary
const report = await reportPipeline.run();
console.log(report.history.at(-1)?.output);
```

Error Recovery and Resilience
Multi-agent workflows fail more often than single calls — more surface area for errors:
```typescript
class ResilientAgent {
  private retries: number;
  private fallbackModel: string;

  constructor(
    private primaryModel: string = 'claude-opus-4-6',
    fallbackModel = 'claude-sonnet-4-6',
    retries = 3
  ) {
    this.fallbackModel = fallbackModel;
    this.retries = retries;
  }

  async call(
    // Non-streaming params, so the return type is a plain Message
    params: Omit<Anthropic.MessageCreateParamsNonStreaming, 'model'>
  ): Promise<Anthropic.Message> {
    let lastError: Error | undefined;
    let model = this.primaryModel;

    for (let attempt = 1; attempt <= this.retries; attempt++) {
      try {
        return await client.messages.create({ ...params, model });
      } catch (err) {
        lastError = err as Error;

        if (err instanceof Anthropic.RateLimitError) {
          const waitMs = Math.min(1000 * Math.pow(2, attempt), 60_000);
          console.warn(`Rate limited. Waiting ${waitMs}ms...`);
          await sleep(waitMs);
        } else if (err instanceof Anthropic.APIError && err.status === 529) {
          // Overloaded — try fallback model
          if (model === this.primaryModel) {
            console.warn(`Primary model overloaded, trying fallback: ${this.fallbackModel}`);
            model = this.fallbackModel;
          } else {
            await sleep(30_000);
          }
        } else {
          throw err; // Non-retryable
        }
      }
    }

    throw lastError;
  }
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));
```

Cost Estimation for Multi-Agent Workflows
Before deploying, estimate costs at scale:
```typescript
// Rough cost estimator. Check current pricing at anthropic.com/pricing;
// these numbers change frequently.
const PRICING = {
  'claude-opus-4-6': { input: 15.0, output: 75.0 }, // per million tokens
  'claude-sonnet-4-6': { input: 3.0, output: 15.0 },
  'claude-haiku-4-5': { input: 1.0, output: 5.0 },
};

function estimateWorkflowCost(
  steps: Array<{
    model: keyof typeof PRICING;
    estimatedInputTokens: number;
    estimatedOutputTokens: number;
    parallelism?: number;
  }>
): { costPerRun: number; costPer1000Runs: number } {
  const costPerRun = steps.reduce((total, step) => {
    const pricing = PRICING[step.model];
    const parallelism = step.parallelism ?? 1;

    const stepCost =
      ((step.estimatedInputTokens / 1_000_000) * pricing.input +
        (step.estimatedOutputTokens / 1_000_000) * pricing.output) *
      parallelism;

    return total + stepCost;
  }, 0);

  return {
    costPerRun,
    costPer1000Runs: costPerRun * 1000,
  };
}

// Example: fraud analysis workflow
const estimate = estimateWorkflowCost([
  // Orchestrator (runs once)
  { model: 'claude-opus-4-6', estimatedInputTokens: 1000, estimatedOutputTokens: 500 },
  // Workers (run 10 in parallel)
  { model: 'claude-haiku-4-5', estimatedInputTokens: 500, estimatedOutputTokens: 200, parallelism: 10 },
  // Synthesizer (runs once)
  { model: 'claude-opus-4-6', estimatedInputTokens: 5000, estimatedOutputTokens: 1000 },
]);

console.log(`Cost per run: $${estimate.costPerRun.toFixed(4)}`);
console.log(`Cost per 1000 runs: $${estimate.costPer1000Runs.toFixed(2)}`);
```

Multi-agent workflows are not magic — they’re engineering. The patterns above provide the building blocks. The real work is matching the architecture to the problem: right-sizing models, identifying genuine parallelism, and building recovery logic for the failure modes that emerge at scale.
Related posts
- LLM API Integration Patterns — Structured Outputs, Function Calling, Streaming — the foundational single-call patterns that multi-agent orchestration builds on
- Building MCP Servers — The New API Layer for AI Agents — how MCP servers expose tools that agents in these workflows can discover and call