
Multi-Agent Workflows with Claude API — Architecture Patterns That Work

Posted on: December 8, 2025 at 10:00 AM

Single LLM calls solve simple problems. Multi-agent workflows solve complex ones — by decomposing tasks, running agents in parallel, specializing each agent for a subtask, and having agents check each other’s work.

In 2025, multi-agent workflows became practical for production use. The tooling matured (Claude’s tool use, MCP, extended context), the failure modes are now well-understood, and the cost curve makes multi-step workflows economically viable for a wider range of applications.

This post covers the architecture patterns I’ve found most useful, with complete implementations.


When Multi-Agent Beats Single Agent

Use a single agent when:

- The whole task fits comfortably in one context window
- One prompt and one skill set cover the job end to end
- Latency and cost matter more than exhaustive coverage

Use multi-agent when:

- The task decomposes into independent subtasks that can run in parallel
- Subtasks benefit from specialized prompts or differently sized models
- Errors are costly enough that a second agent should verify the work

Pattern 1: Parallel Fan-Out

Process N independent items simultaneously:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface TransactionSummary {
  transaction_id: string;
  risk_assessment: string;
  risk_level: 'low' | 'medium' | 'high';
  flags: string[];
}

async function analyzeTransaction(
  transaction: Record<string, unknown>
): Promise<TransactionSummary> {
  const response = await client.messages.create({
    model: 'claude-haiku-4-5', // Fast + cheap for individual items
    max_tokens: 512,
    system: `You are a financial fraud analyst. Analyze transactions for risk.
Always respond with valid JSON only.`,
    messages: [
      {
        role: 'user',
        content: `Analyze this transaction for fraud risk:
${JSON.stringify(transaction, null, 2)}
Respond with JSON: { "risk_assessment": "...", "risk_level": "low|medium|high", "flags": [] }`,
      },
    ],
  });

  const text = response.content[0].type === 'text' ? response.content[0].text : '{}';
  const parsed = JSON.parse(text);

  return {
    transaction_id: transaction.id as string,
    ...parsed,
  };
}

// Fan-out: process all transactions in parallel
async function analyzeBatch(
  transactions: Record<string, unknown>[],
  concurrency = 10 // Respect rate limits
): Promise<TransactionSummary[]> {
  // Process in chunks to avoid rate limits
  const results: TransactionSummary[] = [];

  for (let i = 0; i < transactions.length; i += concurrency) {
    const chunk = transactions.slice(i, i + concurrency);
    const chunkResults = await Promise.all(
      chunk.map(t => analyzeTransaction(t))
    );
    results.push(...chunkResults);
  }

  return results;
}

Key decision: Use claude-haiku-4-5 for individual items (~15x cheaper, significantly faster than Opus) and claude-opus-4-6 for the final synthesis. Right-size each agent for its task.
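One refinement worth knowing: the chunked loop in `analyzeBatch` waits for the slowest item in each chunk before starting the next. A sliding-window variant keeps `concurrency` requests in flight continuously. Here is a sketch of a generic helper (the name `mapWithConcurrency` is mine, not part of the SDK):

```typescript
// Generic concurrency limiter: keeps up to `limit` tasks in flight,
// starting a new one as soon as any finishes (vs. chunk-at-a-time).
async function mapWithConcurrency<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  limit: number
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next index and processes it.
  // Claiming (`next++`) is safe because JS is single-threaded.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  // Spawn up to `limit` workers that drain the shared queue.
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Swapping it into `analyzeBatch` is a one-line change: `return mapWithConcurrency(transactions, analyzeTransaction, concurrency);`.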

Pattern 2: Hierarchical Orchestration

An orchestrator agent decomposes the task, spawns worker agents, collects and synthesizes results:

interface SubTask {
  id: string;
  description: string;
  context: string;
}

interface SubTaskResult {
  task_id: string;
  result: string;
  confidence: number;
}

// Worker agent: executes a specific subtask
async function executeSubTask(task: SubTask): Promise<SubTaskResult> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 2048,
    messages: [
      {
        role: 'user',
        content: `Task: ${task.description}
Context:
${task.context}
Complete this task thoroughly. At the end, rate your confidence (0-1) in your answer.`,
      },
    ],
  });

  const text = response.content[0].type === 'text' ? response.content[0].text : '';

  // Extract confidence score from response
  const confidenceMatch = text.match(/confidence[:\s]+([0-9.]+)/i);
  const confidence = confidenceMatch ? parseFloat(confidenceMatch[1]) : 0.8;

  return {
    task_id: task.id,
    result: text,
    confidence,
  };
}

// Orchestrator: decomposes task and synthesizes results
async function runOrchestrated(
  mainTask: string,
  context: string
): Promise<string> {
  // Step 1: Orchestrator decomposes the task
  const decompositionResponse = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 1024,
    messages: [
      {
        role: 'user',
        content: `You are a task orchestrator. Decompose this complex task into 3-5 independent subtasks that can be executed in parallel.
Main task: ${mainTask}
Context: ${context}
Respond with JSON: { "subtasks": [{ "id": "1", "description": "...", "context": "..." }] }`,
      },
    ],
  });

  const decompositionText = decompositionResponse.content[0].type === 'text'
    ? decompositionResponse.content[0].text
    : '{"subtasks": []}';
  const { subtasks } = JSON.parse(decompositionText) as { subtasks: SubTask[] };

  // Step 2: Execute all subtasks in parallel
  console.log(`Executing ${subtasks.length} subtasks in parallel...`);
  const subResults = await Promise.all(
    subtasks.map(task => executeSubTask(task))
  );

  // Step 3: Orchestrator synthesizes results
  const synthesisResponse = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 4096,
    messages: [
      {
        role: 'user',
        content: `You are a synthesizer. Combine these parallel research results into a coherent final answer.
Original task: ${mainTask}
Subtask results:
${subResults.map(r => `
Task ${r.task_id} (confidence: ${r.confidence}):
${r.result}
`).join('\n---\n')}
Synthesize a comprehensive final answer, weighing results by confidence.`,
      },
    ],
  });

  return synthesisResponse.content[0].type === 'text'
    ? synthesisResponse.content[0].text
    : 'Synthesis failed';
}
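One fragile spot in `runOrchestrated` is the bare `JSON.parse` on the decomposition output: models sometimes wrap JSON in prose or a markdown fence. A defensive parser can extract the first JSON object and fall back to an empty subtask list. This is a sketch (the name `parseSubtasks` is mine):

```typescript
interface SubTask {
  id: string;
  description: string;
  context: string;
}

// Tolerant parse: grab the outermost {...} span from the model's text,
// then validate the shape before trusting it.
function parseSubtasks(raw: string): SubTask[] {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]) as { subtasks?: unknown };
    if (!Array.isArray(parsed.subtasks)) return [];
    // Keep only entries that actually look like subtasks
    return parsed.subtasks.filter(
      (t): t is SubTask =>
        typeof t === 'object' && t !== null &&
        typeof (t as SubTask).description === 'string'
    );
  } catch {
    return []; // Malformed JSON: caller can retry the decomposition
  }
}
```

An empty return gives the orchestrator a clean signal to re-prompt rather than crashing mid-workflow.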

Pattern 3: Checker / Verifier Agent

Agent A produces. Agent B independently verifies. Especially useful for code generation and financial calculations where errors are costly:

interface GeneratedCode {
  code: string;
  language: string;
  description: string;
}

interface VerificationResult {
  passed: boolean;
  issues: string[];
  suggested_fixes: string[];
  confidence: number;
}

// Generator agent: produces code
async function generateCode(requirement: string): Promise<GeneratedCode> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 2048,
    system: 'You are an expert software engineer. Generate clean, production-ready code.',
    messages: [
      {
        role: 'user',
        content: `Generate TypeScript code for: ${requirement}
Include error handling, type annotations, and brief inline comments.`,
      },
    ],
  });

  const text = response.content[0].type === 'text' ? response.content[0].text : '';

  // Extract code block
  const codeMatch = text.match(/```(?:typescript)?\n([\s\S]*?)```/);

  return {
    code: codeMatch?.[1] ?? text,
    language: 'typescript',
    description: requirement,
  };
}

// Verifier agent: checks the generated code
async function verifyCode(
  generated: GeneratedCode,
  requirement: string
): Promise<VerificationResult> {
  const response = await client.messages.create({
    model: 'claude-opus-4-6', // Use the best model for critical verification
    max_tokens: 2048,
    system: `You are a strict code reviewer. Your job is to find bugs, security issues,
and logic errors. Be thorough. Do not approve code unless you are certain it's correct.`,
    messages: [
      {
        role: 'user',
        content: `Review this ${generated.language} code for the requirement: "${requirement}"
Code:
\`\`\`${generated.language}
${generated.code}
\`\`\`
Check for:
1. Logic errors or off-by-one errors
2. Security vulnerabilities
3. Missing error handling
4. Type safety issues
5. Whether it actually fulfills the requirement
Respond with JSON:
{
  "passed": boolean,
  "issues": ["list of issues found"],
  "suggested_fixes": ["specific fix suggestions"],
  "confidence": 0-1
}`,
      },
    ],
  });

  const text = response.content[0].type === 'text' ? response.content[0].text : '{}';
  const jsonMatch = text.match(/\{[\s\S]*\}/);
  if (!jsonMatch) {
    return { passed: false, issues: ['Verification failed to parse'], suggested_fixes: [], confidence: 0 };
  }
  return JSON.parse(jsonMatch[0]) as VerificationResult;
}

// Generate → Verify → Regenerate loop
async function generateVerifiedCode(
  requirement: string,
  maxAttempts = 3
): Promise<{ code: string; verified: boolean }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    console.log(`Attempt ${attempt}/${maxAttempts}`);

    const generated = await generateCode(requirement);
    const verification = await verifyCode(generated, requirement);

    if (verification.passed && verification.confidence > 0.8) {
      console.log(`✓ Verification passed (confidence: ${verification.confidence})`);
      return { code: generated.code, verified: true };
    }

    console.log(`✗ Issues found: ${verification.issues.join(', ')}`);

    if (attempt < maxAttempts) {
      // Feed verification feedback back into next generation
      requirement = `${requirement}
Previous attempt had these issues:
${verification.issues.join('\n')}
Fix suggestions:
${verification.suggested_fixes.join('\n')}`;
    }
  }

  return { code: 'Generation failed after max attempts', verified: false };
}
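Note that the loop appends feedback to `requirement` in place, so by attempt three the prompt still carries attempt one's stale issues. An alternative is to keep the original requirement fixed and rebuild the prompt from only the latest verifier feedback. A sketch of that variant (the name `buildRetryPrompt` is mine):

```typescript
interface Feedback {
  issues: string[];
  suggested_fixes: string[];
}

// Rebuild the retry prompt from the original requirement plus only the
// most recent verifier feedback, instead of accumulating every round.
function buildRetryPrompt(original: string, feedback: Feedback): string {
  return [
    original,
    'Previous attempt had these issues:',
    ...feedback.issues.map(i => `- ${i}`),
    'Fix suggestions:',
    ...feedback.suggested_fixes.map(f => `- ${f}`),
  ].join('\n');
}
```

In the loop, the call becomes `generateCode(buildRetryPrompt(originalRequirement, verification))`, which keeps prompt size bounded across attempts.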

Pattern 4: Agent-to-Agent Communication via Shared Context

For complex workflows where agents need to build on each other’s work sequentially:

interface AgentContext {
  task: string;
  history: Array<{ agent: string; output: string; timestamp: Date }>;
  state: Record<string, unknown>;
}

class SequentialAgentPipeline {
  private context: AgentContext;
  private agents: Array<{ name: string; role: string; prompt: string }>;

  constructor(task: string, agents: Array<{ name: string; role: string; prompt: string }>) {
    this.context = { task, history: [], state: {} };
    this.agents = agents;
  }

  async run(): Promise<AgentContext> {
    for (const agent of this.agents) {
      console.log(`Running agent: ${agent.name}`);

      const historyText = this.context.history.map(h =>
        `[${h.agent}]: ${h.output}`
      ).join('\n\n');

      const response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 2048,
        system: agent.prompt,
        messages: [
          {
            role: 'user',
            content: `Original task: ${this.context.task}
Previous agent outputs:
${historyText || '(none — you are the first agent)'}
Current state: ${JSON.stringify(this.context.state, null, 2)}
Your role: ${agent.role}
Please complete your part of the task, building on previous outputs.`,
          },
        ],
      });

      const output = response.content[0].type === 'text'
        ? response.content[0].text
        : '';

      this.context.history.push({
        agent: agent.name,
        output,
        timestamp: new Date(),
      });
    }

    return this.context;
  }
}

// Example: Financial report generation pipeline
const reportPipeline = new SequentialAgentPipeline(
  'Generate Q3 2025 financial performance summary for stakeholders',
  [
    {
      name: 'DataAnalyst',
      role: 'Analyze raw financial data and extract key metrics',
      prompt: 'You are a financial data analyst. Extract and compute key metrics from financial data.',
    },
    {
      name: 'Interpreter',
      role: 'Interpret the metrics and identify trends and insights',
      prompt: 'You are a financial interpreter. Turn raw metrics into meaningful business insights.',
    },
    {
      name: 'Writer',
      role: 'Write a clear, professional summary for executive stakeholders',
      prompt: 'You are a financial writer. Create clear, concise executive summaries.',
    },
    {
      name: 'Reviewer',
      role: 'Review the summary for accuracy, clarity, and completeness',
      prompt: 'You are a senior editor. Ensure financial reports are accurate and appropriately scoped.',
    },
  ]
);

Error Recovery and Resilience

Multi-agent workflows fail more often than single calls — more surface area for errors:

class ResilientAgent {
  private retries: number;
  private fallbackModel: string;

  constructor(
    private primaryModel: string = 'claude-opus-4-6',
    fallbackModel = 'claude-sonnet-4-6',
    retries = 3
  ) {
    this.fallbackModel = fallbackModel;
    this.retries = retries;
  }

  async call(
    // Non-streaming params so the return type is a single Message
    params: Omit<Anthropic.MessageCreateParamsNonStreaming, 'model'>
  ): Promise<Anthropic.Message> {
    let lastError: Error | undefined;
    let model = this.primaryModel;

    for (let attempt = 1; attempt <= this.retries; attempt++) {
      try {
        return await client.messages.create({ ...params, model });
      } catch (err) {
        lastError = err as Error;
        if (err instanceof Anthropic.RateLimitError) {
          const waitMs = Math.min(1000 * Math.pow(2, attempt), 60_000);
          console.warn(`Rate limited. Waiting ${waitMs}ms...`);
          await sleep(waitMs);
        } else if (err instanceof Anthropic.APIError && err.status === 529) {
          // Overloaded — try fallback model
          if (model === this.primaryModel) {
            console.warn(`Primary model overloaded, trying fallback: ${this.fallbackModel}`);
            model = this.fallbackModel;
          } else {
            await sleep(30_000);
          }
        } else {
          throw err; // Non-retryable
        }
      }
    }
    throw lastError ?? new Error('Retries exhausted');
  }
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));
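The backoff arithmetic inside `call` is easy to get subtly wrong, so it helps to factor it into a pure, testable function. This sketch mirrors the schedule above (exponential with a 60s cap); the jitter variant is my addition, useful when many fan-out workers hit the same rate limit at once:

```typescript
// Pure backoff schedule matching ResilientAgent: 2s, 4s, 8s, ... capped at 60s.
// `attempt` is 1-based, as in the retry loop.
function backoffMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}

// Full-jitter variant: a random delay in [0, backoff) de-synchronizes
// parallel workers so they don't all retry at the same instant.
function backoffWithJitter(attempt: number): number {
  return Math.random() * backoffMs(attempt);
}
```

Jitter matters most in Pattern 1: ten parallel workers that get rate-limited together and sleep for identical intervals will hammer the API again in lockstep.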

Cost Estimation for Multi-Agent Workflows

Before deploying, estimate costs at scale:

// Rough cost estimator — check current pricing at anthropic.com/pricing; these change frequently
const PRICING = {
  'claude-opus-4-6': { input: 15.0, output: 75.0 }, // per million tokens
  'claude-sonnet-4-6': { input: 3.0, output: 15.0 },
  'claude-haiku-4-5': { input: 1.0, output: 5.0 },
};

function estimateWorkflowCost(
  steps: Array<{
    model: keyof typeof PRICING;
    estimatedInputTokens: number;
    estimatedOutputTokens: number;
    parallelism?: number;
  }>
): { costPerRun: number; costPer1000Runs: number } {
  const costPerRun = steps.reduce((total, step) => {
    const pricing = PRICING[step.model];
    const parallelism = step.parallelism ?? 1;
    const stepCost = (
      (step.estimatedInputTokens / 1_000_000) * pricing.input +
      (step.estimatedOutputTokens / 1_000_000) * pricing.output
    ) * parallelism;
    return total + stepCost;
  }, 0);

  return {
    costPerRun,
    costPer1000Runs: costPerRun * 1000,
  };
}

// Example: fraud analysis workflow
const estimate = estimateWorkflowCost([
  // Orchestrator (runs once)
  { model: 'claude-opus-4-6', estimatedInputTokens: 1000, estimatedOutputTokens: 500 },
  // Workers (run 10 in parallel)
  { model: 'claude-haiku-4-5', estimatedInputTokens: 500, estimatedOutputTokens: 200, parallelism: 10 },
  // Synthesizer (runs once)
  { model: 'claude-opus-4-6', estimatedInputTokens: 5000, estimatedOutputTokens: 1000 },
]);

console.log(`Cost per run: $${estimate.costPerRun.toFixed(4)}`);
console.log(`Cost per 1000 runs: $${estimate.costPer1000Runs.toFixed(2)}`);

Multi-agent workflows are not magic — they’re engineering. The patterns above provide the building blocks. The real work is matching the architecture to the problem: right-sizing models, identifying genuine parallelism, and building recovery logic for the failure modes that emerge at scale.