Single LLM calls solve simple problems. Multi-agent workflows solve complex ones — by decomposing tasks, running agents in parallel, specializing each agent for a subtask, and having agents check each other’s work.
In 2025, multi-agent workflows became practical for production use. The tooling matured (Claude’s tool use, MCP, extended context), the failure modes are now well-understood, and the cost curve makes multi-step workflows economically viable for a wider range of applications.
This post covers the architecture patterns I’ve found most useful, with complete implementations.
When Multi-Agent Beats Single Agent
Use a single agent when:
- The task fits in one context window
- The task is sequential with no parallelism opportunities
- Latency is critical (each additional agent adds ~2-30s)
Use a multi-agent workflow when:
- Subtasks can execute in parallel (often roughly 10x throughput from fan-out)
- Different subtasks need different specializations or context
- You want independent verification (agent A produces, agent B reviews)
- The total work exceeds a single context window
- You need to process many independent items (fan-out pattern)
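The throughput figure above is simple arithmetic, assuming uniform per-call latency: N items at t seconds each take N·t sequentially, but roughly ceil(N/c)·t with c calls in flight. A back-of-envelope sketch (illustrative numbers, not benchmarks):

```typescript
// Back-of-envelope latency model for fan-out, assuming every call
// takes about the same time t (real latencies vary widely).
function sequentialSeconds(items: number, secondsPerCall: number): number {
  return items * secondsPerCall;
}

function parallelSeconds(
  items: number,
  secondsPerCall: number,
  concurrency: number
): number {
  // Each "wave" of `concurrency` calls completes together
  return Math.ceil(items / concurrency) * secondsPerCall;
}

// 100 items at ~3s each: 300s sequentially vs ~30s with 10 in flight
const speedup =
  sequentialSeconds(100, 3) / parallelSeconds(100, 3, 10); // → 10
```

In practice the speedup saturates at your rate limit, not at the item count, which is why the patterns below cap concurrency explicitly.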
Pattern 1: Parallel Fan-Out
Process N independent items simultaneously:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface TransactionSummary {
  transaction_id: string;
  risk_assessment: string;
  risk_level: 'low' | 'medium' | 'high';
  flags: string[];
}

async function analyzeTransaction(
  transaction: Record<string, unknown>
): Promise<TransactionSummary> {
  const response = await client.messages.create({
    model: 'claude-haiku-4-5', // Fast + cheap for individual items
    max_tokens: 512,
    system: `You are a financial fraud analyst. Analyze transactions for risk. Always respond with valid JSON only.`,
    messages: [
      {
        role: 'user',
        content: `Analyze this transaction for fraud risk:
${JSON.stringify(transaction, null, 2)}

Respond with JSON: { "risk_assessment": "...", "risk_level": "low|medium|high", "flags": [] }`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '{}';
  const parsed = JSON.parse(text);

  return {
    transaction_id: transaction.id as string,
    ...parsed,
  };
}

// Fan-out: process all transactions in parallel
async function analyzeBatch(
  transactions: Record<string, unknown>[],
  concurrency = 10 // Respect rate limits
): Promise<TransactionSummary[]> {
  // Process in chunks to avoid rate limits
  const results: TransactionSummary[] = [];

  for (let i = 0; i < transactions.length; i += concurrency) {
    const chunk = transactions.slice(i, i + concurrency);
    const chunkResults = await Promise.all(
      chunk.map(t => analyzeTransaction(t))
    );
    results.push(...chunkResults);
  }

  return results;
}
```

Key decision: use `claude-haiku-4-5` for individual items (~15x cheaper and significantly faster than Opus) and `claude-opus-4-6` for the final synthesis. Right-size each agent for its task.
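One refinement worth knowing: the chunked loop waits for each chunk's slowest call before starting the next, so a single slow item stalls its whole chunk. A sliding-window helper keeps up to `limit` calls in flight instead. This is a sketch with no SDK dependency; `mapWithConcurrency` is a name introduced here, not part of the Anthropic SDK:

```typescript
// Sliding-window parallel map: starts a new call as soon as any
// in-flight call finishes, preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next index and processes it.
  // Claiming is safe: JS is single-threaded between awaits.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker())
  );
  return results;
}
```

With this helper, the body of `analyzeBatch` reduces to `mapWithConcurrency(transactions, concurrency, analyzeTransaction)`.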
Pattern 2: Hierarchical Orchestration
An orchestrator agent decomposes the task, spawns worker agents, collects and synthesizes results:
```typescript
interface SubTask {
  id: string;
  description: string;
  context: string;
}

interface SubTaskResult {
  task_id: string;
  result: string;
  confidence: number;
}

// Worker agent: executes a specific subtask
async function executeSubTask(task: SubTask): Promise<SubTaskResult> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 2048,
    messages: [
      {
        role: 'user',
        content: `Task: ${task.description}

Context:
${task.context}

Complete this task thoroughly. At the end, rate your confidence (0-1) in your answer.`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '';

  // Extract confidence score from response
  const confidenceMatch = text.match(/confidence[:\s]+([0-9.]+)/i);
  const confidence = confidenceMatch ? parseFloat(confidenceMatch[1]) : 0.8;

  return {
    task_id: task.id,
    result: text,
    confidence,
  };
}

// Orchestrator: decomposes task and synthesizes results
async function runOrchestrated(
  mainTask: string,
  context: string
): Promise<string> {
  // Step 1: Orchestrator decomposes the task
  const decompositionResponse = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 1024,
    messages: [
      {
        role: 'user',
        content: `You are a task orchestrator. Decompose this complex task into 3-5 independent subtasks that can be executed in parallel.

Main task: ${mainTask}

Context: ${context}

Respond with JSON: { "subtasks": [{ "id": "1", "description": "...", "context": "..." }] }`,
      },
    ],
  });

  const decompositionText =
    decompositionResponse.content[0].type === 'text'
      ? decompositionResponse.content[0].text
      : '{"subtasks": []}';

  const { subtasks } = JSON.parse(decompositionText) as { subtasks: SubTask[] };

  // Step 2: Execute all subtasks in parallel
  console.log(`Executing ${subtasks.length} subtasks in parallel...`);
  const subResults = await Promise.all(
    subtasks.map(task => executeSubTask(task))
  );

  // Step 3: Orchestrator synthesizes results
  const synthesisResponse = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 4096,
    messages: [
      {
        role: 'user',
        content: `You are a synthesizer. Combine these parallel research results into a coherent final answer.

Original task: ${mainTask}

Subtask results:
${subResults.map(r => `Task ${r.task_id} (confidence: ${r.confidence}):
${r.result}`).join('\n---\n')}

Synthesize a comprehensive final answer, weighing results by confidence.`,
      },
    ],
  });

  return synthesisResponse.content[0].type === 'text'
    ? synthesisResponse.content[0].text
    : 'Synthesis failed';
}
```

Pattern 3: Checker / Verifier Agent
Agent A produces. Agent B independently verifies. Especially useful for code generation and financial calculations where errors are costly:
````typescript
interface GeneratedCode {
  code: string;
  language: string;
  description: string;
}

interface VerificationResult {
  passed: boolean;
  issues: string[];
  suggested_fixes: string[];
  confidence: number;
}

// Generator agent: produces code
async function generateCode(requirement: string): Promise<GeneratedCode> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 2048,
    system: 'You are an expert software engineer. Generate clean, production-ready code.',
    messages: [
      {
        role: 'user',
        content: `Generate TypeScript code for: ${requirement}

Include error handling, type annotations, and brief inline comments.`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '';

  // Extract code block
  const codeMatch = text.match(/```(?:typescript)?\n([\s\S]*?)```/);

  return {
    code: codeMatch?.[1] ?? text,
    language: 'typescript',
    description: requirement,
  };
}

// Verifier agent: checks the generated code
async function verifyCode(
  generated: GeneratedCode,
  requirement: string
): Promise<VerificationResult> {
  const response = await client.messages.create({
    model: 'claude-opus-4-6', // Use the best model for critical verification
    max_tokens: 2048,
    system: `You are a strict code reviewer. Your job is to find bugs, security issues, and logic errors. Be thorough. Do not approve code unless you are certain it's correct.`,
    messages: [
      {
        role: 'user',
        content: `Review this ${generated.language} code for the requirement: "${requirement}"

Code:
\`\`\`${generated.language}
${generated.code}
\`\`\`

Check for:
1. Logic errors or off-by-one errors
2. Security vulnerabilities
3. Missing error handling
4. Type safety issues
5. Whether it actually fulfills the requirement

Respond with JSON:
{
  "passed": boolean,
  "issues": ["list of issues found"],
  "suggested_fixes": ["specific fix suggestions"],
  "confidence": 0-1
}`,
      },
    ],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '{}';
  const jsonMatch = text.match(/\{[\s\S]*\}/);

  if (!jsonMatch) {
    return {
      passed: false,
      issues: ['Verification failed to parse'],
      suggested_fixes: [],
      confidence: 0,
    };
  }

  return JSON.parse(jsonMatch[0]) as VerificationResult;
}

// Generate → Verify → Regenerate loop
async function generateVerifiedCode(
  requirement: string,
  maxAttempts = 3
): Promise<{ code: string; verified: boolean }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    console.log(`Attempt ${attempt}/${maxAttempts}`);

    const generated = await generateCode(requirement);
    const verification = await verifyCode(generated, requirement);

    if (verification.passed && verification.confidence > 0.8) {
      console.log(`✓ Verification passed (confidence: ${verification.confidence})`);
      return { code: generated.code, verified: true };
    }

    console.log(`✗ Issues found: ${verification.issues.join(', ')}`);

    if (attempt < maxAttempts) {
      // Feed verification feedback back into next generation
      requirement = `${requirement}

Previous attempt had these issues:
${verification.issues.join('\n')}

Fix suggestions:
${verification.suggested_fixes.join('\n')}`;
    }
  }

  return { code: 'Generation failed after max attempts', verified: false };
}
````

Pattern 4: Agent-to-Agent Communication via Shared Context
For complex workflows where agents need to build on each other’s work sequentially:
```typescript
interface AgentContext {
  task: string;
  history: Array<{ agent: string; output: string; timestamp: Date }>;
  state: Record<string, unknown>;
}

class SequentialAgentPipeline {
  private context: AgentContext;
  private agents: Array<{ name: string; role: string; prompt: string }>;

  constructor(
    task: string,
    agents: Array<{ name: string; role: string; prompt: string }>
  ) {
    this.context = { task, history: [], state: {} };
    this.agents = agents;
  }

  async run(): Promise<AgentContext> {
    for (const agent of this.agents) {
      console.log(`Running agent: ${agent.name}`);

      const historyText = this.context.history
        .map(h => `[${h.agent}]: ${h.output}`)
        .join('\n\n');

      const response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 2048,
        system: agent.prompt,
        messages: [
          {
            role: 'user',
            content: `Original task: ${this.context.task}

Previous agent outputs:
${historyText || '(none — you are the first agent)'}

Current state: ${JSON.stringify(this.context.state, null, 2)}

Your role: ${agent.role}
Please complete your part of the task, building on previous outputs.`,
          },
        ],
      });

      const output =
        response.content[0].type === 'text' ? response.content[0].text : '';

      this.context.history.push({
        agent: agent.name,
        output,
        timestamp: new Date(),
      });
    }

    return this.context;
  }
}
```
```typescript
// Example: Financial report generation pipeline
const reportPipeline = new SequentialAgentPipeline(
  'Generate Q3 2025 financial performance summary for stakeholders',
  [
    {
      name: 'DataAnalyst',
      role: 'Analyze raw financial data and extract key metrics',
      prompt: 'You are a financial data analyst. Extract and compute key metrics from financial data.',
    },
    {
      name: 'Interpreter',
      role: 'Interpret the metrics and identify trends and insights',
      prompt: 'You are a financial interpreter. Turn raw metrics into meaningful business insights.',
    },
    {
      name: 'Writer',
      role: 'Write a clear, professional summary for executive stakeholders',
      prompt: 'You are a financial writer. Create clear, concise executive summaries.',
    },
    {
      name: 'Reviewer',
      role: 'Review the summary for accuracy, clarity, and completeness',
      prompt: 'You are a senior editor. Ensure financial reports are accurate and appropriately scoped.',
    },
  ]
);

// Run the pipeline; the last history entry is the reviewed summary
const report = await reportPipeline.run();
console.log(report.history.at(-1)?.output);
```

Error Recovery and Resilience
Multi-agent workflows fail more often than single calls — more surface area for errors:
```typescript
class ResilientAgent {
  private retries: number;
  private fallbackModel: string;

  constructor(
    private primaryModel: string = 'claude-opus-4-6',
    fallbackModel = 'claude-sonnet-4-6',
    retries = 3
  ) {
    this.fallbackModel = fallbackModel;
    this.retries = retries;
  }

  async call(
    // Non-streaming params, so the return type is a plain Message
    params: Omit<Anthropic.MessageCreateParamsNonStreaming, 'model'>
  ): Promise<Anthropic.Message> {
    let lastError: Error | undefined;
    let model = this.primaryModel;

    for (let attempt = 1; attempt <= this.retries; attempt++) {
      try {
        return await client.messages.create({ ...params, model });
      } catch (err) {
        lastError = err as Error;

        if (err instanceof Anthropic.RateLimitError) {
          const waitMs = Math.min(1000 * Math.pow(2, attempt), 60_000);
          console.warn(`Rate limited. Waiting ${waitMs}ms...`);
          await sleep(waitMs);
        } else if (err instanceof Anthropic.APIError && err.status === 529) {
          // Overloaded — try fallback model
          if (model === this.primaryModel) {
            console.warn(`Primary model overloaded, trying fallback: ${this.fallbackModel}`);
            model = this.fallbackModel;
          } else {
            await sleep(30_000);
          }
        } else {
          throw err; // Non-retryable
        }
      }
    }

    throw lastError;
  }
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));
```

Cost Estimation for Multi-Agent Workflows
Before deploying, estimate costs at scale:
```typescript
// Rough cost estimator. Check current pricing at anthropic.com/pricing;
// these numbers change frequently.
const PRICING = {
  'claude-opus-4-6': { input: 15.0, output: 75.0 }, // per million tokens
  'claude-sonnet-4-6': { input: 3.0, output: 15.0 },
  'claude-haiku-4-5': { input: 1.0, output: 5.0 },
};

function estimateWorkflowCost(
  steps: Array<{
    model: keyof typeof PRICING;
    estimatedInputTokens: number;
    estimatedOutputTokens: number;
    parallelism?: number;
  }>
): { costPerRun: number; costPer1000Runs: number } {
  const costPerRun = steps.reduce((total, step) => {
    const pricing = PRICING[step.model];
    const parallelism = step.parallelism ?? 1;

    const stepCost =
      ((step.estimatedInputTokens / 1_000_000) * pricing.input +
        (step.estimatedOutputTokens / 1_000_000) * pricing.output) *
      parallelism;

    return total + stepCost;
  }, 0);

  return {
    costPerRun,
    costPer1000Runs: costPerRun * 1000,
  };
}

// Example: fraud analysis workflow
const estimate = estimateWorkflowCost([
  // Orchestrator (runs once)
  { model: 'claude-opus-4-6', estimatedInputTokens: 1000, estimatedOutputTokens: 500 },
  // Workers (run 10 in parallel)
  { model: 'claude-haiku-4-5', estimatedInputTokens: 500, estimatedOutputTokens: 200, parallelism: 10 },
  // Synthesizer (runs once)
  { model: 'claude-opus-4-6', estimatedInputTokens: 5000, estimatedOutputTokens: 1000 },
]);

console.log(`Cost per run: $${estimate.costPerRun.toFixed(4)}`);
console.log(`Cost per 1000 runs: $${estimate.costPer1000Runs.toFixed(2)}`);
```

Multi-agent workflows are not magic — they’re engineering. The patterns above provide the building blocks. The real work is matching the architecture to the problem: right-sizing models, identifying genuine parallelism, and building recovery logic for the failure modes that emerge at scale.
Related posts
- LLM API Integration Patterns — Structured Outputs, Function Calling, Streaming — the foundational single-call patterns that multi-agent orchestration builds on
- Building MCP Servers — The New API Layer for AI Agents — how MCP servers expose tools that agents in these workflows can discover and call