Node.js diagnostic tooling has matured significantly over the past few years. What used to require manual V8 flags and external scripts is now accessible through first-class APIs and polished tools. This post is the reference guide I wish I’d had — covering every tool in the Node.js diagnostics toolkit and, critically, when each one is the right choice.
Table of contents
Open Table of contents
- The Diagnostic Hierarchy
- Level 1: Process Metrics
- Level 2: Clinic.js Doctor
- Level 3: Clinic.js Flame — CPU Profiling
- Level 3: Clinic.js Heap Profiler — Memory Leak Analysis
- Level 4: V8 CPU Profile
- Level 4: Heap Snapshots in Detail
- OpenTelemetry: Distributed Profiling
- The 2026 Toolchain in Practice
- Related posts
The Diagnostic Hierarchy
Start simple, go deeper only when needed:
Level 1: Process metrics (CPU%, memory, event loop lag) → Is there a problem? Where roughly?
Level 2: Clinic.js Doctor → What category of problem? CPU, memory, I/O, event loop?
Level 3: Clinic.js Flame (CPU) or Heap Profiler (memory) → Which code is responsible?
Level 4: V8 CPU profile / Heap snapshot → Exact function-level attribution, allocation sites
Level 5: OpenTelemetry traces → Distributed attribution across microservices

Don’t skip straight to Level 4 — each earlier level is faster to run and easier to interpret.
Level 1: Process Metrics
Quick health check without any external tooling:
```js
// Add to any Express app
app.get('/health', (req, res) => {
  const { heapUsed, heapTotal, rss, external } = process.memoryUsage();
  const uptime = process.uptime();

  res.json({
    status: 'ok',
    uptime_seconds: uptime,
    memory: {
      heap_used_mb: Math.round(heapUsed / 1e6),
      heap_total_mb: Math.round(heapTotal / 1e6),
      rss_mb: Math.round(rss / 1e6),
      external_mb: Math.round(external / 1e6),
      heap_utilization: `${Math.round(heapUsed / heapTotal * 100)}%`,
    },
    node_version: process.version,
    pid: process.pid,
  });
});
```

```js
// Event loop lag measurement
import { monitorEventLoopDelay } from 'perf_hooks';

const loopMonitor = monitorEventLoopDelay({ resolution: 10 });
loopMonitor.enable();

app.get('/health/eventloop', (req, res) => {
  res.json({
    p50_ms: loopMonitor.percentile(50) / 1e6, // nanoseconds → ms
    p95_ms: loopMonitor.percentile(95) / 1e6,
    p99_ms: loopMonitor.percentile(99) / 1e6,
    max_ms: loopMonitor.max / 1e6,
    mean_ms: loopMonitor.mean / 1e6,
  });
});
```

Alert thresholds:
- Event loop P99 > 10ms: investigate
- Event loop P99 > 100ms: urgent
- `heapUsed / heapTotal` > 85%: heap pressure, GC struggling
- `external` memory > 1GB: likely Buffer/ArrayBuffer leak
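Those thresholds can also be wired into a lightweight in-process check instead of being eyeballed. A minimal sketch, assuming the same 10ms/100ms cutoffs and an arbitrary 30-second polling window:

```javascript
// Sketch: classify event loop health against the thresholds above.
import { monitorEventLoopDelay } from 'perf_hooks';

const monitor = monitorEventLoopDelay({ resolution: 10 });
monitor.enable();

export function lagStatus() {
  const p99Ms = monitor.percentile(99) / 1e6; // histogram is in nanoseconds
  if (p99Ms > 100) return { level: 'urgent', p99Ms };
  if (p99Ms > 10) return { level: 'investigate', p99Ms };
  return { level: 'ok', p99Ms };
}

// Poll every 30s; reset so each window is measured independently.
setInterval(() => {
  const { level, p99Ms } = lagStatus();
  if (level !== 'ok') console.warn(`event loop p99 ${p99Ms.toFixed(1)}ms (${level})`);
  monitor.reset();
}, 30_000).unref();
```

The `unref()` keeps the interval from holding the process open, so the check is safe to drop into any service.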
Level 2: Clinic.js Doctor
```bash
npm install -g clinic
clinic doctor -- node server.js
```

Doctor instruments your process and generates a report covering four signals: event loop delay, CPU utilization, memory, and active handles.
Reading the Doctor Report
Event loop delay graph: Healthy apps show near-zero delay (< 1ms). Spikes indicate synchronous blocking code executing during high-load periods.
CPU utilization graph: Near 100% CPU is usually fine if it correlates with request rate. Unexplained 100% CPU during idle periods indicates a runaway timer or misconfigured worker.
Memory graph: Steady growth that never decreases indicates a leak. Sawtooth pattern (grow → GC release → grow) is normal.
Active handles: Should correlate with concurrent connections. Handles that grow without bound indicate unclosed resources.
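If you want a rough handle census without Doctor, Node exposes an experimental API for this: `process.getActiveResourcesInfo()` (Node 17.3+). A sketch; note the API is experimental and the resource-type strings it returns (e.g. `'Timeout'`, `'TCPSocketWrap'`) vary by Node version:

```javascript
// Sketch: count active handles/resources by type using the experimental
// process.getActiveResourcesInfo() (Node 17.3+).
export function handleCensus() {
  const counts = {};
  for (const name of process.getActiveResourcesInfo()) {
    counts[name] = (counts[name] ?? 0) + 1;
  }
  return counts;
}
```

Snapshot this periodically from a debug endpoint; a resource type whose count only ever rises is your unclosed resource.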
Doctor’s AI analysis adds diagnostic suggestions. In 2026, these suggestions are accurate for the common patterns (event loop blocking, I/O saturation, heap pressure).
Level 3: Clinic.js Flame — CPU Profiling
```bash
clinic flame -- node server.js
# Apply load with autocannon
npx autocannon -c 100 -d 30 http://localhost:3000/api/heavy
# Ctrl+C the server → report opens automatically
```

Reading Flame Graphs
A flame graph represents sampled call stacks. The x-axis is alphabetically sorted stack frames (not time — that’s a flame chart), and the width of each frame represents its proportion of total samples. The y-axis is call depth.
What to look for:
- Wide blocks at the top = hot self-time (function taking lots of time itself)
- Wide plateau in the middle = common ancestor, called often
- Narrow tall spikes = deep call stacks, usually libraries

Common patterns:
```
Your function
└─ JSON.stringify   ← wide: serializing large objects
   └─ (native)
```

```
Your route handler
└─ db.query         ← wide: database calls
   └─ pg internal
```

Filtering: Clinic Flame lets you filter by package name. Filter out node_modules to see only your code. Filter to a specific package to understand library overhead.
Key insight: If you see node_modules dominating the flame graph, it’s not necessarily “their fault” — you might be calling their code in a tight loop unnecessarily.
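As a concrete instance of that tight-loop case, here is a hypothetical before/after. The flame graph would show `JSON.stringify` as a wide block in both versions, but only the first deserves the blame; all function names here are illustrative:

```javascript
// Before: the invariant `meta` object is re-serialized for every record,
// so JSON.stringify dominates the flame graph.
export function buildLinesSlow(records, meta) {
  return records.map((r) => `${JSON.stringify(meta)}|${JSON.stringify(r)}`);
}

// After: serialize the invariant part once, outside the loop.
export function buildLinesFast(records, meta) {
  const metaJson = JSON.stringify(meta);
  return records.map((r) => `${metaJson}|${JSON.stringify(r)}`);
}
```

Both produce identical output; the fix is in how often the library code is entered, not in the library itself.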
Level 3: Clinic.js Heap Profiler — Memory Leak Analysis
```bash
clinic heapprofiler -- node server.js
# Apply sustained load for 2-5 minutes
# Ctrl+C → report opens
```

The Heap Profiler shows allocations over time, colored by whether they’re:
- Green: Objects that were allocated and garbage collected (normal)
- Yellow: Objects still alive when profiling ended (suspicious)
The Three-Snapshot Technique
For leaks, heap snapshots are more useful than the profiler:
```js
// Enable heap snapshot on demand
import { writeHeapSnapshot } from 'v8';

app.get('/debug/snapshot', (req, res) => {
  if (process.env.NODE_ENV !== 'development') {
    return res.status(403).json({ error: 'Only in development' });
  }
  const filename = writeHeapSnapshot();
  res.json({ filename });
});
```

- Take snapshot after startup (baseline)
- Apply sustained load for 10 minutes
- Take second snapshot
- Apply more load for 10 minutes
- Take third snapshot
In Chrome DevTools Memory tab: load all three snapshots, use “Comparison” view to see what grew between S2 and S3. Objects that grew are your leak.
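To practice the technique, it helps to know the shape you are hunting. A hypothetical leak of the classic kind: a module-level cache with unique keys and no eviction, which the comparison view would show as Map entries growing linearly between snapshots:

```javascript
// Sketch of a classic leak: every request adds a unique key, nothing evicts.
const cache = new Map();
let seq = 0;

export function handleRequest(id, payload) {
  cache.set(`${id}:${seq++}`, payload); // unique key → never overwritten
  return cache.size; // grows monotonically: the leak signature
}
```

Between the second and third snapshots, the comparison view would attribute the growth to Map entries plus the retained payload objects, and the Retainers panel would point back at `cache`.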
Level 4: V8 CPU Profile
For precise function-level attribution:
```bash
# Run with V8 profiling
node --prof server.js

# Apply load, then stop

# Process the isolate-*.log file
node --prof-process isolate-*.log > processed.txt
```

The output shows bottom-up profiling data:
```
[Bottom up (heavy) profile]:
 Note: percentage shows a share of a particular caller in the
 total amount of its parent calls.
 Callers occupying less than 1.0% are not shown.

   ticks  parent  name
   5321   42.1%   /usr/lib/node_modules/.../v8/src/heap/...
   2105   16.6%   node:internal/crypto/hash
   1802   14.2%   /app/src/services/transaction.js:45:processAmount
```

This shows processAmount at line 45 consuming 14.2% of CPU ticks — a precise target for optimization.
Alternatively, run with node --cpu-prof to produce a .cpuprofile file, which loads directly into Chrome DevTools (Performance tab → Load Profile).
Level 4: Heap Snapshots in Detail
The .heapsnapshot file is JSON with the complete heap graph. Chrome DevTools provides the best UI for exploring it.
Key views in DevTools Memory tab:
Summary: Objects grouped by constructor. Look for unexpected growth in:
- `Array` — often storing data that should be evicted
- `Object` — catch-all for plain objects
- `Closure` — functions capturing variables
- `(compiled code)` — JIT-compiled functions (usually fine)
- Your own class names — direct evidence of leaking instances
Comparison: Difference between two snapshots. Objects with +count that didn’t +size proportionally are worth investigating (retained references accumulating).
Containment: Tree view of object graph. Follow retainer chains to find what’s keeping your leaking objects alive.
Retainers panel: When you select an object, shows what’s holding a reference to it. Follow the chain up to the GC root to find the leak location.
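A common fix once the retainer chain points at a lookaside cache is to switch it to a WeakMap, so the metadata mapping no longer keeps its keys alive. A minimal sketch:

```javascript
// Sketch: per-object metadata that does not retain the object.
// With a plain Map, `meta` would appear in the retainer chain of every
// tagged object; WeakMap entries are collected together with their keys.
const meta = new WeakMap();

export function tag(obj, info) {
  meta.set(obj, info);
  return obj;
}

export function infoFor(obj) {
  return meta.get(obj);
}
```

After this change, the cache disappears from the retainer chains in the snapshot, and the objects become collectable as soon as nothing else references them.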
OpenTelemetry: Distributed Profiling
For microservices, individual process profiling misses cross-service latency:
```js
// instrument.js — load before your application
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'transaction-api',
  traceExporter: new OTLPTraceExporter({
    url: 'http://jaeger:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy
    }),
  ],
});

sdk.start();
process.on('SIGTERM', () => sdk.shutdown());
```

```bash
# --import for ESM modules (Node 18.19+); use --require for CJS
node --import ./instrument.js server.js
```

Auto-instrumentation captures:
- HTTP request spans (incoming + outgoing)
- Database queries (pg, mysql, mongodb)
- Redis operations
- gRPC calls
In Jaeger or Grafana Tempo, trace a slow request end-to-end:
```
HTTP POST /transactions (250ms)
├─ Middleware (2ms)
├─ Validation (3ms)
├─ DB: SELECT users (45ms)
├─ HTTP GET exchange-rates-api (180ms)  ← THE PROBLEM
└─ DB: INSERT transaction (15ms)
```

The distributed trace immediately shows the external API call is the bottleneck — something that CPU profiling of a single service would miss.
The 2026 Toolchain in Practice
Modern Node.js diagnostics workflow:
```bash
# 1. Identify the symptom
curl http://api/health/eventloop  # Check event loop lag

# 2. Reproduce under load
npx autocannon -c 100 -d 60 http://api/endpoint

# 3. Doctor for category
clinic doctor -- node server.js

# 4. Flame for CPU issues
clinic flame -- node server.js

# 5. Heap profiler for memory issues
clinic heapprofiler -- node server.js

# 6. Manual heap snapshots for leaks
# (use the /debug/snapshot endpoint)

# 7. OpenTelemetry for distributed latency
# (already instrumented in production)
```

The toolchain is comprehensive and the CLI experience is smooth. The main skill is interpreting the output — knowing what pattern in a flame graph indicates JSON serialization vs database polling vs synchronous crypto. That interpretation skill comes from practice. Run these tools regularly, even on healthy services — you’ll learn what “normal” looks like, which makes anomalies obvious.
Related posts
- Profiling Node.js with Clinic.js and DoctorJS — A Real Case Study — a hands-on case study using Clinic.js Doctor and Flame to diagnose real throughput problems