The on-call alert fired at 3 AM. Node.js process using 2.1GB RSS, climbing fast. We killed and restarted it. A week later, same thing. The service had been running for six months without this problem — something changed. This is the story of finding and fixing it.
Symptoms and Initial Investigation
The service: a transaction processing API, ~200 req/s, Node.js 18, Express, PostgreSQL (pg pool), Redis (ioredis).
Symptoms:
- Memory grows ~50MB/hour under normal load
- Process crashes (OOM kill) after 5-7 days
- No memory growth during low-traffic periods
- Started approximately 3 weeks ago (correlates with a deploy)
Initial checks:
```shell
# Check current memory
curl http://localhost:3000/health | jq '.memory'
# {"heapUsed": 890MB, "heapTotal": 1100MB, "rss": 2100MB, "external": 850MB}

# heapUsed is extremely high — objects accumulating on the V8 heap
# The gap between heapUsed and rss includes V8 overhead and native allocations
```

The high heapUsed (890MB) pointed toward JavaScript objects accumulating — likely Maps, closures, or cached data holding references to request/response bodies.
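The actual `/health` handler isn't shown in this post, but the payload above can be produced with a few lines of `process.memoryUsage()` plumbing. This is a hypothetical sketch (the function name is ours):

```javascript
// Hypothetical sketch of the memory payload behind a /health endpoint
function memoryReportMB() {
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  const { heapUsed, heapTotal, rss, external } = process.memoryUsage();
  return {
    heapUsed: toMB(heapUsed),
    heapTotal: toMB(heapTotal),
    rss: toMB(rss),
    external: toMB(external),
  };
}
```

Logging these four numbers once a minute is often enough to spot a leak's slope days before an OOM kill.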
Step 1: Taking Heap Snapshots
```javascript
// Add to your Express app temporarily
const { writeHeapSnapshot } = require('v8');

app.get('/debug/heap-snapshot', (req, res) => {
  const filename = writeHeapSnapshot(); // writes to CWD
  res.json({ filename });
});
```

The three-snapshot technique:
- Snapshot 1: After startup (baseline)
- Snapshot 2: After some load (5-10 minutes)
- Snapshot 3: After more load (same interval)
Objects that grow from S1 → S2 → S3 are the leak.
```shell
# Take snapshots
curl http://localhost:3000/debug/heap-snapshot
# {"filename":"Heap-20240315T140523Z-98765.heapsnapshot"}

# Wait 10 minutes under load, take second
curl http://localhost:3000/debug/heap-snapshot

# Wait 10 minutes, take third
curl http://localhost:3000/debug/heap-snapshot

# Download them (scp, etc.)
```

Open in Chrome DevTools → Memory → Load snapshots.
What I found: “Array” category growing from 45MB → 89MB → 178MB between snapshots. Doubles each interval.
Drilling into the array objects: they were Buffer instances. Many small Buffers (128-512 bytes each), all reachable through the same retainer path: IncomingMessage._body → Buffer.
Step 2: The Leak Location
The three-snapshot difference view showed the growing objects all had the same retainer chain:
```
(root) → global → Map → Map entries → IncomingMessage → _body → Buffer
```

A Map on the global object was retaining IncomingMessage objects (HTTP request objects). IncomingMessage objects retain their body buffers. A new body buffer is retained with each request, so the total grows without bound.
This was a request cache gone wrong. In the deploy 3 weeks ago, someone had added a cache for deduplication:
```javascript
// The leaking code — added in the problematic deploy
const requestCache = new Map(); // GLOBAL — never cleaned

app.post('/transactions', async (req, res) => {
  const idempotencyKey = req.headers['x-idempotency-key'];

  if (idempotencyKey) {
    if (requestCache.has(idempotencyKey)) {
      const cached = requestCache.get(idempotencyKey);
      return res.status(200).json(cached); // Return cached response
    }
  }

  const result = await processTransaction(req.body);

  if (idempotencyKey) {
    requestCache.set(idempotencyKey, result); // NEVER DELETED
    // Also accidentally retained: the entire req object!
  }

  res.json(result);
});
```

Two problems:
- The cache never expires — it grows forever
- It was accidentally storing a reference to something that held the `req` object
Actually the second problem wasn’t obvious. The cache stored result — but what was result? It was the processed transaction object. What did that contain?
```javascript
async function processTransaction(body) {
  // ...
  return {
    id: uuid(),
    status: 'processed',
    originalRequest: body, // ← HERE. Stored the entire request body.
    timestamp: new Date(),
  };
}
```

`result` contained `originalRequest: body` — a reference to the request body, which held a reference to… well, it chains. The garbage collector can't collect `req.body` because `result.originalRequest` references it, and `result` is in the global `requestCache`, which is never cleaned.
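The retention chain is easy to reproduce in isolation. Here is a minimal, self-contained demonstration (illustrative, not the production code): a global Map holding results that reference "request bodies" keeps every body alive, and heapUsed only climbs:

```javascript
// Minimal reproduction: a never-cleaned global cache retains every body it sees
const cache = new Map();

function handleRequest(id, body) {
  const result = { id, status: 'processed', originalRequest: body }; // retains body
  cache.set(id, result); // never deleted, so body can never be GC'd
  return result;
}

const before = process.memoryUsage().heapUsed;
for (let i = 0; i < 1000; i++) {
  handleRequest(i, new Array(16_384).fill(i)); // a roughly 64-128KB "body"
}
const after = process.memoryUsage().heapUsed;
console.log(`retained roughly ${Math.round((after - before) / 1024 / 1024)}MB`);
```

Running a GC between the two measurements wouldn't change the result: every body is still reachable from the global `cache`, which is exactly what the three-snapshot diff showed.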
Step 3: The Fix
```javascript
// Fix 1: Use a proper TTL cache
import { LRUCache } from 'lru-cache'; // npm install lru-cache

const requestCache = new LRUCache({
  max: 10_000, // Max 10k entries
  ttl: 1000 * 60 * 30, // 30 minute TTL
  allowStale: false,
});

// Fix 2: Don't store references to the request
async function processTransaction(body) {
  return {
    id: uuid(),
    status: 'processed',
    // REMOVED: originalRequest: body
    amount: body.amount, // Store only what's needed
    currency: body.currency,
    timestamp: new Date(),
  };
}
```

Memory growth stopped immediately after deploying. Heap stabilized at ~200MB.
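If pulling in a dependency isn't an option, the same two properties (bounded size, expiring entries) can be sketched in plain JavaScript. This is a simplified stand-in for illustration, not what we deployed — real `lru-cache` also refreshes an entry's recency on reads, which this version skips:

```javascript
// Minimal bounded cache with lazy TTL expiry (illustrative sketch)
class TTLCache {
  constructor({ max = 10_000, ttl = 30 * 60 * 1000 } = {}) {
    this.max = max;
    this.ttl = ttl;
    this.map = new Map(); // key -> { value, storedAt }
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() - entry.storedAt > this.ttl) {
      this.map.delete(key); // expired: drop it so the value can be GC'd
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    if (this.map.size >= this.max && !this.map.has(key)) {
      // Evict the oldest insertion (Maps iterate in insertion order)
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, { value, storedAt: Date.now() });
  }
}
```

A production idempotency cache needs more care (e.g. two concurrent requests with the same key), but even this sketch would have prevented the unbounded growth.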
Common Node.js Memory Leak Patterns
1. EventEmitter Listener Accumulation
```javascript
// WRONG: adds a listener every time this function is called
function setupHandler() {
  process.on('exit', () => {
    cleanup();
  });
}
// If setupHandler() is called 1000 times: 1000 listeners on process.exit
// Node.js warns: "MaxListenersExceededWarning: Possible EventEmitter memory leak"

// CORRECT: use once(), or track and remove listeners
function setupHandler() {
  process.once('exit', cleanup); // Only registers once
}

// OR
const handler = () => cleanup();
emitter.on('event', handler);
// Later:
emitter.off('event', handler); // Always remove when done
```

2. Closures Retaining Large Variables
```javascript
// WRONG: outer is captured by the returned function
function createProcessor(largeConfig) {
  const outer = { config: largeConfig, cache: new Map() }; // large object
  return function process(data) {
    return outer.cache.get(data) || computeResult(data, outer.config);
  };
}
// If you store many processors, each retains the full outer object

// CORRECT: extract only what's needed
function createProcessor(largeConfig) {
  const relevantConfig = extractRelevant(largeConfig); // smaller
  const cache = new Map();
  return function process(data) {
    return cache.get(data) || computeResult(data, relevantConfig);
  };
}
```

3. Circular References (Less Common in Modern V8)
V8’s garbage collector handles most circular references. But circular references through native bindings or WeakMaps can sometimes cause issues:
```javascript
// This is fine — V8 GC handles it
const a = {};
const b = { ref: a };
a.ref = b;
// a and b will be collected when they go out of scope

// This can leak: circular ref through a global Map
const registry = new Map();
class Connection {
  constructor() {
    registry.set(this, { handlers: [] });
  }
  // Never calls registry.delete(this)!
}
// Every Connection lives forever in registry
```

4. SetInterval Retaining Closures
```javascript
// WRONG: The interval captures handler, which captures db connection
function startMonitoring(db) {
  const interval = setInterval(() => {
    db.query('SELECT 1'); // captures db in closure
  }, 5000);
  // interval is never cleared
  // db will never be garbage collected
}

// CORRECT: always return a cleanup function
function startMonitoring(db) {
  const interval = setInterval(() => {
    db.query('SELECT 1');
  }, 5000);

  return () => clearInterval(interval); // Return cleanup
}

const stopMonitoring = startMonitoring(db);
// Later:
stopMonitoring(); // Clear interval, allow db to be GC'd
```

5. Forgotten setTimeout Chains
```javascript
// This creates a chain that never ends:
function poll() {
  checkDatabase();
  setTimeout(poll, 1000); // schedules itself forever
}
poll();
// And it captures everything in the poll() scope
```

Diagnostic Tools
node --inspect: Enable Chrome DevTools for a running Node.js process. Memory profiler, heap snapshots, CPU profiles all available.
node --expose-gc: Exposes global.gc() in your code. Call it before taking snapshots to force a GC cycle, making the leak signal cleaner.
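Since `global.gc` only exists when the flag is set, guard calls to it so the same code runs with or without the flag (a small sketch; the function name is ours):

```javascript
// Force a GC cycle if Node was started with --expose-gc; otherwise no-op
function tryForceGC() {
  if (typeof global.gc === 'function') {
    global.gc(); // synchronously runs a full garbage collection
    return true;
  }
  return false; // flag not set: skip, don't crash
}
```

Calling this right before `writeHeapSnapshot()` sweeps already-dead objects out of the snapshot, so the diff between snapshots shows only real retention.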
process.memoryUsage(): Quick programmatic check:
```javascript
setInterval(() => {
  const { heapUsed, heapTotal, rss, external } = process.memoryUsage();
  console.log(JSON.stringify({ heapUsed, heapTotal, rss, external }));
}, 60_000);
```

clinic heapprofiler: From the Clinic.js toolchain. Profiles heap allocations over time, identifies allocation sites.
--max-old-space-size: Set heap limit explicitly. Faster crashes = faster feedback loop during debugging:
```shell
node --max-old-space-size=512 server.js # Crash at 512MB instead of OOM
```

The Post-Mortem
In retrospect, the leak had three contributing factors:
- Code review missed it: The original PR showed `requestCache.set(idempotencyKey, result)` — looks fine in isolation. The problem was in `processTransaction`'s return value structure, reviewed separately.
- No memory monitoring: We had CPU and HTTP metrics but no heap usage alerting. Added a `heapUsed/heapTotal > 80%` → PagerDuty alert.
- No cache TTL discipline: "We'll add expiration later" is how every unbounded cache starts. Now all caches require a TTL and `max` as non-optional constructor parameters in our coding standards.
Memory leaks are almost always a retention problem — something that should be released is being kept alive by an unexpected reference. The tools exist to find them. The hard part is building the habit of looking before the 3 AM alert fires.
Related posts
- Node.js Diagnostic Tools — Heap Snapshots, Flame Graphs, and DoctorJS in 2026 — the full reference guide to every diagnostic tool used in this case study