
Node.js Memory Leaks — How I Found and Fixed a 2GB Leak in Production

Posted on: June 16, 2025 at 10:00 AM

The on-call alert fired at 3 AM. Node.js process using 2.1GB RSS, climbing fast. We killed and restarted it. A week later, same thing. The service had been running for six months without this problem — something changed. This is the story of finding and fixing it.


Symptoms and Initial Investigation

The service: a transaction processing API, ~200 req/s, Node.js 18, Express, PostgreSQL (pg pool), Redis (ioredis).

Symptoms:

  1. RSS at 2.1GB and climbing, with heapUsed near 900MB
  2. A restart reset memory, but growth resumed and the alert fired again a week later
  3. Six months of stable operation before the problem appeared, pointing at a recent change

Initial checks:

Terminal window
# Check current memory
curl http://localhost:3000/health | jq '.memory'
# {"heapUsed": 890MB, "heapTotal": 1100MB, "rss": 2100MB, "external": 850MB}
# heapUsed is extremely high — objects accumulating on the V8 heap
# The gap between heapUsed and rss includes V8 overhead and native allocations

The high heapUsed (890MB) pointed toward JavaScript objects accumulating — likely Maps, closures, or cached data holding references to request/response bodies.
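For reference, a payload like the one the health endpoint returned can be produced with nothing but `process.memoryUsage()`. This is a sketch, not the original service code; `memoryMB` is a hypothetical helper, though the field names match Node's API:

```javascript
// Summarize process memory in MB, mirroring the /health payload above.
function memoryMB() {
  const m = process.memoryUsage();
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    heapUsed: toMB(m.heapUsed),   // live JS objects on the V8 heap
    heapTotal: toMB(m.heapTotal), // heap memory V8 has reserved
    rss: toMB(m.rss),             // total resident set size of the process
    external: toMB(m.external),   // native allocations (Buffers, etc.)
  };
}

console.log(memoryMB());
```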

Step 1: Taking Heap Snapshots

// Add to your Express app temporarily
const { writeHeapSnapshot } = require('v8');

app.get('/debug/heap-snapshot', (req, res) => {
  const filename = writeHeapSnapshot(); // writes to CWD
  res.json({ filename });
});

The three-snapshot technique:

  1. Snapshot 1: After startup (baseline)
  2. Snapshot 2: After some load (5-10 minutes)
  3. Snapshot 3: After more load (same interval)

Objects that grow from S1 → S2 → S3 are the leak.

Terminal window
# Take snapshots
curl http://localhost:3000/debug/heap-snapshot
# {"filename":"Heap-20240315T140523Z-98765.heapsnapshot"}
# Wait 10 minutes under load, take second
curl http://localhost:3000/debug/heap-snapshot
# Wait 10 minutes, take third
curl http://localhost:3000/debug/heap-snapshot
# Download them (scp, etc.)

Open in Chrome DevTools → Memory → Load snapshots.

What I found: the “Array” category growing from 45MB → 89MB → 178MB across snapshots, roughly doubling each interval.

Drilling into the array objects: they were Buffer instances. Many small Buffers (128-512 bytes each), all referencing the same constructor path: IncomingMessage._body → Buffer.

Step 2: The Leak Location

The three-snapshot difference view showed the growing objects all had the same retainer chain:

(root) → global → Map → Map entries → IncomingMessage → _body → Buffer

A Map on the global object was retaining IncomingMessage objects (HTTP request objects). IncomingMessage objects retain their body buffers. Body buffers grow with each request.

This was a request cache gone wrong. In the deploy 3 weeks ago, someone had added a cache for deduplication:

// The leaking code — added in the problematic deploy
const requestCache = new Map(); // GLOBAL — never cleaned

app.post('/transactions', async (req, res) => {
  const idempotencyKey = req.headers['x-idempotency-key'];

  if (idempotencyKey) {
    if (requestCache.has(idempotencyKey)) {
      const cached = requestCache.get(idempotencyKey);
      return res.status(200).json(cached); // Return cached response
    }
  }

  const result = await processTransaction(req.body);

  if (idempotencyKey) {
    requestCache.set(idempotencyKey, result); // NEVER DELETED
    // Also accidentally retained: the entire req object!
  }

  res.json(result);
});

Two problems:

  1. The cache never expires — it grows forever
  2. It was accidentally storing a reference to something that held the req object

The second problem wasn’t obvious at first. The cache stored result — but what was result? It was the processed transaction object. What did that contain?

async function processTransaction(body) {
  // ...
  return {
    id: uuid(),
    status: 'processed',
    originalRequest: body, // ← HERE. Stored the entire request body.
    timestamp: new Date(),
  };
}

result contained originalRequest: body — a reference to the request body, which held a reference to… well, it chains. The garbage collector can’t collect req.body because result.originalRequest references it, and result is in the global requestCache, which is never cleaned.
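The retention chain is easy to reproduce in miniature. This is a hypothetical, dependency-free repro, not the production code, but the shape is the same: a global Map holding results that hold request bodies:

```javascript
// A global cache standing in for requestCache.
const cache = new Map();

// Simulate the handler: the result object keeps a reference to body.
function handle(body) {
  const result = { id: cache.size, status: 'processed', originalRequest: body };
  cache.set(result.id, result); // result -> body -> Buffer stays reachable forever
  return result;
}

// Simulate 1000 requests, each with a small Buffer body.
for (let i = 0; i < 1000; i++) {
  handle({ payload: Buffer.alloc(256) });
}

console.log(cache.size); // 1000 — entries (and their ~256KB of Buffers) can never be GC'd
```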

Step 3: The Fix

// Fix 1: Use a proper TTL cache
const { LRUCache } = require('lru-cache'); // npm install lru-cache

const requestCache = new LRUCache({
  max: 10_000, // Max 10k entries
  ttl: 1000 * 60 * 30, // 30 minute TTL
  allowStale: false,
});

// Fix 2: Don't store references to the request
async function processTransaction(body) {
  return {
    id: uuid(),
    status: 'processed',
    // REMOVED: originalRequest: body
    amount: body.amount, // Store only what's needed
    currency: body.currency,
    timestamp: new Date(),
  };
}
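If pulling in lru-cache isn’t an option, the same discipline (bounded size plus TTL) can be sketched with a plain Map. This is an illustrative sketch under those assumptions, not the production fix; it relies on Maps iterating in insertion order, which the spec guarantees:

```javascript
// Minimal bounded cache with TTL: evicts the oldest entry past `max`,
// and treats entries older than `ttl` milliseconds as misses.
class BoundedTtlCache {
  constructor({ max, ttl }) {
    this.max = max;
    this.ttl = ttl;
    this.map = new Map();
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key); // refresh insertion order
    this.map.set(key, { value, expires: Date.now() + this.ttl });
    if (this.map.size > this.max) {
      // Map iterates in insertion order, so the first key is the oldest.
      this.map.delete(this.map.keys().next().value);
    }
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.map.delete(key); // expired: drop it so it can be GC'd
      return undefined;
    }
    return entry.value;
  }
}

const c = new BoundedTtlCache({ max: 2, ttl: 1000 });
c.set('a', 1);
c.set('b', 2);
c.set('c', 3); // 'a' is evicted by the size bound
console.log(c.get('a'), c.get('c')); // undefined 3
```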

Memory growth stopped immediately after deploying. Heap stabilized at ~200MB.

Common Node.js Memory Leak Patterns

1. EventEmitter Listener Accumulation

// WRONG: adds a listener every time this function is called
function setupHandler() {
  process.on('exit', () => {
    cleanup();
  });
}
// If setupHandler() is called 1000 times: 1000 listeners on process 'exit'
// Node.js warns: "MaxListenersExceededWarning: Possible EventEmitter memory leak"

// CORRECT: register at module scope so it happens exactly once
process.on('exit', cleanup);
// Note: process.once() wouldn't help here — once() makes a handler fire at
// most once, but each setupHandler() call would still add a new listener.

// OR: track and remove listeners
const handler = () => cleanup();
emitter.on('event', handler);
// Later:
emitter.off('event', handler); // Always remove when done

2. Closures Retaining Large Variables

// WRONG: outer is captured by the returned function
function createProcessor(largeConfig) {
  const outer = { config: largeConfig, cache: new Map() }; // large object
  return function process(data) {
    return outer.cache.get(data) || computeResult(data, outer.config);
  };
}
// If you store many processors, each retains the full outer object

// CORRECT: extract only what's needed
function createProcessor(largeConfig) {
  const relevantConfig = extractRelevant(largeConfig); // smaller
  const cache = new Map();
  return function process(data) {
    return cache.get(data) || computeResult(data, relevantConfig);
  };
}

3. Circular References (Less Common in Modern V8)

V8’s garbage collector handles circular references just fine. What still leaks is a strong reference from something long-lived, such as a global container or a native binding, that keeps the whole object graph reachable:

// This is fine — V8 GC handles it
const a = {};
const b = { ref: a };
a.ref = b;
// a and b will be collected when they go out of scope

// This can leak: every instance is pinned by a global Map
const registry = new Map();

class Connection {
  constructor() {
    registry.set(this, { handlers: [] });
  }
  // Never calls registry.delete(this)!
}
// Every Connection lives forever in registry
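One hedged fix sketch: key the registry with a WeakMap so an entry dies with its Connection instead of pinning it, and delete explicitly on close as well (this is illustrative; `close()` is an assumed method, not from the original):

```javascript
// WeakMap keys are held weakly: once a Connection is otherwise
// unreachable, its registry entry becomes collectable too.
const registry = new WeakMap();

class Connection {
  constructor() {
    registry.set(this, { handlers: [] });
  }

  close() {
    registry.delete(this); // explicit cleanup is still good hygiene
  }
}

const conn = new Connection();
console.log(registry.has(conn)); // true while conn is referenced
conn.close();
console.log(registry.has(conn)); // false
```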

4. SetInterval Retaining Closures

// WRONG: the interval callback captures the db connection in its closure
function startMonitoring(db) {
  const interval = setInterval(() => {
    db.query('SELECT 1'); // captures db in closure
  }, 5000);
  // interval is never cleared
  // db will never be garbage collected
}

// CORRECT: always return a cleanup function
function startMonitoring(db) {
  const interval = setInterval(() => {
    db.query('SELECT 1');
  }, 5000);
  return () => clearInterval(interval); // Return cleanup
}

const stopMonitoring = startMonitoring(db);
// Later:
stopMonitoring(); // Clear interval, allow db to be GC'd

5. Forgotten setTimeout Chains

// This creates a chain that never ends:
function poll() {
  checkDatabase();
  setTimeout(poll, 1000); // schedules itself forever
}
poll();
// And it captures everything in the poll() scope
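A stoppable variant of the same poller can be sketched like the setInterval fix above: keep the timer handle and return a cleanup function. `startPolling` and its parameters are assumptions for illustration:

```javascript
// Self-rescheduling poller that can actually be stopped.
function startPolling(checkDatabase, intervalMs = 1000) {
  let timer = null;
  let stopped = false;

  function tick() {
    if (stopped) return;
    checkDatabase();
    timer = setTimeout(tick, intervalMs); // reschedules only while running
  }

  tick(); // first check fires immediately
  return () => {
    stopped = true;
    clearTimeout(timer); // lets the closure (and everything it captures) be GC'd
  };
}
```

Usage mirrors the monitoring example: `const stop = startPolling(checkDatabase); … stop();`.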

Diagnostic Tools

node --inspect: Enable Chrome DevTools for a running Node.js process. Memory profiler, heap snapshots, CPU profiles all available.

node --expose-gc: Exposes global.gc() in your code. Call it before taking snapshots to force a GC cycle, making the leak signal cleaner.

process.memoryUsage(): Quick programmatic check:

setInterval(() => {
  const { heapUsed, heapTotal, rss, external } = process.memoryUsage();
  console.log(JSON.stringify({ heapUsed, heapTotal, rss, external }));
}, 60_000);

clinic heapprofiler: From the Clinic.js toolchain. Profiles heap allocations over time, identifies allocation sites.

--max-old-space-size: Set heap limit explicitly. Faster crashes = faster feedback loop during debugging:

Terminal window
node --max-old-space-size=512 server.js # Crash at 512MB instead of OOM

The Post-Mortem

In retrospect, the leak had three contributing factors:

  1. Code review missed it: The original PR showed requestCache.set(idempotencyKey, result) — looks fine in isolation. The problem was in processTransaction’s return value structure, reviewed separately.

  2. No memory monitoring: We had CPU and HTTP metrics but no heap usage alerting. Added heapUsed/heapTotal > 80% → PagerDuty alert.

  3. No cache TTL discipline: “We’ll add expiration later” is how every unbounded cache starts. Now all caches require a TTL and max as non-optional constructor parameters in our coding standards.

Memory leaks are almost always a retention problem — something that should be released is being kept alive by an unexpected reference. The tools exist to find them. The hard part is building the habit of looking before the 3 AM alert fires.