The on-call alert fired at 3 AM. Node.js process using 2.1GB RSS, climbing fast. We killed and restarted it. A week later, same thing. The service had been running for six months without this problem — something changed. This is the story of finding and fixing it.
Symptoms and Initial Investigation
The service: a transaction processing API, ~200 req/s, Node.js 18, Express, PostgreSQL (pg pool), Redis (ioredis).
Symptoms:
- Memory grows ~50MB/hour under normal load
- Process crashes (OOM kill) after 5-7 days
- No memory growth during low-traffic periods
- Started approximately 3 weeks ago (correlates with a deploy)
Initial checks:
```shell
# Check current memory
curl http://localhost:3000/health | jq '.memory'
# {"heapUsed": 890MB, "heapTotal": 1100MB, "rss": 2100MB, "external": 850MB}

# heapUsed is extremely high — objects accumulating on the V8 heap
# The gap between heapUsed and rss includes V8 overhead and native allocations
```

The high heapUsed (890MB) pointed toward JavaScript objects accumulating — likely Maps, closures, or cached data holding references to request/response bodies.
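The actual `/health` handler isn't shown in this post, but the payload above can be produced with a few lines of `process.memoryUsage()` plumbing. This is a hypothetical sketch (the function name is ours):

```javascript
// Hypothetical sketch of the memory payload behind a /health endpoint
function memoryReportMB() {
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  const { heapUsed, heapTotal, rss, external } = process.memoryUsage();
  return {
    heapUsed: toMB(heapUsed),
    heapTotal: toMB(heapTotal),
    rss: toMB(rss),
    external: toMB(external),
  };
}
```

Logging these four numbers once a minute is often enough to spot a leak's slope days before an OOM kill.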
Step 1: Taking Heap Snapshots
```javascript
// Add to your Express app temporarily
const { writeHeapSnapshot } = require('v8');

app.get('/debug/heap-snapshot', (req, res) => {
  const filename = writeHeapSnapshot(); // writes to CWD
  res.json({ filename });
});
```

The three-snapshot technique:
- Snapshot 1: After startup (baseline)
- Snapshot 2: After some load (5-10 minutes)
- Snapshot 3: After more load (same interval)
Objects that grow from S1 → S2 → S3 are the leak.
```shell
# Take snapshots
curl http://localhost:3000/debug/heap-snapshot
# {"filename":"Heap-20240315T140523Z-98765.heapsnapshot"}

# Wait 10 minutes under load, take second
curl http://localhost:3000/debug/heap-snapshot

# Wait 10 minutes, take third
curl http://localhost:3000/debug/heap-snapshot

# Download them (scp, etc.)
```

Open in Chrome DevTools → Memory → Load snapshots.
What I found: “Array” category growing from 45MB → 89MB → 178MB between snapshots. Doubles each interval.
Drilling into the array objects: they were Buffer instances. Many small Buffers (128-512 bytes each), all reachable through the same retainer path: IncomingMessage._body → Buffer.
Step 2: The Leak Location
The three-snapshot difference view showed the growing objects all had the same retainer chain:
```
(root) → global → Map → Map entries → IncomingMessage → _body → Buffer
```

A Map on the global object was retaining IncomingMessage objects (HTTP request objects). IncomingMessage objects retain their body buffers. A new body buffer is retained with each request, so the total grows without bound.
This was a request cache gone wrong. In the deploy 3 weeks ago, someone had added a cache for deduplication:
```javascript
// The leaking code — added in the problematic deploy
const requestCache = new Map(); // GLOBAL — never cleaned

app.post('/transactions', async (req, res) => {
  const idempotencyKey = req.headers['x-idempotency-key'];

  if (idempotencyKey) {
    if (requestCache.has(idempotencyKey)) {
      const cached = requestCache.get(idempotencyKey);
      return res.status(200).json(cached); // Return cached response
    }
  }

  const result = await processTransaction(req.body);

  if (idempotencyKey) {
    requestCache.set(idempotencyKey, result); // NEVER DELETED
    // Also accidentally retained: the entire req object!
  }

  res.json(result);
});
```

Two problems:
- The cache never expires — it grows forever
- It was accidentally storing a reference to something that held the `req` object
Actually the second problem wasn’t obvious. The cache stored result — but what was result? It was the processed transaction object. What did that contain?
```javascript
async function processTransaction(body) {
  // ...
  return {
    id: uuid(),
    status: 'processed',
    originalRequest: body, // ← HERE. Stored the entire request body.
    timestamp: new Date(),
  };
}
```

`result` contained `originalRequest: body` — a reference to the request body, which held a reference to… well, it chains. The garbage collector can't collect `req.body` because `result.originalRequest` references it, and `result` is in the global `requestCache`, which is never cleaned.
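The retention chain is easy to reproduce in isolation. Here is a minimal, self-contained demonstration (illustrative, not the production code): a global Map holding results that reference "request bodies" keeps every body alive, and heapUsed only climbs:

```javascript
// Minimal reproduction: a never-cleaned global cache retains every body it sees
const cache = new Map();

function handleRequest(id, body) {
  const result = { id, status: 'processed', originalRequest: body }; // retains body
  cache.set(id, result); // never deleted, so body can never be GC'd
  return result;
}

const before = process.memoryUsage().heapUsed;
for (let i = 0; i < 1000; i++) {
  handleRequest(i, new Array(16_384).fill(i)); // a roughly 64-128KB "body"
}
const after = process.memoryUsage().heapUsed;
console.log(`retained roughly ${Math.round((after - before) / 1024 / 1024)}MB`);
```

Running a GC between the two measurements wouldn't change the result: every body is still reachable from the global `cache`, which is exactly what the three-snapshot diff showed.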
Step 3: The Fix
```javascript
// Fix 1: Use a proper TTL cache
import { LRUCache } from 'lru-cache'; // npm install lru-cache

const requestCache = new LRUCache({
  max: 10_000, // Max 10k entries
  ttl: 1000 * 60 * 30, // 30 minute TTL
  allowStale: false,
});

// Fix 2: Don't store references to the request
async function processTransaction(body) {
  return {
    id: uuid(),
    status: 'processed',
    // REMOVED: originalRequest: body
    amount: body.amount, // Store only what's needed
    currency: body.currency,
    timestamp: new Date(),
  };
}
```

Memory growth stopped immediately after deploying. Heap stabilized at ~200MB.
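If pulling in a dependency isn't an option, the same two properties (bounded size, expiring entries) can be sketched in plain JavaScript. This is a simplified stand-in for illustration, not what we deployed — real `lru-cache` also refreshes an entry's recency on reads, which this version skips:

```javascript
// Minimal bounded cache with lazy TTL expiry (illustrative sketch)
class TTLCache {
  constructor({ max = 10_000, ttl = 30 * 60 * 1000 } = {}) {
    this.max = max;
    this.ttl = ttl;
    this.map = new Map(); // key -> { value, storedAt }
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() - entry.storedAt > this.ttl) {
      this.map.delete(key); // expired: drop it so the value can be GC'd
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    if (this.map.size >= this.max && !this.map.has(key)) {
      // Evict the oldest insertion (Maps iterate in insertion order)
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, { value, storedAt: Date.now() });
  }
}
```

A production idempotency cache needs more care (e.g. two concurrent requests with the same key), but even this sketch would have prevented the unbounded growth.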
Common Node.js Memory Leak Patterns
1. EventEmitter Listener Accumulation
```javascript
// WRONG: adds a listener every time this function is called
function setupHandler() {
  process.on('exit', () => {
    cleanup();
  });
}
// If setupHandler() is called 1000 times: 1000 listeners on process.exit
// Node.js warns: "MaxListenersExceededWarning: Possible EventEmitter memory leak"

// CORRECT: use once(), or track and remove listeners
function setupHandler() {
  process.once('exit', cleanup); // Only registers once
}

// OR
const handler = () => cleanup();
emitter.on('event', handler);
// Later:
emitter.off('event', handler); // Always remove when done
```

2. Closures Retaining Large Variables
```javascript
// WRONG: outer is captured by the returned function
function createProcessor(largeConfig) {
  const outer = { config: largeConfig, cache: new Map() }; // large object
  return function process(data) {
    return outer.cache.get(data) || computeResult(data, outer.config);
  };
}
// If you store many processors, each retains the full outer object

// CORRECT: extract only what's needed
function createProcessor(largeConfig) {
  const relevantConfig = extractRelevant(largeConfig); // smaller
  const cache = new Map();
  return function process(data) {
    return cache.get(data) || computeResult(data, relevantConfig);
  };
}
```

3. Circular References (Less Common in Modern V8)
V8’s garbage collector handles most circular references. But circular references through native bindings or WeakMaps can sometimes cause issues:
```javascript
// This is fine — V8 GC handles it
const a = {};
const b = { ref: a };
a.ref = b;
// a and b will be collected when they go out of scope

// This can leak: circular ref through a global Map
const registry = new Map();
class Connection {
  constructor() {
    registry.set(this, { handlers: [] });
  }
  // Never calls registry.delete(this)!
}
// Every Connection lives forever in registry
```

4. SetInterval Retaining Closures
```javascript
// WRONG: The interval captures handler, which captures db connection
function startMonitoring(db) {
  const interval = setInterval(() => {
    db.query('SELECT 1'); // captures db in closure
  }, 5000);
  // interval is never cleared
  // db will never be garbage collected
}

// CORRECT: always return a cleanup function
function startMonitoring(db) {
  const interval = setInterval(() => {
    db.query('SELECT 1');
  }, 5000);

  return () => clearInterval(interval); // Return cleanup
}

const stopMonitoring = startMonitoring(db);
// Later:
stopMonitoring(); // Clear interval, allow db to be GC'd
```

5. Forgotten setTimeout Chains
```javascript
// This creates a chain that never ends:
function poll() {
  checkDatabase();
  setTimeout(poll, 1000); // schedules itself forever
}
poll();
// And it captures everything in the poll() scope
```

Diagnostic Tools
node --inspect: Enable Chrome DevTools for a running Node.js process. Memory profiler, heap snapshots, CPU profiles all available.
node --expose-gc: Exposes global.gc() in your code. Call it before taking snapshots to force a GC cycle, making the leak signal cleaner.
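Since `global.gc` only exists when the flag is set, guard calls to it so the same code runs with or without the flag (a small sketch; the function name is ours):

```javascript
// Force a GC cycle if Node was started with --expose-gc; otherwise no-op
function tryForceGC() {
  if (typeof global.gc === 'function') {
    global.gc(); // synchronously runs a full garbage collection
    return true;
  }
  return false; // flag not set: skip, don't crash
}
```

Calling this right before `writeHeapSnapshot()` sweeps already-dead objects out of the snapshot, so the diff between snapshots shows only real retention.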
process.memoryUsage(): Quick programmatic check:
```javascript
setInterval(() => {
  const { heapUsed, heapTotal, rss, external } = process.memoryUsage();
  console.log(JSON.stringify({ heapUsed, heapTotal, rss, external }));
}, 60_000);
```

clinic heapprofiler: From the Clinic.js toolchain. Profiles heap allocations over time, identifies allocation sites.
--max-old-space-size: Set heap limit explicitly. Faster crashes = faster feedback loop during debugging:
```shell
node --max-old-space-size=512 server.js # Crash at 512MB instead of OOM
```

The Post-Mortem
In retrospect, the leak had three contributing factors:
- Code review missed it: The original PR showed `requestCache.set(idempotencyKey, result)` — looks fine in isolation. The problem was in `processTransaction`'s return value structure, reviewed separately.
- No memory monitoring: We had CPU and HTTP metrics but no heap usage alerting. Added a `heapUsed/heapTotal > 80%` → PagerDuty alert.
- No cache TTL discipline: "We'll add expiration later" is how every unbounded cache starts. Now all caches require a TTL and `max` as non-optional constructor parameters in our coding standards.
Memory leaks are almost always a retention problem — something that should be released is being kept alive by an unexpected reference. The tools exist to find them. The hard part is building the habit of looking before the 3 AM alert fires.
Related posts
- Node.js Diagnostic Tools — Heap Snapshots, Flame Graphs, and DoctorJS in 2026 — the full reference guide to every diagnostic tool used in this case study