
Node.js Streams and Backpressure — Why Your File Uploads Are Slow

Posted on: September 9, 2024 at 10:00 AM

Your file upload endpoint is slow. Users are complaining about timeouts on files over 100MB. You check the code — looks straightforward. The actual problem: you’re not handling backpressure, and Node.js is buffering the entire file in memory before writing it anywhere.

Streams are one of Node.js’s most powerful features and most misunderstood abstractions. This post explains exactly how they work, what goes wrong without backpressure, and how to fix it.


What Are Streams?

A stream is a sequence of data processed incrementally rather than all at once. Instead of:

// Read entire file into memory → process → write
const fs = require('fs/promises');
const data = await fs.readFile('/path/to/5gb-file.csv'); // 5GB in RAM
const result = process(data);
await fs.writeFile('/output.csv', result);

You process chunk by chunk:

// Read chunk → process chunk → write chunk → repeat
const fs = require('fs');
const input = fs.createReadStream('/path/to/5gb-file.csv');
const output = fs.createWriteStream('/output.csv');
input.pipe(transform).pipe(output); // transform: any Transform stream

The second approach can process a 5GB file with <50MB of memory usage. That’s not a theoretical improvement — it’s the difference between a server that works and one that OOMs.

The Four Stream Types

Readable → produces data (fs.createReadStream, http.IncomingMessage)
Writable → consumes data (fs.createWriteStream, http.ServerResponse)
Duplex → both reads and writes (TCP sockets)
Transform → reads, transforms, writes (zlib.createGzip, crypto streams)

The pipe() Method

The simplest stream composition:

const { createReadStream, createWriteStream } = require('fs');
const { createGzip } = require('zlib');
// Compress a file: read → gzip → write
createReadStream('file.txt')
  .pipe(createGzip())
  .pipe(createWriteStream('file.txt.gz'));

pipe() handles backpressure automatically — that’s its main job. But what IS backpressure?

Backpressure: The Core Concept

Imagine a data pipeline:

Fast Producer → [Buffer] → Slow Consumer

If the producer generates data faster than the consumer can process it, the buffer grows without bound. Eventually you either run out of memory or start dropping data.

Backpressure is the mechanism for telling the producer “slow down, the consumer isn’t ready”. In Node.js streams the contract is concrete: writable.write() returns false once the stream’s internal buffer reaches its highWaterMark, and the writable emits 'drain' when the buffer has emptied enough to accept more data.

pipe() implements this contract correctly. If you write your own stream consumers, you must implement it too.

What Happens Without Backpressure

The classic mistake:

// WRONG: No backpressure handling
readable.on('data', chunk => {
  writable.write(chunk); // return value ignored!
});

If the readable streams data at 500 MB/s but the writable (e.g., a database insert) processes at 50 MB/s, the writable’s internal buffer grows at 450 MB/s. On a file upload endpoint receiving concurrent requests, this will exhaust server memory.

Real-world symptoms: steadily climbing memory usage under load, upload timeouts on large files, and eventual OOM crashes when several large transfers run concurrently.

Implementing Backpressure Manually

If you need to consume a stream manually (not via pipe):

async function streamToS3(readable, s3Upload) {
  return new Promise((resolve, reject) => {
    readable.on('data', chunk => {
      const canContinue = s3Upload.write(chunk);
      if (!canContinue) {
        readable.pause(); // Tell the source to slow down
        s3Upload.once('drain', () => {
          readable.resume(); // Buffer drained, resume
        });
      }
    });
    readable.on('end', () => s3Upload.end());
    readable.on('error', reject);
    s3Upload.on('finish', resolve);
    s3Upload.on('error', reject);
  });
}

Or better — use the pipeline() utility (promise version available since Node.js 15+), which handles backpressure AND proper error cleanup:

const fs = require('fs');
const { Transform } = require('stream');
const { pipeline } = require('stream/promises');

async function processFile(inputPath, outputPath) {
  await pipeline(
    fs.createReadStream(inputPath),
    new Transform({ transform(chunk, enc, cb) { /* process chunk */ cb(null, chunk); } }),
    fs.createWriteStream(outputPath),
  );
}
// Automatically handles: backpressure, errors, cleanup of all streams

Real Example: File Upload to S3

The wrong way — buffering the entire upload in memory:

// WRONG: Loads entire file into memory
app.post('/upload', async (req, res) => {
  const chunks = [];
  req.on('data', chunk => chunks.push(chunk));
  req.on('end', async () => {
    const buffer = Buffer.concat(chunks); // Entire file in memory!
    await s3.putObject({
      Bucket: 'my-bucket',
      Key: 'uploaded-file',
      Body: buffer,
    }).promise();
    res.json({ success: true });
  });
});

The right way — stream directly to S3:

const { Upload } = require('@aws-sdk/lib-storage');

// CORRECT: Streams directly, S3 multipart upload
app.post('/upload', async (req, res) => {
  try {
    const upload = new Upload({
      client: s3Client,
      params: {
        Bucket: 'my-bucket',
        Key: `uploads/${Date.now()}-${req.headers['x-filename']}`,
        Body: req, // req is a Readable stream!
        ContentType: req.headers['content-type'],
      },
    });
    const result = await upload.done();
    res.json({ location: result.Location });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

The Upload class from AWS SDK v3 handles S3 multipart upload internals and backpressure. You pass the HTTP request stream directly — no buffering.

Throughput Comparison: Buffered vs Streaming

Uploading a 500MB file to S3 from a Node.js server (t3.medium, 2GB RAM, 1Gbps network):

| Approach | Peak memory | Time to first byte to S3 | Total time |
| --- | --- | --- | --- |
| Buffer then upload | ~520MB peak | ~8s (after full download) | ~18s |
| Stream with pipe() | ~12MB steady | <1s | ~10s |
| Stream with multer (disk first) | ~30MB | ~8s | ~18s |

The streaming approach uses 43x less memory and starts sending to S3 before the client has finished uploading. For concurrent uploads, this is the difference between a server that handles 100 concurrent 500MB uploads (50GB buffered = impossible) and one that handles them trivially.

Transform Streams: The Building Block

Transform streams are the composable piece — they read input and produce (possibly modified) output:

const { Transform } = require('stream');

class CSVToJSONTransform extends Transform {
  constructor(options = {}) {
    super({ ...options, objectMode: true });
    this.headers = null;
    this.buffer = '';
  }

  _transform(chunk, encoding, callback) {
    this.buffer += chunk.toString();
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop(); // Keep incomplete last line
    for (const line of lines) {
      if (!this.headers) {
        this.headers = line.split(',');
        continue;
      }
      const values = line.split(',');
      const obj = Object.fromEntries(
        this.headers.map((h, i) => [h.trim(), values[i]?.trim()])
      );
      this.push(obj); // Push to downstream
    }
    callback(); // Signal we're done processing this chunk
  }

  _flush(callback) {
    // Process any remaining buffered data
    if (this.buffer && this.headers) {
      const values = this.buffer.split(',');
      const obj = Object.fromEntries(
        this.headers.map((h, i) => [h.trim(), values[i]?.trim()])
      );
      this.push(obj);
    }
    callback();
  }
}

// Usage (JSONWritableStream is an application-defined writable)
await pipeline(
  fs.createReadStream('large.csv'),
  new CSVToJSONTransform(),
  new JSONWritableStream(database),
);

The highWaterMark: Tuning Buffer Sizes

Both readable and writable streams have a configurable buffer size:

// Byte streams: highWaterMark in bytes (default: 64KB for fs.createReadStream, 16KB for generic Readable)
const readable = fs.createReadStream('file.txt', { highWaterMark: 64 * 1024 }); // 64KB (the default)

// Object mode streams: highWaterMark in number of objects (default: 16)
const transform = new Transform({
  objectMode: true,
  highWaterMark: 1000, // Buffer up to 1000 objects
});

For network upload endpoints, a larger highWaterMark (64KB-256KB) improves throughput by reducing the number of backpressure events. For memory-constrained environments, keep it small.

For database write streams where each “object” is a row, a highWaterMark of 1000 means up to 1000 rows can queue in the stream’s buffer before backpressure kicks in. Pair that with a _writev() implementation and the queued rows can be flushed as a single batch insert, which is far more efficient than one query per row.

Common Stream Pitfalls

1. Error handling gaps: pipe() doesn’t forward errors by default. Use pipeline() instead.

// WRONG: error on readStream kills the process, writeStream not closed
readStream.pipe(writeStream);
// CORRECT: errors propagate, all streams cleaned up
await pipeline(readStream, writeStream);

2. Forgetting to handle ‘close’ vs ‘end’: 'end' = no more data to read; 'close' = stream and its underlying resources are closed. For file streams, you need 'close' to know the file handle is released.

3. Mixing async/await with event emitters: Streams are event-based. Wrapping them in async functions without proper error handling creates unhandled rejections.

4. Consuming streams twice: A readable stream can only be consumed once. If you need to read the same data multiple times, either buffer it or use stream.PassThrough to fork.

Streams are Node.js at its most powerful — the reason a single-threaded process can handle multi-gigabyte files with megabytes of memory. But they require understanding the backpressure contract. Once it clicks, you’ll never want to buffer data unnecessarily again.