Your file upload endpoint is slow. Users are complaining about timeouts on files over 100MB. You check the code — looks straightforward. The actual problem: you’re not handling backpressure, and Node.js is buffering the entire file in memory before writing it anywhere.
Streams are one of Node.js’s most powerful and most misunderstood features. This post explains exactly how they work, what goes wrong without backpressure, and how to fix it.
Table of contents
- What Are Streams?
- The pipe() Method
- Backpressure: The Core Concept
- What Happens Without Backpressure
- Implementing Backpressure Manually
- Real Example: File Upload to S3
- Throughput Comparison: Buffered vs Streaming
- Transform Streams: The Building Block
- The highWaterMark: Tuning Buffer Sizes
- Common Stream Pitfalls
What Are Streams?
A stream is a sequence of data processed incrementally rather than all at once. Instead of:
```javascript
// Read entire file into memory → process → write
const data = await fs.readFile('/path/to/5gb-file.csv'); // 5GB in RAM
process(data);
await fs.writeFile('/output.csv', result);
```

You process chunk by chunk:
```javascript
// Read chunk → process chunk → write chunk → repeat
const input = fs.createReadStream('/path/to/5gb-file.csv');
const output = fs.createWriteStream('/output.csv');
input.pipe(transform).pipe(output);
```

The second approach can process a 5GB file with under 50MB of memory usage. That’s not a theoretical improvement: it’s the difference between a server that works and one that OOMs.
The Four Stream Types
- Readable → produces data (`fs.createReadStream`, `http.IncomingMessage`)
- Writable → consumes data (`fs.createWriteStream`, `http.ServerResponse`)
- Duplex → both reads and writes (TCP sockets)
- Transform → reads, transforms, writes (`zlib.createGzip`, crypto streams)

The pipe() Method
The simplest stream composition:
```javascript
const { createReadStream, createWriteStream } = require('fs');
const { createGzip } = require('zlib');

// Compress a file: read → gzip → write
createReadStream('file.txt')
  .pipe(createGzip())
  .pipe(createWriteStream('file.txt.gz'));
```

`pipe()` handles backpressure automatically; that’s its main job. But what IS backpressure?
Backpressure: The Core Concept
Imagine a data pipeline:
```
Fast Producer → [Buffer] → Slow Consumer
```

If the producer generates data faster than the consumer can process it, the buffer grows without bound. Eventually you either run out of memory or drop data.
Backpressure is the mechanism to tell the producer “slow down, the consumer isn’t ready”. In Node.js streams:
- `writable.write(chunk)` returns `false` when the internal buffer is full (high water mark reached)
- The readable stream should pause when it sees a `false` return value
- When the writable drains its buffer, it emits a `'drain'` event
- The readable stream resumes on `'drain'`
pipe() implements this correctly. If you write your own stream consumers, you must implement it too.
What Happens Without Backpressure
The classic mistake:
```javascript
// WRONG: No backpressure handling
readable.on('data', chunk => {
  writable.write(chunk); // return value ignored!
});
```

If the readable streams data at 500 MB/s but the writable (e.g., a database insert) processes at 50 MB/s, the writable’s internal buffer grows at 450 MB/s. On a file upload endpoint receiving concurrent requests, this will exhaust server memory.
Real-world symptoms:
- Server memory grows monotonically during file uploads
- OOM kills on large file uploads
- Requests to slow consumers (S3, databases) time out because data queues up faster than they can drain it
Implementing Backpressure Manually
If you need to consume a stream manually (not via pipe):
```javascript
async function streamToS3(readable, s3Upload) {
  return new Promise((resolve, reject) => {
    readable.on('data', chunk => {
      const canContinue = s3Upload.write(chunk);
      if (!canContinue) {
        readable.pause(); // Tell the source to slow down
        s3Upload.once('drain', () => {
          readable.resume(); // Buffer drained, resume
        });
      }
    });

    readable.on('end', () => s3Upload.end());
    readable.on('error', reject);
    s3Upload.on('finish', resolve);
    s3Upload.on('error', reject);
  });
}
```

Or better: use the `pipeline()` utility (promise version available since Node.js 15), which handles backpressure AND proper error cleanup:
```javascript
const { pipeline } = require('stream/promises');

async function processFile(inputPath, outputPath) {
  await pipeline(
    fs.createReadStream(inputPath),
    new Transform({
      transform(chunk, enc, cb) {
        /* process chunk */
        cb(null, chunk);
      },
    }),
    fs.createWriteStream(outputPath),
  );
}
// Automatically handles: backpressure, errors, cleanup of all streams
```

Real Example: File Upload to S3
The wrong way — buffering the entire upload in memory:
```javascript
// WRONG: Loads entire file into memory
app.post('/upload', async (req, res) => {
  const chunks = [];
  req.on('data', chunk => chunks.push(chunk));
  req.on('end', async () => {
    const buffer = Buffer.concat(chunks); // Entire file in memory!
    await s3.putObject({
      Bucket: 'my-bucket',
      Key: 'uploaded-file',
      Body: buffer,
    }).promise();
    res.json({ success: true });
  });
});
```

The right way: stream directly to S3:
```javascript
const { Upload } = require('@aws-sdk/lib-storage');

// CORRECT: Streams directly, S3 multipart upload
app.post('/upload', async (req, res) => {
  try {
    const upload = new Upload({
      client: s3Client,
      params: {
        Bucket: 'my-bucket',
        Key: `uploads/${Date.now()}-${req.headers['x-filename']}`,
        Body: req, // req is a Readable stream!
        ContentType: req.headers['content-type'],
      },
    });

    const result = await upload.done();
    res.json({ location: result.Location });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});
```

The `Upload` class from AWS SDK v3 handles S3 multipart upload internals and backpressure. You pass the HTTP request stream directly, with no buffering.
Throughput Comparison: Buffered vs Streaming
Uploading a 500MB file to S3 from a Node.js server (t3.medium, 2GB RAM, 1Gbps network):
| Approach | Peak memory | Time to first byte to S3 | Total time |
|---|---|---|---|
| Buffer then upload | ~520MB peak | ~8s (after full download) | ~18s |
| Stream with pipe() | ~12MB steady | <1s | ~10s |
| Stream with multer (disk first) | ~30MB | ~8s | ~18s |
The streaming approach uses 43x less memory and starts sending to S3 before the client has finished uploading. For concurrent uploads, this is the difference between a server that handles 100 concurrent 500MB uploads (50GB buffered = impossible) and one that handles them trivially.
Transform Streams: The Building Block
Transform streams are the composable piece — they read input and produce (possibly modified) output:
```javascript
const { Transform } = require('stream');

class CSVToJSONTransform extends Transform {
  constructor(options = {}) {
    super({ ...options, objectMode: true });
    this.headers = null;
    this.buffer = '';
  }

  _transform(chunk, encoding, callback) {
    this.buffer += chunk.toString();
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop(); // Keep incomplete last line

    for (const line of lines) {
      if (!this.headers) {
        this.headers = line.split(',');
        continue;
      }
      const values = line.split(',');
      const obj = Object.fromEntries(
        this.headers.map((h, i) => [h.trim(), values[i]?.trim()])
      );
      this.push(obj); // Push to downstream
    }

    callback(); // Signal we're done processing this chunk
  }

  _flush(callback) {
    // Process any remaining buffered data
    if (this.buffer && this.headers) {
      const values = this.buffer.split(',');
      const obj = Object.fromEntries(
        this.headers.map((h, i) => [h.trim(), values[i]?.trim()])
      );
      this.push(obj);
    }
    callback();
  }
}

// Usage
await pipeline(
  fs.createReadStream('large.csv'),
  new CSVToJSONTransform(),
  new JSONWritableStream(database),
);
```

The highWaterMark: Tuning Buffer Sizes
Both readable and writable streams have a configurable buffer size:
```javascript
// Byte streams: highWaterMark in bytes
// (default: 64KB for fs.createReadStream, 16KB for generic Readable)
const readable = fs.createReadStream('file.txt', {
  highWaterMark: 64 * 1024, // 64KB (the default)
});

// Object-mode streams: highWaterMark counts objects (default: 16)
const transform = new Transform({
  objectMode: true,
  highWaterMark: 1000, // Buffer up to 1000 objects
});
```

For network upload endpoints, a larger highWaterMark (64KB-256KB) improves throughput by reducing the number of backpressure events. For memory-constrained environments, keep it small.
For database write streams where each “object” is a row, a highWaterMark of 1000 lets up to 1000 rows queue in the stream’s buffer, which a batching write implementation can flush as a single insert: much more efficient than inserting row by row.
Common Stream Pitfalls
1. Error handling gaps: pipe() doesn’t forward errors by default. Use pipeline() instead.
```javascript
// WRONG: error on readStream kills the process, writeStream not closed
readStream.pipe(writeStream);

// CORRECT: errors propagate, all streams cleaned up
await pipeline(readStream, writeStream);
```

2. Forgetting to handle ‘close’ vs ‘end’: `'end'` = no more data to read; `'close'` = stream and its underlying resources are closed. For file streams, you need `'close'` to know the file handle is released.
3. Mixing async/await with event emitters: Streams are event-based. Wrapping them in async functions without proper error handling creates unhandled rejections.
4. Consuming streams twice: A readable stream can only be consumed once. If you need to read the same data multiple times, either buffer it or use stream.PassThrough to fork.
Streams are Node.js at its most powerful — the reason a single-threaded process can handle multi-gigabyte files with kilobytes of memory. But they require understanding the backpressure contract. Once it clicks, you’ll never want to buffer data unnecessarily again.