Node.js Memory Apocalypse: Why Your App Dies on Big Files and How to Stop It Forever
Your script works with test files, then crashes with real production data. The most common reason is loading entire files into memory instead of processing them incrementally.
This guide explains why that happens and how streams fix it.
Why It Matters
- Prevents ENOMEM and OOM crashes in production.
- Keeps memory usage predictable with huge files.
- Improves reliability for ETL, logs, and media processing.
- Supports scalable file workflows without expensive hardware.
Core Concepts
1. Why fs.readFile Breaks at Scale
fs.readFile loads the whole file into RAM before processing.
import { readFile } from "node:fs";
readFile("./mega-database.sql", "utf8", (error, data) => {
if (error) throw error;
parseSQL(data);
});
Fine for small files. Dangerous for multi-GB input.
2. Stream-Based Processing
Streams read the file in chunks, process each chunk incrementally, and release memory as they go.
import { createReadStream } from "node:fs";
const stream = createReadStream("./giant-dataset.csv", { encoding: "utf8" });
stream.on("data", (chunk) => {
analyzeChunk(chunk);
});
stream.on("end", () => {
console.log("Done without memory explosion");
});
3. Backpressure-Safe Pipelines
Use pipeline for safer stream composition and centralized error handling.
import { createReadStream, createWriteStream } from "node:fs";
import { pipeline } from "node:stream";
import { createGzip } from "node:zlib";
pipeline(
createReadStream("./server.log"),
createGzip(),
createWriteStream("./server.log.gz"),
(error) => {
if (error) {
console.error("Pipeline failed", error);
return;
}
console.log("Compression completed");
}
);
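The same flow can be written with the promise-based pipeline from node:stream/promises, which pairs naturally with async/await. A minimal sketch (the function name and file paths are illustrative):

```javascript
import { createReadStream, createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { createGzip } from "node:zlib";

// Compress a file; awaiting pipeline surfaces an error from any
// stage as a single rejected promise, so one try/catch covers
// the whole chain.
async function compressFile(sourcePath, targetPath) {
  await pipeline(
    createReadStream(sourcePath),
    createGzip(),
    createWriteStream(targetPath)
  );
}
```

Either form handles backpressure for you; the promise version is simply easier to compose with other async code.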
4. Large JSON Handling Strategy
Prefer NDJSON (newline-delimited JSON) over giant JSON arrays: one object per line means each record can be parsed independently, without ever holding the whole file in memory.
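As a sketch of producing NDJSON output (the helper name is hypothetical; for truly huge sources you would feed it an async iterable and respect write() backpressure rather than a plain array):

```javascript
import { createWriteStream } from "node:fs";

// Write records as NDJSON: one JSON object per line, so consumers
// can later stream and parse the file line by line.
function writeNdjson(filePath, records) {
  return new Promise((resolve, reject) => {
    const out = createWriteStream(filePath, { encoding: "utf8" });
    out.on("error", reject);
    out.on("finish", resolve);
    for (const record of records) {
      out.write(JSON.stringify(record) + "\n");
    }
    out.end();
  });
}
```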
5. Chunk-Aware Parsing
For CSV/logs, parse line-by-line to avoid buffering huge payloads.
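The subtlety is that a chunk boundary can land mid-line. One way to handle it, sketched here as a small Transform stream, is to carry the trailing fragment of each chunk over to the next (the factory name is illustrative):

```javascript
import { Transform } from "node:stream";

// Re-emit raw chunks as complete lines. A chunk may end in the
// middle of a line, so the trailing fragment is saved and
// prepended to the next chunk.
function createLineSplitter() {
  let remainder = "";
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, callback) {
      const lines = (remainder + chunk.toString("utf8")).split("\n");
      remainder = lines.pop(); // last element may be a partial line
      for (const line of lines) this.push(line);
      callback();
    },
    flush(callback) {
      if (remainder) this.push(remainder); // final unterminated line
      callback();
    },
  });
}
```

Pipe a createReadStream through this splitter and each "data" event is one full CSV row or log line, regardless of where the file system cut the chunks.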
6. Choose the Right API
- Use readFile for small config/static files.
- Use streams for anything that can grow beyond small memory budgets.
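If the file size is unknown up front, one option is to check it first and branch. A sketch, assuming an arbitrary 8 MB cutoff (tune it to your own memory budget; the function name is hypothetical):

```javascript
import { stat, readFile } from "node:fs/promises";
import { createReadStream } from "node:fs";

// Illustrative threshold: treat anything over ~8 MB as "large".
const SMALL_FILE_LIMIT = 8 * 1024 * 1024;

// Buffer small files whole; hand back a stream for large ones.
async function openForReading(filePath) {
  const { size } = await stat(filePath);
  if (size <= SMALL_FILE_LIMIT) {
    return { kind: "buffer", data: await readFile(filePath, "utf8") };
  }
  return { kind: "stream", data: createReadStream(filePath, { encoding: "utf8" }) };
}
```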
Practical Example
Line-by-line NDJSON import (memory-safe):
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";
async function importUsersFromNdjson(filePath) {
const fileStream = createReadStream(filePath, { encoding: "utf8" });
const reader = createInterface({ input: fileStream, crlfDelay: Infinity });
for await (const line of reader) {
if (!line.trim()) continue;
const user = JSON.parse(line);
await insertIntoDatabase(user);
}
}
Runtime grows with data volume; memory usage and panic level stay flat.
Common Mistakes
- Using readFile for unknown or unbounded file sizes.
- Parsing huge JSON arrays fully in memory.
- Ignoring backpressure and stream error propagation.
- Mixing sync file APIs in throughput-heavy code paths.
- No monitoring for heap growth during data jobs.
Quick Recap
- readFile is memory-heavy by design.
- Streams process data incrementally and safely.
- pipeline is the robust pattern for multi-step stream flows.
- NDJSON + line parsing is practical for massive imports.
- Treat large-file handling as a production reliability concern.
Next Steps
- Add memory usage logging (process.memoryUsage) in long jobs.
- Convert large JSON dumps to NDJSON for incremental processing.
- Add retry/error strategy around pipeline failures.
- Add load tests with realistic multi-GB files before release.
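For the memory-logging step, a minimal sketch of a helper built on process.memoryUsage (the function name and 5-second default interval are illustrative choices):

```javascript
// Log heap usage periodically during a long-running job, so heap
// growth is visible long before it becomes an OOM crash.
function startMemoryLogger(intervalMs = 5000) {
  const toMb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  const timer = setInterval(() => {
    const { heapUsed, rss } = process.memoryUsage();
    console.log(`heapUsed=${toMb(heapUsed)}MB rss=${toMb(rss)}MB`);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return () => clearInterval(timer); // call to stop logging
}
```

Start it at the beginning of a data job and watch the numbers: with stream-based processing, heapUsed should stay roughly flat even as the input grows.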