Node.js Memory Apocalypse: Why Your App Dies on Big Files and How to Stop It Forever
Your script works with test files, then crashes with real production data. The most common reason is loading entire files into memory instead of processing them incrementally.
This guide explains why that happens and how streams fix it.
Why It Matters
- Prevents ENOMEM and OOM crashes in production.
- Keeps memory usage predictable with huge files.
- Improves reliability for ETL, logs, and media processing.
- Supports scalable file workflows without expensive hardware.
Core Concepts
1. Why fs.readFile Breaks at Scale
fs.readFile loads the whole file into RAM before processing.
import { readFile } from "node:fs";
readFile("./mega-database.sql", "utf8", (error, data) => {
if (error) throw error;
parseSQL(data);
});
Fine for small files. Dangerous for multi-GB input.
2. Stream-Based Processing
Streams read the file in chunks, process each chunk incrementally, and release memory as they go.
import { createReadStream } from "node:fs";
const stream = createReadStream("./giant-dataset.csv", { encoding: "utf8" });
stream.on("data", (chunk) => {
analyzeChunk(chunk);
});
stream.on("end", () => {
console.log("Done without memory explosion");
});
3. Backpressure-Safe Pipelines
Use pipeline for safer stream composition and centralized error handling.
import { createReadStream, createWriteStream } from "node:fs";
import { pipeline } from "node:stream";
import { createGzip } from "node:zlib";
pipeline(
createReadStream("./server.log"),
createGzip(),
createWriteStream("./server.log.gz"),
(error) => {
if (error) {
console.error("Pipeline failed", error);
return;
}
console.log("Compression completed");
}
);
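The same flow can be written with the promise-based pipeline from node:stream/promises, which pairs naturally with async/await. A minimal sketch (the function name and file paths are illustrative):

```javascript
import { createReadStream, createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { createGzip } from "node:zlib";

// Compress a file; awaiting pipeline surfaces an error from any
// stage as a single rejected promise, so one try/catch covers
// the whole chain.
async function compressFile(sourcePath, targetPath) {
  await pipeline(
    createReadStream(sourcePath),
    createGzip(),
    createWriteStream(targetPath)
  );
}
```

Either form handles backpressure for you; the promise version is simply easier to compose with other async code.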
4. Large JSON Handling Strategy
Prefer NDJSON (newline-delimited JSON) over giant JSON arrays: one object per line means each record can be parsed independently, without ever holding the whole file in memory.
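As a sketch of producing NDJSON output (the helper name is hypothetical; for truly huge sources you would feed it an async iterable and respect write() backpressure rather than a plain array):

```javascript
import { createWriteStream } from "node:fs";

// Write records as NDJSON: one JSON object per line, so consumers
// can later stream and parse the file line by line.
function writeNdjson(filePath, records) {
  return new Promise((resolve, reject) => {
    const out = createWriteStream(filePath, { encoding: "utf8" });
    out.on("error", reject);
    out.on("finish", resolve);
    for (const record of records) {
      out.write(JSON.stringify(record) + "\n");
    }
    out.end();
  });
}
```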
5. Chunk-Aware Parsing
For CSV/logs, parse line-by-line to avoid buffering huge payloads.
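The subtlety is that a chunk boundary can land mid-line. One way to handle it, sketched here as a small Transform stream, is to carry the trailing fragment of each chunk over to the next (the factory name is illustrative):

```javascript
import { Transform } from "node:stream";

// Re-emit raw chunks as complete lines. A chunk may end in the
// middle of a line, so the trailing fragment is saved and
// prepended to the next chunk.
function createLineSplitter() {
  let remainder = "";
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, callback) {
      const lines = (remainder + chunk.toString("utf8")).split("\n");
      remainder = lines.pop(); // last element may be a partial line
      for (const line of lines) this.push(line);
      callback();
    },
    flush(callback) {
      if (remainder) this.push(remainder); // final unterminated line
      callback();
    },
  });
}
```

Pipe a createReadStream through this splitter and each "data" event is one full CSV row or log line, regardless of where the file system cut the chunks.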
6. Choose the Right API
- Use readFile for small config/static files.
- Use streams for anything that can grow beyond small memory budgets.
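If the file size is unknown up front, one option is to check it first and branch. A sketch, assuming an arbitrary 8 MB cutoff (tune it to your own memory budget; the function name is hypothetical):

```javascript
import { stat, readFile } from "node:fs/promises";
import { createReadStream } from "node:fs";

// Illustrative threshold: treat anything over ~8 MB as "large".
const SMALL_FILE_LIMIT = 8 * 1024 * 1024;

// Buffer small files whole; hand back a stream for large ones.
async function openForReading(filePath) {
  const { size } = await stat(filePath);
  if (size <= SMALL_FILE_LIMIT) {
    return { kind: "buffer", data: await readFile(filePath, "utf8") };
  }
  return { kind: "stream", data: createReadStream(filePath, { encoding: "utf8" }) };
}
```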
Practical Example
Line-by-line NDJSON import (memory-safe):
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";
async function importUsersFromNdjson(filePath) {
const fileStream = createReadStream(filePath, { encoding: "utf8" });
const reader = createInterface({ input: fileStream, crlfDelay: Infinity });
for await (const line of reader) {
if (!line.trim()) continue;
const user = JSON.parse(line);
await insertIntoDatabase(user);
}
}
Runtime grows with data volume; memory usage and panic level stay flat.
Common Mistakes
- Using readFile for unknown or unbounded file sizes.
- Parsing huge JSON arrays fully in memory.
- Ignoring backpressure and stream error propagation.
- Mixing sync file APIs in throughput-heavy code paths.
- No monitoring for heap growth during data jobs.
Quick Recap
- readFile is memory-heavy by design.
- Streams process data incrementally and safely.
- pipeline is the robust pattern for multi-step stream flows.
- NDJSON + line parsing is practical for massive imports.
- Treat large-file handling as a production reliability concern.
Next Steps
- Add memory usage logging (process.memoryUsage) in long jobs.
- Convert large JSON dumps to NDJSON for incremental processing.
- Add retry/error strategy around pipeline failures.
- Add load tests with realistic multi-GB files before release.
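For the memory-logging step, a minimal sketch of a helper built on process.memoryUsage (the function name and 5-second default interval are illustrative choices):

```javascript
// Log heap usage periodically during a long-running job, so heap
// growth is visible long before it becomes an OOM crash.
function startMemoryLogger(intervalMs = 5000) {
  const toMb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  const timer = setInterval(() => {
    const { heapUsed, rss } = process.memoryUsage();
    console.log(`heapUsed=${toMb(heapUsed)}MB rss=${toMb(rss)}MB`);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return () => clearInterval(timer); // call to stop logging
}
```

Start it at the beginning of a data job and watch the numbers: with stream-based processing, heapUsed should stay roughly flat even as the input grows.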