Chunking Strategies for Large JSON Processing
Need a practical JSON chunking strategy for large files? The safe answer is: chunk by complete JSON values, not by arbitrary byte counts. If you have one huge JSON array or object, use a streaming parser. If you control the format, prefer NDJSON so each line can be parsed independently. What you should not do is read 1 MB at a time and call JSON.parse() on every raw chunk.
Large JSON usually fails for memory reasons long before it fails for disk size. JSON.parse() needs the entire JSON text as one complete string, and the resulting in-memory object can be much larger than the source file. That is why multi-GB exports, logs, and API responses need an incremental strategy.
Quick Decision Guide
- One huge array or object: Use a streaming parser that emits tokens or complete items from the structure without materializing the whole document.
- You control the format: Emit NDJSON (JSON Lines) so you can read one line, parse one record, and move on.
- Streaming over HTTP: Read the response body as a stream, decode incrementally, and only parse complete records that you have fully assembled.
What Counts as a Safe JSON Chunk?
This is the part many “large data JSON chunking” guides skip. A chunk is only safe to parse when it is a complete JSON value.
- One full line in an NDJSON file is a safe chunk because that line is already a complete JSON object or value.
- One object emitted by a streaming parser from a root array is a safe chunk because the parser already tracked braces, strings, escapes, and nesting for you.
- A random byte slice like “the next 1 MB of text” is not a safe chunk because it may end in the middle of a string, escape sequence, or nested object.
If you searched for a JSON chunking method, that is the core rule: split on record boundaries, not storage boundaries.
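A tiny sketch makes the rule concrete. The data here is hypothetical; the point is that the same text parses as a complete record but fails as a raw slice:

```typescript
// A complete NDJSON line is a safe chunk: it parses on its own.
const safeLine = '{"id":1,"note":"contains, a comma"}';
const record = JSON.parse(safeLine);

// A raw byte slice is not: cutting the same text mid-string is a syntax error.
const unsafeSlice = safeLine.slice(0, 20); // ends inside the "note" string
let sliceFailed = false;
try {
  JSON.parse(unsafeSlice);
} catch {
  sliceFailed = true; // "Unexpected end of JSON input" or similar
}
```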
Why Standard JSON.parse() Breaks Down
JSON.parse() is all-or-nothing. It expects one complete JSON document and only returns after it has built the result in memory.
Problematic Pattern
const raw = await readEntireFile("huge-export.json");
const data = JSON.parse(raw); // Requires the whole document in memory

for (const item of data) {
  await processItem(item);
}

This is fine for small or moderate payloads. It is a bad fit for giant arrays, long-running exports, and vendor dumps that exceed comfortable memory limits.
1. Use a Streaming Parser for Monolithic JSON
If the input is a single large array or object and you cannot change that format, a streaming parser is the right strategy. It reads incrementally and emits tokens or matched values as soon as they are complete.
In practice, this is the right choice for files shaped like [{...}, {...}, ...] where each array item can be processed independently. Good streaming parsers can also ignore branches you do not care about, which is valuable when only one nested path matters.
Conceptual Streaming Example
This is intentionally library-agnostic. Parser APIs differ, but the processing pattern stays the same.
import { createReadStream } from "node:fs";
// import { createArrayItemStream } from "your-streaming-parser";

async function processHugeArray(filePath: string) {
  const input = createReadStream(filePath);
  const items = createArrayItemStream(input); // Emits one root-array item at a time

  let processed = 0;
  for await (const item of items) {
    await processItem(item);
    processed++;
  }
  console.log(`Processed ${processed} items`);
}

The important part is not the exact library. It is that the parser, not your code, keeps track of nesting, quotes, escapes, and chunk boundaries.
2. Prefer NDJSON When You Can Control the Format
If you own the producer, NDJSON is usually the simplest and most reliable chunking strategy. Each line is a complete JSON value, so line-by-line processing becomes trivial.
Example NDJSON
{"id":1,"name":"Alice"}
{"id":2,"name":"Bob"}
{"id":3,"name":"Charlie"}

This is why NDJSON is common for logs, exports, queue payloads, and large ingestion pipelines: one bad line can be isolated without invalidating an entire multi-GB document.
Node.js Example: Read and Parse One Line at a Time
This matches the current line-by-line stream pattern documented by Node.js.
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

async function processNdjsonFile(filePath: string) {
  const rl = createInterface({
    input: createReadStream(filePath),
    crlfDelay: Infinity,
  });

  let itemCount = 0;
  for await (const line of rl) {
    if (!line.trim()) continue;
    const item = JSON.parse(line);
    await processItem(item);
    itemCount++;
  }
  console.log(`Finished ${itemCount} items`);
}

Here JSON.parse() is still useful. The trick is that you only call it on one complete record at a time, not on the entire dataset.
3. Stream HTTP Responses Instead of Buffering Them
For browser-based processing, current MDN guidance is to work directly with response.body when you need incremental handling. If you call response.json() or response.text(), you give up the chance to process the data as it arrives.
Browser Example: Assemble and Parse NDJSON Records
async function processNdjsonResponse(url: string) {
  const response = await fetch(url);
  if (!response.body) throw new Error("ReadableStream not available");

  const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader();

  let buffer = "";
  let itemCount = 0;

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;

    let newlineIndex = buffer.indexOf("\n");
    while (newlineIndex !== -1) {
      const line = buffer.slice(0, newlineIndex).trim();
      buffer = buffer.slice(newlineIndex + 1);
      if (line) {
        const item = JSON.parse(line);
        await processItem(item);
        itemCount++;
      }
      newlineIndex = buffer.indexOf("\n");
    }
  }

  if (buffer.trim()) {
    const item = JSON.parse(buffer);
    await processItem(item);
    itemCount++;
  }
  return itemCount;
}

Notice that the code buffers text until it has a full line. It never tries to parse the raw stream chunk itself, because stream chunks are transport boundaries, not JSON record boundaries.
One more operational detail: once you start reading response.body, that body is considered consumed. Do not plan to read it once as a stream and then call response.json() afterward.
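If you genuinely need both a streamed read and a convenience method over the same payload, one option is to clone the response before consuming it; each copy then has an independent body. A minimal sketch, using a synthetic Response (the constructor is available in browsers and in Node 18+):

```typescript
// Clone before reading: each copy has its own independent body.
const response = new Response('{"ok":true}');
const copy = response.clone();

const rawText = await response.text(); // consumes the original body
const parsed = await copy.json();      // the clone is still unread
```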
4. Why Manual Byte Chunking Is Usually the Wrong Method
Manual “read 1 MB, split, repeat” chunking sounds easy, but it breaks on real-world JSON almost immediately.
- Commas are not reliable separators because they can appear inside strings.
- Nested arrays and objects mean you must track parser state, not just delimiters.
- UTF-8 characters can span multiple bytes, so raw byte splitting can corrupt text unless decoding is done as a stream.
Unless you are building a parser yourself, manual byte chunking is just re-implementing a parser badly. Use a real streaming parser or switch the format to NDJSON.
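The comma problem is easy to reproduce. A minimal sketch with hypothetical data:

```typescript
const json = '[{"msg":"a, b"},{"msg":"c"}]';

// Naive approach: strip the brackets and split on commas.
// The comma inside "a, b" produces three fragments, two of them invalid JSON.
const naiveParts = json.slice(1, -1).split(",");

let naiveFailures = 0;
for (const part of naiveParts) {
  try { JSON.parse(part); } catch { naiveFailures++; }
}

// Correct approach: parse the whole document (or stream it) so the
// parser tracks string state instead of guessing from delimiters.
const items = JSON.parse(json);
```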
Choosing the Right Strategy
- You receive a giant vendor dump as one JSON array: Use a streaming parser and process each item as it is emitted.
- You own the export format: Prefer NDJSON so downstream consumers can read, parse, retry, and recover record by record.
- You still run out of memory while streaming: Check whether you are accumulating processed results in arrays, caching too much, or using large concurrency limits that defeat the benefit of chunking.
- You are consuming compressed network payloads: Decompress as a stream first, then parse the decompressed text incrementally.
- You need selective extraction from deeply nested JSON: Pick a streaming parser that can filter by path instead of materializing the entire document.
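The compressed-payload case can be sketched in Node with node:zlib, which decompresses as a stream. The payload here is a hypothetical in-memory gzip of two NDJSON records; in practice the input would come from a file or network stream:

```typescript
import { gzipSync, createGunzip } from "node:zlib";
import { Readable } from "node:stream";
import { createInterface } from "node:readline";

// Hypothetical compressed NDJSON payload, built in memory for this sketch.
const compressed = gzipSync('{"id":1}\n{"id":2}\n');

// Decompress as a stream, then parse one complete line at a time.
async function readCompressedNdjson(gzipped: Buffer): Promise<number[]> {
  const lines = createInterface({
    input: Readable.from([gzipped]).pipe(createGunzip()),
    crlfDelay: Infinity,
  });
  const ids: number[] = [];
  for await (const line of lines) {
    if (line.trim()) ids.push(JSON.parse(line).id);
  }
  return ids;
}
```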
Troubleshooting Large JSON Chunking
- "Unexpected end of JSON input": You are parsing an incomplete chunk. Buffer until you have a full record, or let a streaming parser manage boundaries.
- Memory is still spiking: Streaming input is not enough if you also keep every parsed item in memory after processing it.
- Text becomes garbled: Use streaming decode tools such as TextDecoderStream so multibyte characters are reconstructed correctly across chunk boundaries.
- Bad record handling matters: NDJSON lets you quarantine a single broken line. One syntax error inside a monolithic JSON document usually invalidates the whole file.
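The quarantine idea from the last point can be sketched as a lenient NDJSON reader (hypothetical helper and records):

```typescript
// Parse NDJSON lines, isolating broken lines instead of failing the batch.
function parseNdjsonLenient(text: string) {
  const good: unknown[] = [];
  const quarantined: string[] = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      good.push(JSON.parse(line));
    } catch {
      quarantined.push(line); // keep for inspection or reprocessing
    }
  }
  return { good, quarantined };
}

// The middle record is intentionally malformed; only it is quarantined.
const { good, quarantined } = parseNdjsonLenient(
  '{"id":1}\n{"id":2,broken\n{"id":3}\n'
);
```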
Conclusion
The best JSON chunking strategy depends on the shape of the data, but the rule stays the same: process complete records, not arbitrary slices. Use a streaming parser for large monolithic JSON, use NDJSON when you can control the format, and use browser or Node streams to avoid buffering entire payloads. That approach is simpler, safer, and much more scalable than trying to split raw JSON by hand.
For current runtime guidance, see the MDN Fetch streaming documentation and the Node.js readline documentation.