Need help with your JSON?
Try our JSON Formatter tool to automatically identify and fix syntax errors in your JSON.
Implementing Diff Algorithms for JSON Comparison
A useful JSON diff is not just “find every value that changed.” In practice, you usually need one of four outputs: a readable review diff, an exact equality check, a machine-applicable patch, or a CI-friendly pass/fail signal. The implementation changes depending on that goal, especially once arrays enter the picture.
That is why robust JSON comparison starts with semantics first: should object key order be ignored, should arrays be matched by position or by id, and do you need a custom change list or a standards-based patch format? Once those rules are explicit, the recursive comparison itself becomes much simpler.
Choose the Output Before the Algorithm
Start by deciding what the caller or user actually needs from the diff. Different outputs favor different algorithms and tradeoffs:
- Exact equality check: Best for regression tests and CI. Canonicalize both documents, then compare the normalized output.
- Human review diff: Return a path plus before and after values so developers can inspect the change quickly.
- Machine-applicable patch: Use RFC 6902 JSON Patch if consumers need operations like add, remove, replace, move, or test.
- Object-heavy partial update: Use RFC 7396 JSON Merge Patch when you want a patch document that looks like the target JSON, with null meaning removal.
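For the human review case, one possible shape is a flat list of path, before, and after. The sketch below is illustrative only: the names ReviewChange and reviewDiff are hypothetical, arrays are compared by position, and paths use a dotted display notation rather than JSON Pointer.

```typescript
// Hypothetical review-diff shape; names are illustrative, not from a library.
type ReviewChange = { path: string; before?: unknown; after?: unknown };

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Own-property check, so inherited names like "toString" never count as keys.
function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function reviewDiff(before: unknown, after: unknown, path = "$"): ReviewChange[] {
  if (Object.is(before, after)) return [];

  if (Array.isArray(before) && Array.isArray(after)) {
    const changes: ReviewChange[] = [];
    const max = Math.max(before.length, after.length);
    for (let i = 0; i < max; i += 1) {
      const p = `${path}[${i}]`;
      if (i >= before.length) changes.push({ path: p, after: after[i] });
      else if (i >= after.length) changes.push({ path: p, before: before[i] });
      else changes.push(...reviewDiff(before[i], after[i], p));
    }
    return changes;
  }

  if (isPlainObject(before) && isPlainObject(after)) {
    const changes: ReviewChange[] = [];
    const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
    for (const key of keys) {
      const p = `${path}.${key}`;
      if (!has(after, key)) changes.push({ path: p, before: before[key] });
      else if (!has(before, key)) changes.push({ path: p, after: after[key] });
      else changes.push(...reviewDiff(before[key], after[key], p));
    }
    return changes;
  }

  // Differing primitives or a type change: report the whole value.
  return [{ path, before, after }];
}

console.log(
  reviewDiff(
    { name: "api", port: 8080, tags: ["a"] },
    { name: "api", port: 9090, tags: ["a", "b"] },
  ),
);
// [ { path: "$.port", before: 8080, after: 9090 }, { path: "$.tags[1]", after: "b" } ]
```

A missing before or after field marks an addition or a removal, which keeps the list easy to render in a review UI.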
A common mistake is trying to force one diff format to solve every problem. JSON Patch is precise and good for arrays, while Merge Patch is simple and excellent for object-shaped API payloads. They are not interchangeable.
Core Rules for Each JSON Type
At a high level, a JSON diff walks both values recursively and emits changes when their structure or content diverges:
- Primitives: Compare strings, numbers, booleans, and null directly. If the values differ, emit a replace-style change.
- Objects: Compare by key set, not by source order. Keys only present on one side are adds or removes; shared keys recurse.
- Arrays: This is where most implementations fail. Index-by-index comparison is only correct for truly ordered lists. If elements have stable identifiers, match them by key first; if order does not matter, compare them as sets after normalization.
- Type changes: If a value changes type, such as object to string or array to number, treat it as a replacement of the whole subtree.
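Detecting a type change is easier with a small classifier for parsed JSON values. The helper names below (jsonKind, isTypeChange) are hypothetical and only sketch the idea; note that typeof alone reports "object" for arrays and for null.

```typescript
// Hypothetical helper: classify a parsed JSON value so type changes are easy to spot.
type JsonKind = "null" | "boolean" | "number" | "string" | "array" | "object";

function jsonKind(value: unknown): JsonKind {
  if (value === null) return "null"; // typeof null is "object"
  if (Array.isArray(value)) return "array"; // typeof [] is "object"
  const t = typeof value;
  if (t === "boolean" || t === "number" || t === "string" || t === "object") {
    return t;
  }
  throw new TypeError(`Not a JSON value: ${t}`);
}

// A type change means the whole subtree is replaced rather than diffed further.
function isTypeChange(before: unknown, after: unknown): boolean {
  return jsonKind(before) !== jsonKind(after);
}

console.log(isTypeChange({ id: 1 }, "1")); // true
console.log(isTypeChange([1], [2])); // false
```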
Object Key Order vs. Canonicalization
Object member order is not a reliable semantic signal in JSON. Diff objects by key presence and value, not by the order properties appeared in the original text. If you need deterministic text output for hashing, signing, or baseline files, canonicalize first. RFC 8785 JSON Canonicalization Scheme defines a deterministic representation for exactly that use case.
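As a minimal sketch of key-order normalization, the function below sorts object keys recursively before serializing. It is not an RFC 8785 implementation: it relies on JSON.stringify for number and string formatting, and integer-like keys still follow JavaScript's own property ordering. The names sortKeysDeep and canonicalJson are hypothetical.

```typescript
// Sketch only: deterministic output for equality checks, not full RFC 8785.
function sortKeysDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Array order is significant, so only the elements are normalized.
    return value.map((item) => sortKeysDeep(item));
  }
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    // A null-prototype object keeps a "__proto__" key as ordinary data.
    const sorted: Record<string, unknown> = Object.create(null);
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeysDeep(source[key]);
    }
    return sorted;
  }
  return value;
}

function canonicalJson(value: unknown): string {
  return JSON.stringify(sortKeysDeep(value));
}

const left = JSON.parse('{"b": 1, "a": {"d": 2, "c": 3}}');
const right = JSON.parse('{"a": {"c": 3, "d": 2}, "b": 1}');

console.log(canonicalJson(left)); // {"a":{"c":3,"d":2},"b":1}
console.log(canonicalJson(left) === canonicalJson(right)); // true
```

Because both documents go through the same normalization, the string comparison is enough for an equality check even though the output is not suitable for signing.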
Implement for the Data Type, Not Just the JSON Syntax
A search query like “implementing diff for a data type” points to the real problem: JSON gives you syntax, but your domain decides the correct matching rules. The same JSON array can represent very different logical data types.
- Ordered sequences: Steps in a workflow, log entries, or playlist items should usually be diffed by position.
- Entity collections: Arrays of records with stable keys like id or slug should usually be indexed by that key before diffing.
- Set-like values: Tags, feature flags, or permissions should often be normalized and compared as unordered values.
- Moves: Only emit explicit move operations if the consumer understands them. Otherwise, delete-plus-add is simpler and often safer.
There is no universally correct array diff. A robust implementation chooses the strategy from the business meaning of the data, not from the fact that the container happens to be JSON.
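For the set-like case, a comparison can ignore order and duplicates entirely. The following is a small sketch that assumes the elements are strings; diffAsSet is a hypothetical name.

```typescript
// Sketch for set-like arrays of strings (tags, permissions, feature flags).
// Order and duplicates are ignored; results are sorted for stable output.
function diffAsSet(
  before: string[],
  after: string[],
): { added: string[]; removed: string[] } {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    added: Array.from(afterSet).filter((item) => !beforeSet.has(item)).sort(),
    removed: Array.from(beforeSet).filter((item) => !afterSet.has(item)).sort(),
  };
}

console.log(diffAsSet(["read", "write", "admin"], ["write", "read", "audit"]));
// { added: ["audit"], removed: ["admin"] }
```

A positional diff of the same two arrays would report three replacements, even though only one permission was swapped.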
A Practical TypeScript Diff Skeleton
The following example is intentionally small, but it fixes a common mistake in simplified tutorials: arrays must be detected with Array.isArray(), not typeof value === "array". This version emits JSON Patch-like operations and keeps object and array handling separate.
```typescript
type DiffOp = {
  op: "add" | "remove" | "replace";
  path: string;
  value?: unknown;
};

function diffJson(before: unknown, after: unknown, path = ""): DiffOp[] {
  // Identical primitives (or the same reference) produce no operations.
  if (Object.is(before, after)) {
    return [];
  }

  if (Array.isArray(before) && Array.isArray(after)) {
    return diffArray(before, after, path);
  }

  if (isJsonObject(before) && isJsonObject(after)) {
    const ops: DiffOp[] = [];
    const keys = new Set([...Object.keys(before), ...Object.keys(after)]);

    for (const key of keys) {
      const nextPath = `${path}/${escapePointerToken(key)}`;

      // Own-property checks avoid false hits on inherited names such as "toString".
      if (!hasOwn(after, key)) {
        ops.push({ op: "remove", path: nextPath });
        continue;
      }
      if (!hasOwn(before, key)) {
        ops.push({ op: "add", path: nextPath, value: after[key] });
        continue;
      }
      ops.push(...diffJson(before[key], after[key], nextPath));
    }
    return ops;
  }

  // Differing primitives or a type change: replace the whole subtree.
  return [{ op: "replace", path, value: after }];
}

function diffArray(before: unknown[], after: unknown[], path: string): DiffOp[] {
  const ops: DiffOp[] = [];
  const shared = Math.min(before.length, after.length);

  // Positions present on both sides recurse.
  for (let index = 0; index < shared; index += 1) {
    ops.push(...diffJson(before[index], after[index], `${path}/${index}`));
  }
  // Extra trailing items in `after` are appended in ascending order.
  for (let index = shared; index < after.length; index += 1) {
    ops.push({ op: "add", path: `${path}/${index}`, value: after[index] });
  }
  // Extra trailing items in `before` are removed from the end first, so an
  // earlier removal never shifts the index a later operation refers to.
  for (let index = before.length - 1; index >= shared; index -= 1) {
    ops.push({ op: "remove", path: `${path}/${index}` });
  }
  return ops;
}

function isJsonObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// JSON Pointer escaping (RFC 6901): "~" must be escaped before "/".
function escapePointerToken(token: string): string {
  return token.replaceAll("~", "~0").replaceAll("/", "~1");
}
```

This uses a positional array strategy. For arrays of records, replace diffArray() with keyed matching by a stable field like id. If you need a minimal edit script, add an LCS or Myers step instead of treating each index independently.
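Keyed matching for arrays of records could look roughly like the sketch below. It assumes every record carries a unique string or number id, and it reports a review-style change list rather than JSON Patch operations, because patch paths would still need array indexes. The Entity and KeyedChange types and the diffById name are hypothetical.

```typescript
// Sketch of keyed matching. Assumes each record has a unique `id`.
type Entity = { id: string | number; [field: string]: unknown };
type KeyedChange =
  | { kind: "added"; id: Entity["id"]; after: Entity }
  | { kind: "removed"; id: Entity["id"]; before: Entity }
  | { kind: "changed"; id: Entity["id"]; before: Entity; after: Entity };

// Key-order-insensitive serialization, used only to test records for equality.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${members.join(",")}}`;
  }
  return JSON.stringify(value);
}

function diffById(before: Entity[], after: Entity[]): KeyedChange[] {
  // Note: Map keys are type-sensitive, so id 1 and id "1" are different records.
  const beforeById = new Map<Entity["id"], Entity>();
  for (const item of before) beforeById.set(item.id, item);
  const afterIds = new Set(after.map((item) => item.id));

  const changes: KeyedChange[] = [];
  for (const item of after) {
    const previous = beforeById.get(item.id);
    if (previous === undefined) {
      changes.push({ kind: "added", id: item.id, after: item });
    } else if (stableStringify(previous) !== stableStringify(item)) {
      changes.push({ kind: "changed", id: item.id, before: previous, after: item });
    }
  }
  for (const item of before) {
    if (!afterIds.has(item.id)) {
      changes.push({ kind: "removed", id: item.id, before: item });
    }
  }
  return changes;
}

const beforeUsers: Entity[] = [{ id: 1, name: "a" }, { id: 2, name: "b" }];
const afterUsers: Entity[] = [{ id: 2, name: "B" }, { id: 3, name: "c" }];

console.log(diffById(beforeUsers, afterUsers).map((c) => `${c.kind}:${c.id}`));
// [ "changed:2", "added:3", "removed:1" ]
```

Reordering the same records produces no changes at all, which is usually what reviewers expect for entity collections.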
JSON Patch vs. Merge Patch
Standards matter if your diff output leaves your process and gets applied elsewhere. The two most common choices solve different problems:
- JSON Patch (RFC 6902): Expresses a sequence of explicit operations. It can target specific array positions and supports precondition checks with test. Use it when you need precise, replayable edits.
- JSON Merge Patch (RFC 7396): Describes the desired shape by example. It is easy to read and ideal for object-centric updates, but arrays are replaced wholesale and null means deletion.
A good rule of thumb is simple: if your consumer cares about individual array edits, choose JSON Patch. If your payload is mostly nested objects and you want concise PATCH requests, Merge Patch is often easier.
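To make the Merge Patch behavior concrete, here is a sketch that follows the MergePatch pseudocode in RFC 7396, except that it returns a new value instead of mutating the target. The Json type and the applyMergePatch name are assumptions for illustration, and the sketch does not guard against a member literally named __proto__.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Sketch of RFC 7396 merge semantics (non-mutating variant).
function applyMergePatch(target: Json | undefined, patch: Json): Json {
  // A non-object patch (primitive, null, or array) replaces the target wholesale.
  if (!isObject(patch)) return patch;

  const result: { [key: string]: Json } = isObject(target) ? { ...target } : {};
  for (const [name, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[name]; // null means "remove this member"
    } else {
      result[name] = applyMergePatch(result[name], value);
    }
  }
  return result;
}

const doc: Json = { a: "b", c: { d: "e", f: "g" }, tags: ["x", "y"] };
const patch: Json = { a: "z", c: { f: null }, tags: ["y"] };

console.log(JSON.stringify(applyMergePatch(doc, patch)));
// {"a":"z","c":{"d":"e"},"tags":["y"]}
```

The tags member shows the limitation mentioned above: the patch cannot express "remove x", only "the array is now exactly this".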
JSON Diff in a CI Runner
For CI, a full custom diff engine is often unnecessary. Normalize object key order first, then diff the normalized files. jq documents -S (--sort-keys) for sorting object keys and -e (--exit-status) for exit-status-based checks, and the jq project distributes standalone binaries, which makes it a practical fit for ephemeral CI runners.

```shell
jq -S . before.json > before.normalized.json
jq -S . after.json > after.normalized.json

if diff -u before.normalized.json after.normalized.json; then
  echo "No semantic JSON changes"
else
  echo "JSON changed"
  exit 1
fi
```
If your arrays are logically unordered, normalize them too before diffing, for example by sorting objects with a stable key such as id.
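If the normalization step runs in a script rather than in jq, the same idea might be sketched as follows. The function sorts every array whose elements all carry a string id. Treating all such arrays as unordered is an assumption made for brevity; real data usually calls for an explicit list of paths. The names idOf and sortArraysById are hypothetical.

```typescript
// Returns the string `id` of a record, or undefined if it has none.
function idOf(item: unknown): string | undefined {
  if (typeof item === "object" && item !== null && !Array.isArray(item)) {
    const id = (item as Record<string, unknown>).id;
    if (typeof id === "string") return id;
  }
  return undefined;
}

// Sketch: recursively sort arrays of records by `id`; leave other arrays alone.
function sortArraysById(value: unknown): unknown {
  if (Array.isArray(value)) {
    const items: unknown[] = value.map((child) => sortArraysById(child));
    const ids = items.map((item) => idOf(item));
    // Arrays with any element lacking a string id keep their original order.
    if (items.length === 0 || ids.some((id) => id === undefined)) return items;
    return items
      .map((item, index) => ({ item, id: ids[index] as string }))
      .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
      .map((entry) => entry.item);
  }
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = sortArraysById(child);
    }
    return out;
  }
  return value;
}

const input = { users: [{ id: "b" }, { id: "a" }], steps: [3, 1, 2] };
console.log(JSON.stringify(sortArraysById(input)));
// {"users":[{"id":"a"},{"id":"b"}],"steps":[3,1,2]}
```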
Common Failure Modes
- Confusing missing and null: A missing property and a property explicitly set to null are different states in many APIs.
- Generating invalid paths: If you emit JSON Patch, path segments must be escaped correctly before you join them.
- Index-based array diff everywhere: It creates noisy and misleading output for keyed or reorderable collections.
- Loading huge documents into memory: For very large JSON, use a streaming parser or a --stream-style path/value pipeline instead of diffing the entire parsed tree at once.
- Assuming canonical text equals business equality: Stable formatting is useful for CI and hashing, but it does not replace domain-specific rules for timestamps, IDs, or unordered collections.
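The first failure mode is easy to reproduce in JavaScript, where reading a missing member yields undefined and loose equality treats it like null. The helper below is a hypothetical sketch of keeping the two states apart.

```typescript
// Hypothetical helper: distinguish a missing member from an explicit null.
type MemberState = "missing" | "null" | "present";

function memberState(obj: Record<string, unknown>, key: string): MemberState {
  if (!Object.prototype.hasOwnProperty.call(obj, key)) return "missing";
  return obj[key] === null ? "null" : "present";
}

const withNull = JSON.parse('{"nickname": null}') as Record<string, unknown>;
const withoutMember = JSON.parse("{}") as Record<string, unknown>;

console.log(memberState(withNull, "nickname")); // null
console.log(memberState(withoutMember, "nickname")); // missing

// Loose equality hides the difference between the two documents.
console.log(withNull.nickname == withoutMember.nickname); // true
```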
Conclusion
Implementing JSON diffing well is less about one clever recursive function and more about choosing the right semantics for the data you have. Objects should usually ignore key order, arrays need an explicit matching strategy, and the output format should match the consumer.
If you only need CI verification, canonicalize and compare. If you need interoperable patches, target JSON Patch or Merge Patch deliberately. And if your domain data has real identity rules, encode those rules in the diff instead of pretending every JSON array is just a positional list.