Need help with your JSON?

Try our JSON Formatter tool to automatically identify and fix syntax errors in your JSON.

Testing JSON Formatters with Edge Case Documents

JSON (JavaScript Object Notation) is the de facto standard for data interchange on the web and beyond. While its structure seems simple, building or using tools that handle JSON — especially JSON formatters or pretty-printers — requires careful consideration of various edge cases. A formatter's job is to take a JSON string and output a new string with consistent indentation and spacing, often for readability. However, a robust formatter must also handle invalid or unusual JSON inputs gracefully.

This article explores common and obscure JSON edge cases and discusses strategies for testing formatters to ensure they are reliable.

What is a JSON Formatter?

A JSON formatter, or pretty-printer, is a tool that takes a JSON string and re-outputs it with indentation and line breaks to make the structure clearer and easier for humans to read. For example:

Input JSON:

{"name":"Alice","age":30,"isStudent":false,"courses":["Math","Science"]}

Formatted Output:

{
  "name": "Alice",
  "age": 30,
  "isStudent": false,
  "courses": [
    "Math",
    "Science"
  ]
}

Behind the scenes, a formatter usually works in two steps: it parses the input JSON string into an in-memory data structure (like a JavaScript object or array), then serializes that structure back into a string with the desired formatting.
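A minimal sketch of that two-step pipeline, assuming the built-in `JSON.parse` and `JSON.stringify` are acceptable as parser and serializer. The name `formatJson` is hypothetical, not a library API:

```javascript
// Hypothetical formatter: parse into an in-memory value, then serialize
// it back to text with a fixed indentation width.
function formatJson(input, indent = 2) {
  const value = JSON.parse(input); // throws SyntaxError on invalid JSON
  return JSON.stringify(value, null, indent);
}

const compact = '{"name":"Alice","age":30,"isStudent":false,"courses":["Math","Science"]}';
console.log(formatJson(compact)); // prints the formatted output shown above
```

Because every value passes through JavaScript objects, whatever the parser normalizes (number spelling, duplicate keys, key order) is normalized in the output too. Several of the edge cases below follow directly from this design choice.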

Why Test with Edge Cases?

While formatting valid, standard JSON is straightforward, real-world data often contains peculiarities. Edge cases can reveal bugs in the parser or the serializer components of the formatter, leading to:

  • Incorrectly formatted output (e.g., wrong indentation, missing commas).
  • Failure to process valid JSON.
  • Lack of proper error handling for invalid JSON (e.g., crashing instead of reporting a syntax error).
  • Performance issues with large or deeply nested data.

Comprehensive testing with edge cases is crucial for building reliable JSON tools.

Common JSON Edge Cases

Invalid Syntax

These are documents that violate the JSON specification. A good formatter should ideally detect these early and report an error, rather than producing malformed output or crashing.

  • Trailing commas:
    { "a": 1, }
    or
    [ 1, 2, ]
  • Missing commas between elements:
    { "a": 1 "b": 2 }
    or
    [ 1 2 ]
  • Unquoted keys: JSON requires keys to be strings.
    { a: 1 }
  • Invalid escape sequences in strings:
    { "key": "String with bad escape \z" }
  • Comments: JSON does not support comments of either style (`//` or `/* */`).
    { "a": 1 /* This is a comment */ }
  • Using single quotes for strings: JSON requires double quotes.
    { 'key': 'value' }
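One way to automate these checks is to run each invalid document through the formatter and confirm it is rejected with a syntax error. This sketch assumes a hypothetical `formatJson` built on the strict built-in parser and a hypothetical helper `checkRejects`; expecting `SyntaxError` matches what `JSON.parse` throws, while other parsers may use their own error types:

```javascript
// Each entry mirrors one of the invalid examples listed above.
const invalidDocuments = [
  '{ "a": 1, }',
  '[ 1, 2, ]',
  '{ "a": 1 "b": 2 }',
  '[ 1 2 ]',
  '{ a: 1 }',
  String.raw`{ "key": "String with bad escape \z" }`, // raw: keeps the backslash
  '{ "a": 1 /* This is a comment */ }',
  "{ 'key': 'value' }",
];

// Hypothetical formatter built on the strict built-in parser.
function formatJson(input) {
  return JSON.stringify(JSON.parse(input), null, 2);
}

// Returns the inputs the formatter wrongly accepted, or rejected with
// something other than a SyntaxError (e.g. a crash in the serializer).
function checkRejects(formatter, documents) {
  const problems = [];
  for (const doc of documents) {
    try {
      formatter(doc);
      problems.push({ doc, problem: 'accepted invalid JSON' });
    } catch (err) {
      if (!(err instanceof SyntaxError)) problems.push({ doc, problem: String(err) });
    }
  }
  return problems;
}

console.log(checkRejects(formatJson, invalidDocuments)); // [] when every input is rejected
```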

Valid but Challenging Structures

These are documents that are syntactically valid JSON according to RFC 8259, but might pose challenges for formatting or performance.

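  • Empty documents: an empty or whitespace-only input is not valid JSON (RFC 8259 requires exactly one value), but formatters encounter it often and should report a clear error rather than crash or print nothing.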
  • Empty object or array:
    {}
    []
  • JSON values that are not objects or arrays: A valid JSON document can be just a string, number, boolean, or null.
    "just a string"
    123.45
    true
    null
  • Deeply nested structures:
    { "a": { "b": { "c": { "d": { "e": { "f": 1 } } } } } }
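    Deep nesting can stress recursive parsing and formatting logic. A naive recursive implementation may overflow the call stack on sufficiently deep input, and how deep depends on the runtime and its stack size, so test depths well beyond what typical data contains.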
  • Very large arrays or objects: Documents with thousands or millions of elements/keys.
    [1, 2, 3, ..., 1000000]
    { "key1": "value1", "key2": "value2", ..., "keyN": "valueN" }
    This tests performance and memory usage.
  • Mix of different types: An array or object containing a variety of value types.
    [ null, 123, "string", true, {}, [] ]
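Deep and large inputs are easier to generate than to write by hand. The sketch below uses hypothetical helpers (`nestedArray`, `largeArray`, `largeObject`, `tryFormat`) and a hypothetical `formatJson` built on `JSON.parse`/`JSON.stringify`. It assumes a V8-based runtime such as Node.js, where a stack overflow surfaces as a catchable `RangeError`:

```javascript
// Hypothetical formatter built on the built-in parser and serializer.
function formatJson(input) {
  return JSON.stringify(JSON.parse(input), null, 2);
}

// Builds "[[[...1...]]]" with the given depth using string repetition,
// so the generator itself cannot overflow the stack.
function nestedArray(depth) {
  return '['.repeat(depth) + '1' + ']'.repeat(depth);
}

// Builds "[1,2,...,n]" as one compact line.
function largeArray(n) {
  return JSON.stringify(Array.from({ length: n }, (_, i) => i + 1));
}

// Builds {"key1":"value1", ..., "keyN":"valueN"}.
function largeObject(n) {
  const obj = {};
  for (let i = 1; i <= n; i++) obj[`key${i}`] = `value${i}`;
  return JSON.stringify(obj);
}

// Reports success or the error's name instead of letting one failure
// (SyntaxError, or RangeError on stack overflow in V8) stop the run.
function tryFormat(input) {
  try {
    return { ok: true, output: formatJson(input) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const result = tryFormat(nestedArray(500));
console.log(result.ok, result.output.split('\n').length); // true 1001
```

Raising the depth until `tryFormat` reports `RangeError` shows where a particular runtime's limit lies; treat that limit as something to measure, not a fixed number.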

String Edge Cases

  • Strings with escaped characters: Quotes (`"`), backslashes (`\`), control characters (`\n`, `\r`, `\t`, `\f`, `\b`), and Unicode escapes (`\uXXXX`).
    { "text": "hello\nworld\twith \"quotes\" and a backslash \\" }
  • Strings with actual Unicode characters (non-ASCII): Emojis, characters from other languages.
    { "greeting": "Привет 👋" }
  • Empty strings: A valid string value.
    { "empty": "" }
  • Very long strings: Strings that are kilobytes or megabytes in size.
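For strings, comparing raw text is misleading because a serializer may choose different escapes than the input used. This sketch, assuming a hypothetical `formatJson` built on `JSON.parse`/`JSON.stringify` and a hypothetical `stringValuesSurvive` helper, compares decoded values instead:

```javascript
const { isDeepStrictEqual } = require('node:util');

// Hypothetical formatter built on the built-in parser and serializer.
function formatJson(input) {
  return JSON.stringify(JSON.parse(input), null, 2);
}

// String.raw keeps backslashes literally, so these are the exact JSON texts.
const stringCases = [
  String.raw`{ "text": "hello\nworld\twith \"quotes\" and a backslash \\" }`,
  '{ "greeting": "Привет 👋" }',
  String.raw`{ "escaped": "\u041f\ud83d\udc4b" }`, // same characters as "П👋"
  '{ "empty": "" }',
  JSON.stringify({ long: 'x'.repeat(1000000) }), // roughly 1 MB string
];

// True when the decoded value is unchanged by formatting, even if the
// escapes differ (JSON.stringify writes non-ASCII characters literally).
function stringValuesSurvive(input) {
  return isDeepStrictEqual(JSON.parse(formatJson(input)), JSON.parse(input));
}

for (const c of stringCases) console.log(stringValuesSurvive(c)); // true for each
```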

Number Edge Cases

  • Integers: Zero, positive, negative numbers.
    [ 0, 1, -100 ]
  • Floating-point numbers: With and without decimal parts, exponential notation.
    [ 1.0, -0.5, 1e+2, 1E-3 ]
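  • Large/small numbers: RFC 8259 lets implementations choose their own limits on range and precision. Many parsers, including JavaScript's `JSON.parse`, store every number as an IEEE 754 double, which represents integers exactly only up to 2^53. A formatter built on such a parser silently rewrites larger values, so precision-critical tools should keep the original number text or use an arbitrary-precision type.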
    { "large": 9223372036854775807, "small": 1e-20 }
  • Numbers with leading zeros: Invalid syntax. `01` must be rejected, while a lone `0`, including one before a fraction as in `0.5`, is valid.
    [ 01, 0.5 ]
  • Invalid JSON numbers: `NaN`, `Infinity`, `-Infinity`. These are not valid JSON number literals.
    [ NaN, Infinity ]
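A parse-and-reserialize formatter rewrites number text, which matters when writing expected outputs. This sketch assumes a hypothetical `formatJson` built on `JSON.parse`/`JSON.stringify` and a hypothetical `numberTextPreserved` helper; it shows which of the examples above survive unchanged:

```javascript
// Hypothetical formatter built on the built-in parser and serializer.
function formatJson(input) {
  return JSON.stringify(JSON.parse(input), null, 2);
}

// True when a number's text survives parse + serialize unchanged.
// JSON.parse stores every number as an IEEE 754 double, so spelling
// (1.0, 1e+2) is normalized and integers above 2^53 lose precision.
function numberTextPreserved(numberText) {
  return JSON.stringify(JSON.parse(numberText)) === numberText;
}

for (const text of ['0', '-100', '1.0', '1e+2', '1E-3', '9223372036854775807', '1e-20']) {
  console.log(text, '->', JSON.stringify(JSON.parse(text)), numberTextPreserved(text));
}

// The serializer has its own edge case: NaN and Infinity coming from code
// (not from parsed text) are written as null instead of raising an error.
console.log(JSON.stringify([NaN, Infinity])); // [null,null]
```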

Key Edge Cases

  • Empty keys: A valid string key can be empty.
    { "": "value" }
  • Keys with special characters: Spaces, punctuation, escaped characters, Unicode.
    { "key with spaces": 1, "key/with\/slash": 2, "ключ": 3 }
  • Duplicate keys: The JSON specification says "The names within an object SHOULD be unique." Parsers/formatters might handle this differently (e.g., keep the first, keep the last, or error).
    { "a": 1, "a": 2 }
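The key cases behave differently depending on the parser. With a hypothetical `formatJson` built on JavaScript's `JSON.parse`/`JSON.stringify` (other parsers may differ), this sketch shows what actually reaches the output:

```javascript
// Hypothetical formatter built on the built-in parser and serializer.
function formatJson(input) {
  return JSON.stringify(JSON.parse(input), null, 2);
}

// Empty key: preserved as-is.
console.log(formatJson('{ "": "value" }'));
// "\/" is a legal escape, but the serializer writes a plain "/".
console.log(formatJson(String.raw`{ "key/with\/slash": 2 }`));
// Duplicate keys: JSON.parse keeps the last value, so one pair disappears.
console.log(formatJson('{ "a": 1, "a": 2 }'));
// JavaScript objects list integer-like keys first, in ascending order,
// so a parse-and-reserialize formatter can reorder keys.
console.log(formatJson('{ "b": 1, "2": 2, "1": 3 }'));
```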

Whitespace and Encoding

  • Excessive whitespace: Leading/trailing, between tokens. A formatter should typically ignore and replace this with consistent spacing.
     { "a" : 1 } 
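  • Different types of whitespace: JSON allows exactly four whitespace characters between tokens: space, horizontal tab, line feed, and carriage return. Other Unicode whitespace, such as a non-breaking space, is a syntax error.
  • Byte Order Mark (BOM): Some UTF-8 files start with a BOM (U+FEFF). RFC 8259 says implementations must not add one to transmitted JSON text but allows parsers to ignore it. JavaScript's `JSON.parse` rejects it, so a robust formatter may need to strip it before parsing.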
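A minimal sketch of BOM handling, assuming a hypothetical `stripBom` pre-processing step placed in front of a hypothetical `formatJson` built on `JSON.parse`/`JSON.stringify`. Only one leading U+FEFF is removed; everything else is left for the parser to judge:

```javascript
// Hypothetical pre-processing step: drop a single leading BOM (U+FEFF).
// JSON.parse treats the BOM as an unexpected character, not whitespace.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Hypothetical formatter that tolerates a BOM but nothing else non-standard.
function formatJson(input) {
  return JSON.stringify(JSON.parse(stripBom(input)), null, 2);
}

const withBom = '\uFEFF{ "a" : 1 }';
console.log(formatJson(withBom)); // same output as for the text without a BOM
```

Space, tab, CR, and LF around tokens are still accepted by the parser, while a non-breaking space (U+00A0) is still rejected, matching the whitespace rules above.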

Testing Strategies

Testing JSON formatters involves ensuring correctness for valid inputs and graceful failure for invalid inputs.

1. Test Suite of Edge Cases

Create a collection of JSON strings representing all the edge cases discussed above. For valid JSON inputs, define the expected formatted output string. For invalid inputs, define the expected error type or message. Automate these tests using a testing framework.

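// Example Jest tests (test and expect are Jest globals).
// formatJson stands in for the formatter under test; this version simply
// parses and re-serializes with a 2-space indent.
const formatJson = (input) => JSON.stringify(JSON.parse(input), null, 2);

test('formats empty object', () => {
  const input = '{}';
  const expected = '{}'; // JSON.stringify keeps empty containers on one line
  expect(formatJson(input)).toBe(expected);
});

test('formats deeply nested array', () => {
  const input = '[[[[1]]]]';
  const expected = '[\n  [\n    [\n      [\n        1\n      ]\n    ]\n  ]\n]';
  expect(formatJson(input)).toBe(expected);
});

test('throws error for trailing comma', () => {
  const input = '[1, 2,]';
  // JSON.parse throws a SyntaxError whose message wording varies by engine,
  // so match on the error type rather than the text.
  expect(() => formatJson(input)).toThrow(SyntaxError);
});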

2. Round-Trip Testing

For valid JSON, a common technique is to ensure that parsing the formatted output yields the original data structure.

Original String → Formatter → Formatted String → Parser → Data Structure

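The final data structure should be deeply equal to the structure obtained by parsing the original input string, which confirms that formatting did not change the data. The check is only as strict as the parser used for comparison: if both sides parse numbers as doubles, a formatter that rewrote 9223372036854775807 as 9223372036854776000 would still pass.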

3. Fuzz Testing

Generate semi-random or completely random strings, including potentially malformed JSON, and feed them to the formatter. Monitor for crashes, infinite loops, or unexpected output. Fuzzing can uncover edge cases you didn't think to include in your manual test suite.
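A small mutation-based fuzzer is one way to start. The sketch below assumes a hypothetical `formatJson` (built on `JSON.parse`/`JSON.stringify`) and hypothetical helpers `mutate` and `fuzz`; it uses the well-known mulberry32 PRNG so a failing run can be replayed from its seed. It only flags wrong error types and invalid output. Detecting infinite loops would need a separate process or worker with a timeout, which this synchronous sketch does not attempt:

```javascript
// Hypothetical formatter under test.
function formatJson(input) {
  return JSON.stringify(JSON.parse(input), null, 2);
}

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Characters that matter to JSON syntax, plus one non-ASCII letter.
const ALPHABET = '{}[]:,"\\ 0123456789.eE+-tfnul\u00e9';

// Applies one random insert, delete, or replace at a random position.
function mutate(text, rand) {
  const pos = Math.floor(rand() * (text.length + 1));
  const ch = ALPHABET[Math.floor(rand() * ALPHABET.length)];
  const op = Math.floor(rand() * 3);
  if (op === 0) return text.slice(0, pos) + ch + text.slice(pos); // insert
  if (op === 1) return text.slice(0, pos) + text.slice(pos + 1); // delete
  return text.slice(0, pos) + ch + text.slice(pos + 1); // replace
}

// Acceptable outcomes: the formatter throws SyntaxError, or it returns
// text that parses again. Anything else is recorded with its input.
function fuzz(formatter, seeds, iterations, seed = 1) {
  const rand = mulberry32(seed);
  const failures = [];
  for (let i = 0; i < iterations; i++) {
    let input = seeds[Math.floor(rand() * seeds.length)];
    const rounds = 1 + Math.floor(rand() * 4);
    for (let r = 0; r < rounds; r++) input = mutate(input, rand);
    let output;
    try {
      output = formatter(input);
    } catch (err) {
      if (!(err instanceof SyntaxError)) failures.push({ input, error: String(err) });
      continue;
    }
    try {
      JSON.parse(output); // formatted output must itself be valid JSON
    } catch {
      failures.push({ input, error: 'output is not valid JSON' });
    }
  }
  return failures;
}

const seeds = [
  '{"name":"Alice","age":30,"isStudent":false,"courses":["Math","Science"]}',
  '[ null, 123, "string", true, {}, [] ]',
  '{ "": "value" }',
];
console.log(fuzz(formatJson, seeds, 2000).length); // 0 for this formatter
```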

4. Performance Testing

Measure the time and memory usage when formatting very large or deeply nested JSON documents. Ensure the formatter scales reasonably and doesn't exhaust resources for typical large inputs.
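A rough timing harness is enough to spot scaling problems. This sketch assumes Node.js (`perf_hooks` and `process.memoryUsage`) and a hypothetical `formatJson` built on `JSON.parse`/`JSON.stringify`; `benchmark` is a hypothetical helper, and the heap figure is only a coarse signal because garbage collection timing varies:

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical formatter under test.
function formatJson(input) {
  return JSON.stringify(JSON.parse(input), null, 2);
}

// Times `runs` calls and returns the median duration in milliseconds
// (upper median when runs is even) plus the heap-usage change in bytes.
function benchmark(formatter, input, runs = 5) {
  const times = [];
  const heapBefore = process.memoryUsage().heapUsed;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    formatter(input);
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return {
    medianMs: times[Math.floor(times.length / 2)],
    heapDeltaBytes: process.memoryUsage().heapUsed - heapBefore,
  };
}

// Increase the length to match the input sizes you actually care about.
const big = JSON.stringify(Array.from({ length: 100000 }, (_, i) => i + 1));
console.log(benchmark(formatJson, big));
```

Running the same benchmark at several sizes (for example, doubling the length each time) shows whether time grows roughly linearly or worse.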

Beyond Formatting: Parser Interaction

Since formatters typically rely on an underlying JSON parser, the behavior of the formatter for invalid input is heavily dependent on the parser's robustness. Some parsers might be more lenient (e.g., accept trailing commas), while others are strict. When testing a formatter, you are implicitly testing its parser component as well. Using a standard, well-tested parser library is often the first step to a robust formatter.

Conclusion

Testing JSON formatters thoroughly with a wide range of edge case documents is essential for delivering a reliable tool. Covering invalid syntax, challenging valid structures (empty, nested, large, mixed types), string peculiarities, number representations, and key variations will help identify and fix bugs, ensuring the formatter works correctly and predictably even with the messiest real-world JSON data. A combination of dedicated edge case tests, round-trip checks, and fuzzing provides strong confidence in the formatter's robustness.
