Need help with your JSON?

Try our JSON Formatter tool to automatically identify and fix syntax errors in your JSON.

Resolving Unicode Character Issues in JSON Documents

If your JSON contains a value like \u0001, the first question is whether you are looking at a valid JSON escape or an invalid raw control byte in the file. That distinction explains most real-world Unicode problems in JSON, including parser failures, mojibake, and confusing output after decoding.

Under RFC 8259, the current JSON specification, Unicode text is fully allowed, but control characters U+0000 through U+001F must be escaped inside strings, and JSON exchanged between systems outside a closed ecosystem must be encoded as UTF-8. That means a page or tool that shows \u0001 is often working correctly, while the broken input is usually a file that contains the raw byte 0x01, a BOM, or malformed escape data.

Quick answer for \u0001 in JSON

  • \u0001 means code point U+0001, a non-printable control character named Start of Heading.
  • An escaped sequence like "A\u0001B" is valid JSON, and after parsing the string really contains that control character.
  • A raw control byte in the file is not valid JSON. That is what usually triggers parse errors such as bad control character or invalid character.
  • If you want the literal six-character text \u0001 after parsing, write "A\\u0001B" in the JSON source.
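The four bullets above can be checked directly in JavaScript. This sketch shows how each form behaves in JSON.parse (note the extra layer of backslash escaping needed inside the JS string literals):

```javascript
// Escaped form: valid JSON, and the parsed string really holds U+0001
const escaped = JSON.parse('{"marker":"A\\u0001B"}');
console.log(escaped.marker.charCodeAt(1)); // 1, i.e. U+0001

// Raw control byte: invalid JSON, so parsing throws a SyntaxError
let rawRejected = false;
try {
  JSON.parse('{"marker":"A\u0001B"}'); // the JS literal embeds a real 0x01
} catch (e) {
  rawRejected = e instanceof SyntaxError;
}
console.log(rawRejected); // true

// Escaped backslash: the parsed string is the six-character text \u0001
const literal = JSON.parse('{"marker":"A\\\\u0001B"}');
console.log(literal.marker); // A\u0001B
```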

What Usually Goes Wrong

JSON Unicode issues are rarely about normal letters such as é, ß, or 世界. They usually come from one of these failures:

  • A producer writes binary or device data into a JSON string without escaping control bytes.
  • A file is saved as something other than UTF-8 and then decoded as UTF-8 later.
  • A UTF-8 BOM appears at the start of the file and a parser does not ignore it.
  • A Unicode escape is malformed, such as \u3A or \uGHIJ.
  • A non-BMP character is represented with a broken surrogate pair such as \uD83D.

Valid, invalid, and literal-text examples

// Valid JSON: the parsed string contains U+0001
{"marker":"A\u0001B"}

// Invalid JSON: the file contains a literal 0x01 byte between A and B
{"marker":"A<0x01>B"}

// Valid JSON: the parsed string contains the visible text \u0001
{"marker":"A\\u0001B"}

// Invalid Unicode escape: exactly four hex digits are required
{"marker":"\u3A"}

// Broken surrogate pair: \uD83D is a lone high surrogate with no low surrogate
{"emoji":"\uD83D"}
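For contrast with the broken pair above, a complete surrogate pair is valid and decodes to a single code point. This sketch assumes Node or a browser:

```javascript
// U+1F600 (😀) is outside the BMP, so JSON escapes it as a surrogate pair
const parsed = JSON.parse('{"emoji":"\\uD83D\\uDE00"}');
console.log(parsed.emoji);        // 😀
console.log(parsed.emoji.length); // 2 UTF-16 code units, one code point
```

Behavior for a lone surrogate varies by parser: some reject it outright, while JavaScript's JSON.parse returns a string containing the unpaired surrogate, which then causes trouble when re-encoded as UTF-8.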

Current JSON Rules That Matter

  • Control characters U+0000 through U+001F must be escaped in strings. A literal byte like 0x01 inside a quoted JSON string makes the document invalid.
  • JSON exchanged between systems must be encoded as UTF-8. If a file was written as Latin-1 or Windows-1252, non-ASCII text may decode incorrectly or fail to parse.
  • Generators must not add a BOM, though parsers may ignore one. A BOM at byte zero still breaks some toolchains, especially when the JSON is embedded or manually parsed.
  • Unicode text can appear directly in JSON strings. You do not need to escape every non-ASCII character just to make JSON valid.
  • A Unicode escape must be \u followed by exactly four hex digits. Short, truncated, or non-hex escapes are invalid JSON syntax.
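Several of these rules can be observed directly with JSON.parse. This is a quick sketch, not an exhaustive conformance test:

```javascript
// Four hex digits are required after \u
let shortEscapeThrows = false;
try { JSON.parse('{"a":"\\u3A"}'); } catch { shortEscapeThrows = true; }

// Unescaped non-ASCII text is fine as-is
const direct = JSON.parse('{"city":"São Paulo"}');

// JSON.parse itself does not skip a BOM
let bomThrows = false;
try { JSON.parse('\uFEFF{}'); } catch { bomThrows = true; }

console.log(shortEscapeThrows, direct.city, bomThrows); // true São Paulo true
```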

How to Diagnose the File You Actually Have

A formatter or validator tells you whether the JSON is structurally valid. If it parses and still shows \u0001, that is usually an escaped control character that survived decoding correctly. If it fails before parsing, inspect the raw source.

Diagnostic checklist

  • Look at the first bytes of the file to see whether a BOM is present.
  • Check whether the bad character is written as text or stored as a raw byte. The visible sequence \u0001 is not the same thing as byte 0x01.
  • Watch for mojibake such as CafÃ© (Café decoded with the wrong charset), which usually signals an encoding mismatch rather than a JSON syntax problem.
  • Check the escape itself. \u003A is valid. \u3A is not.
  • If parsing fails only in one environment, compare parser behavior. Some tools quietly ignore a BOM, while others do not.
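The first items on the checklist can be automated. This sketch (the helper name is illustrative, not a standard API) reports a BOM, raw control bytes, and escaped control characters without modifying anything:

```javascript
// Illustrative helper: report problems, do not repair them
function diagnoseJsonText(text) {
  return {
    // a UTF-8 BOM decodes to U+FEFF at position 0
    hasBom: text.charCodeAt(0) === 0xfeff,
    // raw control characters that make the JSON invalid inside strings
    rawControls: [...text.matchAll(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g)]
      .map(m => "0x" + m[0].charCodeAt(0).toString(16).padStart(2, "0")),
    // the six-character text \u0001 etc., which is valid escaped data
    escapedControls: (text.match(/\\u00[01][0-9a-fA-F]/g) ?? []).length,
  };
}

console.log(diagnoseJsonText('\uFEFF{"m":"A\u0001B"}'));
// { hasBom: true, rawControls: [ '0x01' ], escapedControls: 0 }
```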

How to Fix It Safely

1. Keep the control character if it is real data

If the downstream system truly needs U+0001, keep it escaped in JSON and let a serializer generate the output instead of hand-writing the string.

JavaScript

const marker = "A" + String.fromCharCode(1) + "B";
const json = JSON.stringify({ marker });

console.log(json);
// {"marker":"A\u0001B"}

const parsed = JSON.parse(json);
console.log(parsed.marker.charCodeAt(1));
// 1

2. Keep the visible text \u0001 instead of the control character

If you want the literal characters backslash-u-zero-zero-zero-one to appear after parsing, escape the backslash itself.

const parsed = JSON.parse('{"marker":"A\\u0001B"}');

console.log(parsed.marker);
// A\u0001B

3. Clean a dirty file before parsing

When a producer emits invalid JSON with raw control bytes, your safest fix is upstream. If you must salvage the file, treat cleanup as a repair step that may change the data.

Best-effort cleanup

function stripBomAndInvalidControls(source) {
  return source
    .replace(/^\uFEFF/, "")
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "");
}

const cleaned = stripBomAndInvalidControls(rawText);
const data = JSON.parse(cleaned);

This strips non-whitespace C0 controls globally. It is useful for emergency recovery, not as a substitute for fixing the generator that created the bad JSON.

4. Fix encoding at the file boundary

A valid JSON document can still look broken if the bytes were decoded with the wrong charset. Read and write JSON as UTF-8 unless you control a closed system with stricter legacy rules.

File handling example

import fs from "node:fs";

const text = fs.readFileSync("data.json", "utf8").replace(/^\uFEFF/, "");
const data = JSON.parse(text);

fs.writeFileSync("clean.json", JSON.stringify(data, null, 2), "utf8");

JavaScript and Python Notes

  • JavaScript: JSON.parse() rejects malformed escapes and raw control characters in invalid input. MDN documents error families such as "bad control character" and "bad Unicode escape".
  • Python: json.loads() is strict by default and rejects control characters in invalid JSON. It also exposes strict=False for non-compliant inputs, which can help salvage bad data but does not make the source valid JSON.
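In JavaScript the failure surfaces as a SyntaxError. The exact message wording is engine-specific, so match on the error type rather than the text:

```javascript
let caught = null;
try {
  JSON.parse('{"a":"A\u0001B"}'); // raw control character in the input
} catch (e) {
  caught = e;
}
console.log(caught instanceof SyntaxError); // true
// V8 phrases this along the lines of "Bad control character"; other engines differ
```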

Prevention Checklist

  1. Emit UTF-8 JSON without a BOM when data leaves one system for another.
  2. Use a serializer such as JSON.stringify() or json.dumps() to generate escapes correctly.
  3. Do not escape all non-ASCII text by default. Normal Unicode characters are valid JSON and are usually easier to read unescaped.
  4. Decide what should happen to control characters: preserve them, convert them to visible text, or strip them before storage and display.
  5. Validate raw files before deployment so you catch BOMs, bad escapes, and encoding errors before users do.
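Item 4 amounts to choosing a policy up front. This sketch (the helper name is illustrative) implements the three options from the checklist:

```javascript
// Illustrative helper: apply one of the three control-character policies
function applyControlPolicy(s, policy) {
  const ctrl = /[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g;
  switch (policy) {
    case "strip":   // remove control characters outright
      return s.replace(ctrl, "");
    case "visible": // convert each to the six-character text \uXXXX
      return s.replace(ctrl, c =>
        "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));
    default:        // "preserve": keep them; JSON.stringify escapes them later
      return s;
  }
}

console.log(applyControlPolicy("A\u0001B", "strip"));   // AB
console.log(applyControlPolicy("A\u0001B", "visible")); // A\u0001B
```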

Conclusion

Most Unicode-related JSON bugs come down to a small set of root causes: raw control bytes, wrong file encoding, BOMs, or malformed escape sequences. Once you distinguish between a visible escape like \u0001 and an actual byte in the file, the correct fix becomes much clearer.

For search visitors landing on this page with a file that will not parse, the practical rule is simple: make the source valid UTF-8 JSON first, then decide whether the control character should stay as data, become visible text, or be removed.
