When using "special" Unicode characters they come out as weird garbage when encoded to JSON:
php > echo json_encode(['foo' => '馬']);
{"foo":"\u99ac"}
Why? Have I done something wrong with my encodings?
(This is a reference question to clarify the topic once and for all, since this comes up again and again.)
First of all: There's nothing wrong here. This is how characters can be encoded in JSON, and it is part of the official standard. It is based on how string literals can be formed in ECMAScript (section 7.8.4, "String Literals").
In short: any character can be encoded as \u...., where .... is the four-digit hexadecimal Unicode code point of the character (or the code point of one half of a UTF-16 surrogate pair, for characters outside the BMP).
The two string literals "馬" and "\u99ac" represent the exact same character; they're absolutely equivalent. When these string literals are parsed by a compliant JSON parser, they will both result in the string "馬". They don't look the same, but they mean the same thing in the JSON data encoding format.
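The equivalence of a literal character and its \u.... escape can be checked with json_decode. The following is only a minimal sketch: the variable names are made up for illustration, and it assumes the source file is saved as UTF-8.

```php
<?php
// Two JSON documents for the same string: one uses the literal
// character, the other the \u.... escape sequence.
// Single quotes keep PHP itself from interpreting the backslash.
$literalJson = '"馬"';
$escapedJson = '"\u99ac"';

$fromLiteral = json_decode($literalJson);
$fromEscaped = json_decode($escapedJson);

// Both decode to the identical PHP string.
var_dump($fromLiteral === $fromEscaped);

// A character outside the BMP is written as two escapes,
// one for each half of its UTF-16 surrogate pair.
$emoji = json_decode('"\ud83d\ude00"');
var_dump($emoji === '😀');
```

Both comparisons should come out true, since a compliant parser treats the escape sequence and the literal character as the same string content.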
PHP's json_encode by default encodes non-ASCII characters using \u.... escape sequences. Technically it doesn't have to, but it does, and the result is perfectly valid. If you prefer to have literal characters in your JSON instead of escape sequences, you can set the JSON_UNESCAPED_UNICODE flag in PHP 5.4 or higher.
To emphasise: this is just a preference; the flag is not necessary in any way to transport "Unicode characters" in JSON.
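As a rough illustration of the difference the flag makes, the sketch below encodes the same array with and without JSON_UNESCAPED_UNICODE. The variable names are illustrative only, and the file is assumed to be saved as UTF-8.

```php
<?php
$data = ['foo' => '馬'];

// Default behaviour: non-ASCII characters become \u.... escapes.
$escaped = json_encode($data);

// With the flag (PHP 5.4+): characters are emitted literally as UTF-8.
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL;
echo $unescaped, PHP_EOL;

// Either form decodes back to the same data.
var_dump(json_decode($escaped, true) === json_decode($unescaped, true));
```

The two outputs differ only in how the character is written; any compliant JSON parser on the receiving end sees the same value either way.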