JSON and alternatives and an extension proposal

The genius of JSON is that it's so simple. It can even be parsed with a single regex (albeit a rather complex, Perl-specific one). Within a few years it has practically taken over the world, especially since at the time people yearned for something simpler and saner than XML.

The problem with JSON is that it's too simple. It lacks features. Yesterday, while working on an API that is supposed to return PNG images, I was again reminded of the fact that JSON does not handle binary data. Let's see what else JSON does not support: Inf and NaN, differentiating plain hashes from objects, regexps, circular references, ... (some people might want to add comments and trailing commas to that list).
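To make the list above concrete, here is a small sketch of what the built-in JSON.stringify in JavaScript does with such values. The variable names are made up for illustration; it assumes a reasonably modern Node.js or browser.

```javascript
// Values that plain JSON has no representation for.
const sample = {
  ratio: NaN,
  limit: Infinity,
  pattern: /blah/i,
  png: new Uint8Array([137, 80]), // stand-in for binary data
};

// NaN and Infinity are written as null, the RegExp collapses to an empty
// object, and the typed array turns into an object keyed by index.
const encoded = JSON.stringify(sample);
console.log(encoded);

// A circular reference is not degraded silently: stringify throws.
const node = { name: 'a' };
node.self = node;
let circularError = null;
try {
  JSON.stringify(node);
} catch (err) {
  circularError = err;
}
console.log(circularError instanceof TypeError);
```

None of these cases round-trip: parsing the output gives back null, {} or a plain object instead of the original value.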

There are lots of alternative serialization formats, from the old XML (but who wants to deal with that nowadays :-) ), to YAML, to binary formats like MessagePack (last time I checked, no support for circulars though), Sereal (featureful and fast, but no JavaScript implementation so far) and BSON (also no JS implementation), to interesting approaches like JSYNC (YAML-on-top-of-JSON, didn't take off apparently, I can guess some of the reasons why).

This morning I was wondering if someone has implemented something like this before: extend JSON by allowing just a bit more of JavaScript syntax. The result is still a valid, quite restricted subset of JavaScript and does *not* require a full eval() to be parsed (but can be parsed by eval(), should one want to).

Additional syntax includes:

1) RegExp literals: /blah/i

2) NaN and Infinity

3) for encoding binary data, something like new Buffer("\x00abc\xff") which converts a UTF-8 encoded string into binary data. Or perhaps a helper function like bin() to make it cross-platform (HTML5 browsers use Uint8Array).

4) for circular references, allow simple assignment syntax.
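A rough sketch of what a document in this extended syntax might look like, and of the "can be parsed by eval()" route. The bin() helper and the exact document layout are assumptions made for illustration (item 3 only floats the idea), and the circular-reference syntax from item 4 is left out because its exact form is not spelled out here. Evaluating text as code like this is only reasonable for trusted input; a real implementation would use a dedicated parser.

```javascript
// Hypothetical bin() helper: treats each character code (0-255) of the
// string as one byte and returns a Uint8Array.
function bin(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Text in the extended syntax: JSON plus a RegExp literal, NaN, Infinity
// and a bin() call. The backslashes are doubled only because the text is
// written here as a JavaScript string literal.
const text =
  '{ "pattern": /blah/i, "ratio": NaN, "limit": Infinity, "png": bin("\\x00abc\\xff") }';

// Evaluate the text as a JavaScript expression, exposing only bin() by name.
const parseExtended = new Function('bin', 'return (' + text + ');');
const doc = parseExtended(bin);

console.log(doc.pattern, doc.ratio, doc.limit, doc.png);
```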


Douglas Crockford kept JSON simple for very good reasons that he discusses in a number of interviews. He left comments out on purpose after seeing them abused in HTML and other languages as ways of extending the language. He left out unsafe features. He has promised that the specification of JSON will never change so that you can always count on it being the same. JSON does not have and will never have version numbers.

These have all turned out to be excellent decisions and are a big reason for the success of JSON. They are not "the problem with JSON" to say the least.

Clearly, somebody needs to volunteer to implement the Sereal spec in JS. I'm a terrible JS dev, otherwise I would've done it myself already. I'll see if somebody at work wants to do it at the next hackathon. :)

There's a JavaScript implementation of BSON in the Node.js MongoDB driver. You can find a browser-targeted build of it here.

There is actually a dirty way to send images over JSON: just base64-encode the image and put the resulting string in the JSON object. To encode:

perl -MMIME::Base64 -MFile::Slurp -E 'say encode_base64(read_file("test.png", binmode => ":raw"), "")'

and then decode using
perl -MMIME::Base64 -MFile::Slurp -E 'write_file("test.png", {binmode => ":raw"}, decode_base64("BASE 64 FOR IMAGE HERE"))'
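The same round trip can be sketched on the JavaScript side, assuming Node.js and its built-in Buffer. The four bytes stand in for a real PNG file, and the "filename" and "data" field names are made up for the example.

```javascript
// Stand-in for the contents of test.png (just the first four PNG header bytes).
const imageBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);

// Sender: base64-encode the bytes and ship them as an ordinary JSON string.
const payload = JSON.stringify({
  filename: 'test.png',
  data: Buffer.from(imageBytes).toString('base64'),
});

// Receiver: parse the JSON, then decode the string back into bytes.
const received = JSON.parse(payload);
const decoded = new Uint8Array(Buffer.from(received.data, 'base64'));

console.log(payload);
console.log(decoded);
```

The cost is size: base64 makes the data roughly a third larger than the raw bytes.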

A little-known fact is that you can even embed base64-encoded images directly into your img tags in HTML via a data: URI, like so (doesn't work in older versions of IE):
<img src=". . . " />

The problem with double-encoding is not the number of encodings but a lack of rigor and of clean encodings on each level. JSON is pretty clean and rigorous, so it makes more sense to define another encoding on top of it (examples: JSON-LD or RDF/JSON) instead of hacking on the encoding level of JSON. So please don't waste effort on extending JSON; make use of it to define your own encoding.
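As one possible illustration of an encoding layered on top of JSON, the sketch below uses the standard replacer and reviver hooks of JSON.stringify and JSON.parse. The "$regexp", "$number" and "$base64" tag names are invented for this example and are not taken from JSON-LD, RDF/JSON or any other existing convention; it assumes Node.js for Buffer.

```javascript
// Replacer: wrap values that plain JSON cannot represent in tagged objects.
function encodeSpecial(key, value) {
  if (value instanceof RegExp) {
    return { $regexp: value.source, flags: value.flags };
  }
  if (typeof value === 'number' && !Number.isFinite(value)) {
    return { $number: String(value) }; // "NaN", "Infinity" or "-Infinity"
  }
  if (value instanceof Uint8Array) {
    return { $base64: Buffer.from(value).toString('base64') };
  }
  return value;
}

// Reviver: turn the tagged objects back into the original kinds of values.
function decodeSpecial(key, value) {
  if (value !== null && typeof value === 'object') {
    if ('$regexp' in value) return new RegExp(value.$regexp, value.flags);
    if ('$number' in value) return Number(value.$number);
    if ('$base64' in value) {
      return new Uint8Array(Buffer.from(value.$base64, 'base64'));
    }
  }
  return value;
}

const original = {
  pattern: /blah/i,
  ratio: NaN,
  limit: Infinity,
  png: new Uint8Array([0, 97, 98, 99, 255]),
};

// The wire format is ordinary JSON that any JSON parser can read.
const wire = JSON.stringify(original, encodeSpecial);
const restored = JSON.parse(wire, decodeSpecial);

console.log(wire);
console.log(restored);
```

The trade-off is that both sides have to agree on the tags, and ordinary data that happens to use one of the tag names as a key would need escaping.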

About Steven Haryanto

A programmer (mostly Perl 5 nowadays). My CPAN ID: SHARYANTO. I'm sedusedan on perlmonks. My twitter is stevenharyanto (but I don't tweet much). Follow me on github: sharyanto.