It Had to be Said: XML vs. JSON

James Clark, in "XML vs. The Web", has finally said what needed to be said -- that XML is a singularly bad format for data transmission. Here is the crux of what Mr. Clark had to say:

It's "Yay", because for important use cases JSON is dramatically better than XML. In particular, JSON shines as a programming language-independent representation of typical programming language data structures. This is an incredibly important use case and it would be hard to overstate how appallingly bad XML is for this. The fundamental problem is the mismatch between programming language data structures and the XML element/attribute data model of elements.

For me, XML is great at describing documents. XML is also (IMHO) reasonable for describing hierarchies of nodes with attributes (like config files) -- although XML is a little wordy for my tastes when used to describe hierarchies. XML, however (and you knew there had to be a however in there), is a painful format for data transmission. XML's overhead is just way too high for the simple task of transmitting common data structures; for example, the simplest XML representation I can think of for x = 1 is:

<i n="x">1</i>

which is a total of 16 Unicode characters (14 plus a 2-character CRLF line ending). In JSON (if I understand the format correctly), x = 1 could be expressed as:
x: 1
for a grand total of 6 Unicode characters (including the CRLF line ending). Much better.

(James Clark wrote the XSLT spec and came up with the name XML, so his opinion on XML vs. JSON should probably be listened to.)

(As a Perl programmer, I was hoping for YAML, but I think the lack of a Java YAML parser for several years led to the triumph of JSON.)

9 Comments

I can't say he's saying anything new, but I'm happy someone influential is stating the obvious. Maybe people will be listening.

XML is for documents

JSON is for data

It really is and should be that simple.

I don't disagree with anything you're saying here but I have to question this part.


the simplest XML representation I can think of for x = 1 is:

<i n="x">1</i>

Really? That's the simplest you could think of? <x>1</x> didn't come to mind? Yes, it's more verbose than x: 1 ... but far simpler than you're implying XML requires.

“Finally”? Hasn’t this been being said for, what, a decade or so?

The simplest representation in XML is <x v="1"/>. (It’s a bit longer than <x>1</x>, but it would win for any meaningful key names.)

The simplest representation in JSON is {x:1}. You cannot leave off the braces.

The difference is not huge: 5 characters (5 vs 10). And it would get overwhelmed by the payload if you used longer keys and values than 1 character each.
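The lengths being compared in this thread are easy to check mechanically. Here is a minimal sketch in Python; the dictionary keys are only labels made up for illustration, and the strings are the candidate encodings of x = 1 quoted in the post and comments. Line endings are not counted.

```python
# Candidate encodings of x = 1 that appear in this thread.
# The keys are hypothetical labels, not names from any spec.
candidates = {
    "xml_named_item": '<i n="x">1</i>',
    "xml_element": "<x>1</x>",
    "xml_attribute": '<x v="1"/>',
    "json_unquoted_key": "{x:1}",   # strict JSON requires the key to be quoted
    "json_quoted_key": '{"x":1}',
}


def char_counts(encodings):
    """Return the number of Unicode code points in each encoding."""
    return {label: len(text) for label, text in encodings.items()}


if __name__ == "__main__":
    for label, count in char_counts(candidates).items():
        print(f"{label:18} {count}")
```

Swapping in longer key names and values shows how quickly the fixed syntax overhead stops mattering next to the payload.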

The problem is not with verbosity of the syntax anyway. Why does everyone doggedly fixate on that? No, the problem is how you go about accessing the data inside a program. With XML, you get a DOM. JSON maps directly onto native data structures. That is the difference.
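To make the access-pattern point concrete, here is a rough sketch in Python using only the standard library (json and xml.dom.minidom). The sample documents and the element names data, x, tags and tag are invented for illustration; other XML APIs are less clumsy than the raw DOM, but the extra navigation and type-conversion step remains.

```python
import json
from xml.dom.minidom import parseString

# Two hypothetical encodings of the same small record.
json_text = '{"x": 1, "tags": ["a", "b"]}'
xml_text = "<data><x>1</x><tags><tag>a</tag><tag>b</tag></tags></data>"

# JSON: the parse result already is a dict holding an int and a list.
record = json.loads(json_text)
x_from_json = record["x"]
tags_from_json = record["tags"]

# XML: the parse result is a DOM tree that has to be walked node by node,
# and every value comes back as text that must be converted by hand.
doc = parseString(xml_text)
x_node = doc.getElementsByTagName("x")[0]
x_from_xml = int(x_node.firstChild.data)
tags_from_xml = [node.firstChild.data for node in doc.getElementsByTagName("tag")]

print(x_from_json == x_from_xml, tags_from_json == tags_from_xml)
```

Both halves end up with the same values, but only the XML half needed to know the document layout and the intended types in advance.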

I think the lack of a Java YAML parser for several years led to the triumph of JSON

You are mistaken. The complexity of the YAML spec is what led to the triumph of JSON. I am not sure there is even one fully compliant YAML parser for Perl on the CPAN yet. (I know that less than 2 years ago, there wasn’t.) Have you ever looked at the YAML spec? If not, you’re in for a rude shock.

In contrast, any competent programmer can make a good stab at writing a JSON parser for the language of his choice in a couple of hours. (Writing a really correct one is trickier than that. However, it is nowhere near as huge a task as a YAML parser.)
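For a sense of scale, here is what such a "good stab" might look like in Python: a small recursive-descent parser. It is a sketch, not a compliant implementation. All function names are made up for this example, and it knowingly skips details a really correct parser must handle (surrogate pairs in \u escapes, rejecting raw control characters in strings, nesting limits, good error messages).

```python
import re

_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
            "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def parse_json(text):
    """Parse a JSON document into Python objects (sketch only)."""
    value, pos = _value(text, _skip_ws(text, 0))
    if _skip_ws(text, pos) != len(text):
        raise ValueError(f"unexpected trailing data at {pos}")
    return value


def _skip_ws(text, pos):
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def _value(text, pos):
    ch = text[pos:pos + 1]
    if ch == "{":
        return _object(text, pos)
    if ch == "[":
        return _array(text, pos)
    if ch == '"':
        return _string(text, pos)
    for literal, val in (("true", True), ("false", False), ("null", None)):
        if text.startswith(literal, pos):
            return val, pos + len(literal)
    match = _NUMBER.match(text, pos)
    if match:
        s = match.group()
        number = float(s) if any(c in s for c in ".eE") else int(s)
        return number, match.end()
    raise ValueError(f"unexpected input at {pos}")


def _string(text, pos):
    out, pos = [], pos + 1          # skip the opening quote
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(out), pos + 1
        if ch == "\\":
            esc = text[pos + 1:pos + 2]
            if esc == "u":          # no surrogate-pair handling here
                out.append(chr(int(text[pos + 2:pos + 6], 16)))
                pos += 6
            elif esc in _ESCAPES:
                out.append(_ESCAPES[esc])
                pos += 2
            else:
                raise ValueError(f"bad escape at {pos}")
        else:
            out.append(ch)
            pos += 1
    raise ValueError("unterminated string")


def _array(text, pos):
    result, pos = [], _skip_ws(text, pos + 1)
    if text[pos:pos + 1] == "]":
        return result, pos + 1
    while True:
        value, pos = _value(text, pos)
        result.append(value)
        pos = _skip_ws(text, pos)
        if text[pos:pos + 1] == "]":
            return result, pos + 1
        if text[pos:pos + 1] != ",":
            raise ValueError(f"expected ',' or ']' at {pos}")
        pos = _skip_ws(text, pos + 1)


def _object(text, pos):
    result, pos = {}, _skip_ws(text, pos + 1)
    if text[pos:pos + 1] == "}":
        return result, pos + 1
    while True:
        if text[pos:pos + 1] != '"':
            raise ValueError(f"expected quoted key at {pos}")
        key, pos = _string(text, pos)
        pos = _skip_ws(text, pos)
        if text[pos:pos + 1] != ":":
            raise ValueError(f"expected ':' at {pos}")
        result[key], pos = _value(text, _skip_ws(text, pos + 1))
        pos = _skip_ws(text, pos)
        if text[pos:pos + 1] == "}":
            return result, pos + 1
        if text[pos:pos + 1] != ",":
            raise ValueError(f"expected ',' or '}}' at {pos}")
        pos = _skip_ws(text, pos + 1)
```

For example, parse_json('{"x": 1}') returns a dict, while parse_json('{x:1}') raises ValueError because the key is not quoted. The whole grammar fits on one screen, which is the contrast with YAML being drawn here.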

It’s a pity, because sure, if we were talking about a subset of YAML that’s roughly equivalent to JSON’s expressive power (YAML Tiny essentially), then I would agree: that is a great idea. But real full YAML is complex beyond sanity.

Thank you for the post, and thanks to those who commented. +5 Interesting!

It's worth mentioning that some JSON parsers (Python's, for example, IIRC) have problems with unquoted keys and values. That is: {x:1} should be {"x":"1"}, though I'm not certain about the "1" value, since it's numeric.

I agree with Aristotle. YAML is nicer to look at, but insane to parse correctly. The fact of the matter is that YAML::Tiny leads the way even though it admittedly doesn't even try to accomplish 100% of the spec. This might have changed recently since Adam Kennedy took hold of the YAML dist.

Sorry for being pedantic, but the simplest representation in JSON is {"x":1}. You can't leave off the quotation marks either, or can you?

Yes, it’s true – you have to quote the key. I forgot.
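The quoting question can be settled with Python's standard json module, which follows the JSON grammar on this point: keys must be double-quoted strings, while a numeric value such as 1 needs no quotes. A small sketch (is_valid_json is a helper name made up for this example):

```python
import json


def is_valid_json(text):
    """Return True if the standard-library parser accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json('{x:1}'))      # False: the key is not quoted
print(is_valid_json('{"x":1}'))    # True: quoted key, bare number
print(json.loads('{"x":1}'))       # {'x': 1}   (value is an int)
print(json.loads('{"x":"1"}'))     # {'x': '1'} (value is a str)
```

Quoting the value is allowed, but it changes the type: "1" is a string and 1 is a number.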

About Mark Leighton Fisher

Perl/CPAN user since 1992.