Numbers and Strings in JSON
We needed to upgrade from EL6 to EL7 at work. After the upgrade, we noticed that some of the JSON documents returned from our APIs had changed: some numbers were suddenly enclosed in double quotes, while others lost their quotes. This broke customers’ code written in strongly typed languages (e.g. Java).
We used the system default perl to drive our web application, using libraries provided by the vendor. While Perl’s version changed from 5.10.1 to 5.16.3, the version of JSON::XS, which is responsible for creating the JSON data, jumped from 2.27 to 3.01. Both changes contributed to the problem.
The Internal Behaviour Change
For example, the following simple code produces different output in the two environments:
#! /usr/bin/perl
use warnings;
use strict;
use JSON::XS;
my $p = $$ . "";
print encode_json([$p, $$]);
In EL6, both values are double-quoted, but in EL7, the second one isn’t. The output of Devel::Peek shows $$ wasn’t magical in 5.10.1, but was in 5.16.3.
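The flags that Devel::Peek prints can also be read programmatically with the core B module. The following is only a minimal sketch: flags_of is a hypothetical helper introduced here for illustration, and it reports just the three public flags that usually decide whether an encoder treats a value as a number or a string.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not from the original code): list the public
# integer/float/string flags currently set on a scalar.
sub flags_of {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @set;
    push @set, 'IOK' if $flags & B::SVf_IOK();   # holds a valid integer
    push @set, 'NOK' if $flags & B::SVf_NOK();   # holds a valid float
    push @set, 'POK' if $flags & B::SVf_POK();   # holds a valid string
    return join ',', @set;
}

my $number = 12;
my $string = "12";
print flags_of($number), "\n";   # IOK
print flags_of($string), "\n";   # POK
```

Which flags appear after a value has been used in another context (for example, a number after interpolation) depends on the Perl version, which is exactly why guessing types from them is brittle.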
Another difference is the behaviour of utf8::decode. In 5.10.1, it doesn’t change numbers into strings, but it does in 5.16.3.
use JSON::XS;
my $x = 12;
utf8::decode($x);
print encode_json({x => $x});
# 5.10.1 {"x":12}
# 5.16.3 {"x":"12"}
Don’t Use the System Perl
That’s what you hear from everyone when you encounter a problem with the system perl. It’s good advice, but it wouldn’t have helped much in this case: if we had used a custom perl (through perlbrew or by compiling it ourselves), we would probably just have noticed the problem earlier, when upgrading the Perl interpreter or the JSON library.
Searching for a Solution
We discussed several ways to fix the issue. One of the simplest helped us fix the most urgent cases: when creating the Perl structure to be encoded to JSON, we forced the types by double-quoting values that were supposed to be strings, calling int on integers, and adding zero to floats.
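The three coercions can be sketched as follows. This is an illustration rather than our production code: the variable names are made up, and the core JSON::PP module is used only so that the snippet runs without CPAN modules; for these simple cases JSON::XS is expected to produce the same output.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; assumption: stands in for JSON::XS here

# Hypothetical input values that all arrive as strings.
my ($name, $count, $ratio) = ('42', '7', '0.5');

my $json = encode_json([
    "$name",       # interpolation yields a fresh string
    int($count),   # int yields an integer
    $ratio + 0,    # adding zero yields a numeric value
]);
print "$json\n";   # ["42",7,0.5]
```

The coercion has to happen at the point where the structure is built, because any later use of the value in another context may change its flags again.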
Nevertheless, as the number of fixes grew, we realised it wasn’t the best approach. If an unwary programmer from a distant team had just inspected one of the values, its type might have changed; moreover, the change could have happened in one of the environments only. The type definitions were scattered all over the repositories, and even worse, they were redundant—we already had all the API structures documented and formally defined.
Finally, we decided to use Cpanel::JSON::XS (as it fixes some flaws of JSON::XS) and enhance it with a type-enforcing system. We then parsed our formal API definitions into the type prescription accepted by the enforcer and generated the JSONs. The syntax is simple: with Cpanel::JSON::XS::Type loaded, encode_json can take a second argument which specifies how the resulting JSON should be typed:
use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;
my $p = $$ . "";
my $type = parse('ProcessIDs'); # json_type_arrayof(JSON_TYPE_INT)
print encode_json([$p, $$], $type);
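The parse function above stands for our internal code, which isn't shown here. As a rough sketch of the idea, a formal definition could be translated into a type prescription recursively. Everything about the definition format below (%definition, the array_of and object keys, the type names) and the build_type helper is hypothetical; only the Cpanel::JSON::XS::Type constants and functions are real.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;

# Hypothetical, heavily simplified API definitions.
my %definition = (
    ProcessIDs => { array_of => 'integer' },
    Process    => { object => { pid => 'integer', name => 'string' } },
);

# Map the hypothetical scalar type names to the enforcer's constants.
my %scalar_type = (
    integer => JSON_TYPE_INT,
    string  => JSON_TYPE_STRING,
    boolean => JSON_TYPE_BOOL,
);

# Hypothetical helper: turn a definition into a type prescription.
sub build_type {
    my ($def) = @_;
    if (! ref $def) {
        exists $scalar_type{$def} or die "Unknown scalar type '$def'";
        return $scalar_type{$def};
    }
    return json_type_arrayof(build_type($def->{array_of}))
        if exists $def->{array_of};
    if (exists $def->{object}) {
        my %type;
        $type{$_} = build_type($def->{object}{$_}) for keys %{ $def->{object} };
        return \%type;
    }
    die 'Unsupported definition';
}

my $json = Cpanel::JSON::XS->new->canonical;   # canonical: stable key order
print $json->encode(['42', 43], build_type($definition{ProcessIDs})), "\n";
# [42,43]
print $json->encode({ pid => '42', name => 7 }, build_type($definition{Process})), "\n";
# {"name":"7","pid":42}
```

Keeping the translation in one place means the types live only in the API definition, not scattered through the code that builds the data.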
The enforcer can describe more complex structures, too:
use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;
print encode_json(
    {root => [{child => 1},
              [0], [1]]},
    {root => json_type_arrayof(
        json_type_anyof(
            json_type_hashof(JSON_TYPE_STRING),
            [JSON_TYPE_BOOL]))});
# {"root":[{"child":"1"},[false],[true]]}
Benchmarks showed only a negligible slowdown. After some time, the changes were accepted upstream, and Cpanel::JSON::XS::Type is now part of Cpanel::JSON::XS.
Related
We weren’t alone; read “Did the JSON module change?” for a similar story. Furthermore, JSON isn’t the only place where you can encounter these problems. Any time you communicate with a system that has a different arsenal of types, you might need ways to enforce them. For example, read “Why does DBI implicitly change integers to strings?”.
Conclusion
Perl makes no distinction between numbers and strings, they say. But is it really so?
print 1 & 12, "1" & "12"; # 01
This discrepancy was solved by the bitwise feature (introduced in 5.22):
use feature qw(bitwise);
no warnings qw(experimental::bitwise);
print "1" & "12", 1 &. 12; # 01
#         ^         ^^
#      numeric    string
By the way, the introduction of the string bitwise operators nicely shows the Perlish way: the operators decide how to interpret the operands.
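The same principle is visible in the everyday operators. In this small illustration (the variable names are arbitrary), the operands are identical strings throughout and only the operator changes:

```perl
use strict;
use warnings;

my ($x, $y) = ('10', '10.0');

my $sum    = $x + $y;                              # + treats both as numbers
my $concat = $x . $y;                              # . treats both as strings
my $num_eq = ($x == $y) ? 'equal' : 'different';   # numeric comparison
my $str_eq = ($x eq $y) ? 'equal' : 'different';   # string comparison

print "$sum $concat $num_eq $str_eq\n";   # 20 1010.0 equal different
```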
Inspecting the internal flags to guess types is fragile and leads to surprising behaviour. For example, the following gives the same results for both JSON::XS and Cpanel::JSON::XS (but not JSON::PP):
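use feature qw(say);
use JSON::XS;                    # Cpanel::JSON::XS behaves the same here
my $num = 1844674407370955161;
say encode_json([$num]); # [1844674407370955161]
my $dummy = $num / 10;           # the division makes $num cache a float value
say encode_json([$num]); # [1.84467440737096e+18]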
Adding a second argument [JSON_TYPE_INT] to encode_json fixes the problem.
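Spelled out, the typed variant of the example might look like this. It is a sketch that assumes a 64-bit perl and a Cpanel::JSON::XS version that already ships Cpanel::JSON::XS::Type; the array reference [JSON_TYPE_INT] describes an array with exactly one integer element.

```perl
use strict;
use warnings;
use feature qw(say);
use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;

my $num   = 1844674407370955161;
my $dummy = $num / 10;   # same numeric use as before

# The type prescription tells the encoder to emit an integer,
# regardless of what the internal flags suggest.
my $out = encode_json([$num], [JSON_TYPE_INT]);
say $out;   # [1844674407370955161]
```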
Thanks
Thanks to GoodData for supporting open source, to Pali for working on the solution, and to Reini for accepting it.