My report of the Perl Toolchain Summit 2018 in Oslo
This year, to my surprise, I was invited to the Summit again, on short notice.
Once again, I got to visit a city I had never been to before and to hack on YAML and other stuff for four days.
Test::YAML/Test::Base
There was a bug in Test::Base (which also affected Test::YAML): it confused CPAN::Reporter when it did a plain use Test::YAML or use Test::Base.
Test::Base always expects at least one test, but CPAN::Reporter only wants to check if it can load the module. This was reported by Slaven.
I fixed it to do nothing at all if there are no tests in the code.
YAML.pm problematic Regex
I had already been working on this before the Summit.
One part of YAML.pm has had a problem with a regex for a while now. Partly it seemed to be a bug in perl itself, since it was fixed in newer perl versions, but the regex still seemed to be problematic in some cases.
It's the parsing of quoted strings. According to the reported GitHub issues, there were problems when:
- a double-quoted string had no terminating quote, which resulted in what looked like an endless loop, or
- perl was compiled with libasan debugging enabled: two tests ran for several hours.
So I finally decided to rewrite this part. Instead of trying to parse everything in one regex, the new code uses a while loop that stops at every backslash.
As a result, strings with many escape sequences take twice as long to parse, while strings with very few escape sequences take only half the time. An unterminated string is now detected very quickly instead of hanging forever.
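A minimal sketch of the loop-based approach, assuming only a small subset of YAML escapes (\n, \t, \0, \" and \\). parse_double_quoted and %ESCAPE are hypothetical names, and this is not the actual YAML.pm code. Every iteration either consumes input or stops, so an unterminated string produces an error instead of a hang:

```perl
use strict;
use warnings;

# Escapes supported by this sketch (a small subset of YAML's)
my %ESCAPE = (
    'n'  => "\n",
    't'  => "\t",
    '0'  => "\0",
    '"'  => '"',
    '\\' => '\\',
);

# Hypothetical helper, not YAML.pm's code: parse a double-quoted scalar
# with a loop that stops at every backslash or quote.
sub parse_double_quoted {
    my ($input) = @_;
    $input =~ /\G"/gc or die "Expected an opening double quote\n";
    my $value = '';
    while (1) {
        # Copy a run of ordinary characters in one step
        $value .= $1 if $input =~ /\G([^"\\]+)/gc;

        if ($input =~ /\G"/gc) {
            return $value;    # closing quote: done
        }
        elsif ($input =~ /\G\\(.)/gcs) {
            exists $ESCAPE{$1} or die "Unsupported escape \\$1\n";
            $value .= $ESCAPE{$1};
        }
        else {
            # No text, quote or escape left: the input ended too early
            die "Unterminated double-quoted string\n";
        }
    }
}

print parse_double_quoted('"tab:\\tend"'), "\n";
my $ok  = eval { parse_double_quoted('"no closing quote'); 1 };
my $msg = $ok ? "parsed\n" : "error: $@";
print $msg;
```

The /c modifier keeps pos() where it was when a match fails, so each \G match picks up exactly where the previous one stopped.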
The tests on a perl with libasan debugging enabled were also much faster, but still slow enough that they should be skipped on such a perl.
At the Summit I asked around and tried to compile perl with libasan enabled, but unfortunately I couldn't get it working, so I couldn't reproduce the effect Ribasushi reported.
YAML.pm Trailing Comments
YAML.pm has supported comments, but only comments on their own line. Various trailing comments did not work as they should:
---
- "string" # comment
- 'string' # comment
- > # comment
  folded block scalar
- | # comment
  literal block scalar
- [ sequence ] # comment
- { x: y } # comment
These all resulted in an error.
The following ones did not, but had unexpected results:
---
- a string with a #comment?
- foo: bar #comment?
--- short text # comment?
The #comment? would actually end up being part of the content.
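For plain (unquoted) scalars, the rule can be sketched like this, assuming YAML's requirement that a # only starts a comment when it is preceded by whitespace. strip_plain_comment is a hypothetical helper, not part of YAML.pm. Quoted scalars, flow collections and block scalar headers would each need their own handling:

```perl
use strict;
use warnings;

# Hypothetical helper: split a plain scalar from a trailing comment.
# "#" only starts a comment after whitespace, so "foo#bar" stays one value.
sub strip_plain_comment {
    my ($text) = @_;
    my ($value, $comment) = $text =~ /\A(.*?)(?:[ \t]+(#.*))?\z/s;
    $value =~ s/[ \t]+\z//;    # trailing blanks are not part of the value
    return ($value, $comment);
}

for my $line ('a string with a #comment?', 'bar #comment?', 'foo#bar') {
    my ($value, $comment) = strip_plain_comment($line);
    printf "%-28s value=[%s] comment=[%s]\n",
        $line, $value, $comment // '';
}
```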
I had done some work on this a year ago, but it was not very well tested.
I started working on it again, fixed all the cases and wrote tests for them.
At first I hesitated to just fix the bug and made the fix optional, since I thought people might rely on # becoming part of unquoted content, but Ingy decided that it should simply be fixed.
I fixed these bugs and others in the pull requests:
YAML.pm $LoadBlessed
YAML::Syck allows disabling the loading of objects with $LoadBlessed, and now YAML::XS supports this too. YAML.pm is still missing it, and I have started working on that.
One esoteric feature is that you can write to any symbol table entry when loading YAML! Consider this harmless-looking code:
use feature 'say';
use YAML;
$main::foo = 23;
# The !!perl/glob tag assigns SCALAR into the glob *main::foo
my $data = Load("--- !!perl/glob { PACKAGE: main, NAME: foo, SCALAR: 42 }");
say $main::foo;
The output is 42!
I will also disable this along with $LoadBlessed (although, strictly speaking, it doesn't have anything to do with blessing objects).
I made a pull request today, but documentation is still missing.
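To illustrate why this is worth disabling, here is a rough sketch of what loading a !!perl/glob node amounts to, using symbolic references to write into the symbol table. load_glob and $AllowGlobs are hypothetical names for this example; YAML.pm's actual implementation and option names may differ:

```perl
use strict;
use warnings;

# Hypothetical switch in the spirit of $LoadBlessed; NOT a YAML.pm variable
our $AllowGlobs = 1;

# Rough sketch: PACKAGE and NAME select a symbol table entry, and the
# SCALAR, ARRAY and HASH keys are assigned into its slots.
sub load_glob {
    my ($node) = @_;
    die "Loading !!perl/glob is disabled\n" unless $AllowGlobs;
    my $name = "$node->{PACKAGE}::$node->{NAME}";
    no strict 'refs';    # symbolic references into the symbol table
    ${$name} = $node->{SCALAR}     if exists $node->{SCALAR};
    @{$name} = @{ $node->{ARRAY} } if exists $node->{ARRAY};
    %{$name} = %{ $node->{HASH} }  if exists $node->{HASH};
    return \*{$name};
}

$main::foo = 23;
load_glob({ PACKAGE => 'main', NAME => 'foo', SCALAR => 42 });
print "$main::foo\n";

{
    local $AllowGlobs = 0;
    eval { load_glob({ PACKAGE => 'main', NAME => 'foo', SCALAR => 0 }) };
    print "refused: $@";
}
```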
Perl Numbers and JSON/YAML
Since I started YAML::PP, I have been wondering how to serialize the different kinds of numbers.
In Perl, a scalar is not just a string, an integer or a float: it can have the string flag and the integer flag at the same time. The same goes for string plus float. It can even have all three flags!
Even worse, there is not one single correct solution when deciding if something should be serialized as a string or a number.
In my opinion, the behaviour of the newest JSON::PP and Cpanel::JSON::XS is the best: if the scalar has both an int and a string flag, it is treated as an int that has been used in string context somewhere.
The catch is that if you use a string in numeric context, the variable gets a numeric flag added, so you'll get a number for it, too.
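The flags can be inspected with the core B module. The following sketch, with a hypothetical number_or_string helper, applies the policy described above (any integer or float flag, public or private, means number); it is a simplification, not the exact logic of JSON::PP or Cpanel::JSON::XS, and it does not handle undef:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: numeric flag wins, otherwise treat it as a string
sub number_or_string {
    # Inspect $_[0] directly (an alias), so the caller's flags are seen
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return ($flags & (B::SVp_IOK | B::SVp_NOK)) ? 'number' : 'string';
}

my $int          = 42;
my $interpolated = "value: $int";   # uses $int in string context
my $float        = 3.14;
my $text         = "abc";
my $digits       = "23";
my $sum          = $digits + 1;     # numeric use adds a numeric flag to $digits

print "int:    ", number_or_string($int),    "\n";   # number
print "float:  ", number_or_string($float),  "\n";   # number
print "text:   ", number_or_string($text),   "\n";   # string
print "digits: ", number_or_string($digits), "\n";   # number
```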
Playing around with the different JSON modules was confusing because they behaved differently, and people on IRC even reported different results for the same module! So it seemed that implementations had changed recently.
To see how the different modules behave in their newest versions, I started generating an overview:
https://gist.github.com/perlpunk/35a07521b07aeea5a6c23a7d068233e7
I still have to put it into a table that is more readable, though.
While doing this, I decided to add support for Inf and NaN to YAML::PP.
YAML::PP Inf/NaN
This was quite easy to add. YAML supports infinity and not-a-number with .inf, -.inf and .nan.
I made one little mistake, though: I did not test on older perls, and right after I uploaded 0.006_001, Slaven reported test failures. It seems the stringification is just a bit different on older perls, which was easy to fix.
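A sketch of the mapping between YAML's special float tokens and Perl numbers, assuming the YAML 1.2 core schema spellings (.inf/.Inf/.INF, optionally signed, and .nan/.NaN/.NAN). The helper names are hypothetical; the point is to compare numerically instead of relying on how a given perl stringifies infinity or NaN:

```perl
use strict;
use warnings;

my $INF = 9**9**9;        # overflows to +Inf on IEEE 754 platforms
my $NAN = $INF - $INF;    # Inf minus Inf is NaN

# Hypothetical: turn a YAML 1.2 core schema special float into a Perl number
sub yaml_special_float {
    my ($str) = @_;
    return  $INF if $str =~ /\A\+?\.(?:inf|Inf|INF)\z/;
    return -$INF if $str =~ /\A-\.(?:inf|Inf|INF)\z/;
    return  $NAN if $str =~ /\A\.(?:nan|NaN|NAN)\z/;
    return;    # not a special float
}

# Hypothetical: emit a Perl number, using numeric checks rather than
# matching its stringification ("Inf", "inf", "1.#INF", ...)
sub perl_float_to_yaml {
    my ($num) = @_;
    return '.nan'  if $num != $num;    # NaN is the only value not equal to itself
    return '.inf'  if $num == $INF;
    return '-.inf' if $num == -$INF;
    return "$num";
}

for my $token ('.inf', '-.inf', '.NaN', '1.5') {
    my $num = yaml_special_float($token) // $token;
    print "$token -> ", perl_float_to_yaml($num), "\n";
}
```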
bool.pm draft
Perl doesn't have booleans. In many, many cases that's not a problem at all. You can use 1 and 0, so where's the problem? Who needs booleans?
Well, wherever you have a boolean in, say, a JSON document, you might want to keep it when writing the document again after loading it, because other languages actually do have booleans. Or you may want to validate some data in your API against a schema, for example via OpenAPI.
All the JSON modules support booleans with a little trick: they bless a reference to 1 or 0 into the class JSON::PP::Boolean. In the past, most of them used their own class, until finally everyone agreed to just use one class.
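A short example with the core JSON::PP module shows the trick and why it matters for round-tripping: true and false come back as JSON::PP::Boolean objects, while 1 and 0 stay plain numbers, so encoding the data again restores the booleans:

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new;

# JSON true/false become objects blessed into JSON::PP::Boolean,
# while 1 and 0 stay plain numbers
my $data = $json->decode('[true, false, 1, 0]');
for my $value (@$data) {
    my $type = ref($value) || 'plain scalar';
    print "$type: ", ($value ? 'true' : 'false'), "\n";
}

# Because the booleans are objects, encoding restores true/false
print $json->encode($data), "\n";    # [true,false,1,0]
```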
So that's great. But why do I have to load a JSON module just to get booleans? That may have seemed obvious to the JSON module authors, but when I started implementing YAML::PP, I thought it was weird for it to load JSON::PP. I implemented it that way anyway, though.
One day, Joel Berger mentioned on IRC that he didn't like having to load JSON::PP just to get boolean functionality, and I agreed and thought: apparently I'm not the only one who finds this weird.
So at the Summit, I tried to convince Kenichi, the current maintainer of JSON::PP, that we should have a bool.pm instead. We sat together with Joel a bit and discussed things.
Kenichi wants to make this as non-disruptive as possible. One way to do that is to create a bool.pm that simply pretends to be JSON::PP::Boolean: it takes over its functionality and returns JSON::PP::Boolean objects. This way, people could start using bool.pm right away.
JSON::PP::Boolean will also inherit from bool, simply by adding it to @JSON::PP::Boolean::ISA. There would be no more loading of JSON::PP in the background, but all the class name checks in the various JSON (and YAML) modules would still work. At some point, and that can take a while, they can switch over to just checking isa('bool'). Then, again some time later, bool.pm could become independent of JSON::PP and simply create bool objects. But that will only work if users of this module don't check for the exact class name, but only for the result of isa.
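A minimal sketch of the first migration step, assuming a package named bool as in the draft (it is not a released module). bool hands out JSON::PP::Boolean objects, and pushing bool onto @JSON::PP::Boolean::ISA makes both the old exact class checks and the new isa('bool') checks succeed:

```perl
use strict;
use warnings;

# Load the class that defines JSON::PP::Boolean's overloading
require JSON::PP::Boolean;

# Hypothetical "bool" package modelled on the draft described above
package bool {
    # Phase 1: hand out objects of the class everyone already checks for
    sub true  { return bless \(my $v = 1), 'JSON::PP::Boolean' }
    sub false { return bless \(my $v = 0), 'JSON::PP::Boolean' }
}

# JSON::PP::Boolean inherits from bool, so isa('bool') checks succeed
push @JSON::PP::Boolean::ISA, 'bool';

my $t = bool::true();
print ref($t), "\n";                                     # JSON::PP::Boolean
print $t->isa('bool') ? "isa bool\n" : "not a bool\n";   # isa bool
print "boolean context: ", ($t ? 'true' : 'false'), "\n";
```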
I still have to finish this draft of bool.pm and suggest it on the p5p list. Feedback welcome.
Why didn't we choose boolean.pm? boolean.pm behaves slightly differently from JSON::PP. Both modules work by overloading string, numeric and boolean context, but boolean.pm returns another boolean.pm object if you say
$false = ! $true;
JSON::PP::Boolean just returns the perl "boolean", so people use this to, uhm, deobjectify their booleans.
I would like to add a not or complement method to bool.pm that would return the negated boolean value as an object.
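A sketch of the difference, assuming the draft bool package and a hypothetical complement method (neither is an existing API). Negating with ! goes through the overloading and yields a plain perl value, while the method keeps the result an object:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sketch; "bool" and complement() are draft ideas only
package bool {
    sub complement {
        my ($self) = @_;
        # JSON::PP booleans are blessed references to 1 or 0
        return $$self ? JSON::PP::false() : JSON::PP::true();
    }
}
push @JSON::PP::Boolean::ISA, 'bool';

my $true = JSON::PP::true();

my $plain  = !$true;             # overloading yields a plain perl false value
my $object = $true->complement;  # stays a JSON::PP::Boolean object

print 'ref(!$true):            [', ref($plain),  "]\n";
print 'ref($true->complement): [', ref($object), "]\n";
```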
Conclusion
It was great to be at the Summit. Sometimes, sitting together and talking about things can get things done much more efficiently.
Since Ingy wasn't there this year, I mostly worked alone on the YAML stuff, but without the Summit I probably wouldn't have decided to fix this old trailing comments bug in YAML.pm. While digging deeper into YAML.pm, I found other bugs and fixed them over the last few weeks.
YAML::PP can hopefully replace YAML.pm at some point, but YAML.pm will still be around and in use, so it's worth fixing those bugs.
Thanks to the crew for organizing a hacking environment with lots of space, a good network and great food!
Sponsors for the Perl Toolchain Summit 2018
Thanks very much also to the sponsors of this Summit:
NUUG Foundation, Teknologihuset, Booking.com, cPanel, FastMail, Elastic, ZipRecruiter, MaxMind, MongoDB, SureVoIP, Campus Explorer, Bytemark, Infinity Interactive, OpusVL, Eligo, Perl Services, Oetiker+Partner.