Twin Peaks (YAML Walk with Me)

I've just returned from a 2-week (and 2 Summit!) trip to Europe, specifically Berlin and Lyon. A couple of months ago I was invited to attend the Perl Toolchain Summit 2017 in Lyon. Whenever I go to the EU I like to drop by Berlin to visit some friends.

One of my friends is Tina Müller, who works with me throughout the year on various open source things. We met 2 years ago at the Perl toolchain event in Berlin, which Tina helped organize. For the past 4 months we've been concentrating on all of YAML as a language.

Specifically we've been making:

  • The canonical YAML Test Suite for all YAML projects
  • A multi-language YAML Editor for seeing how a given YAML text works in several language implementations at once
  • The YAML Testing Matrix website for seeing how all the YAMLs work at a glance

I decided that we should have the first ever YAML Summit in Berlin along with our fabulous collaborator (and NimYAML author) from Stuttgart, Felix Krause.

Tina has written all about the YAML Summit. Check it out. Spoiler: we've started to create YAML 1.3 (the first new YAML spec in 8 years).

Next Stop, Lyon

Tina got invited to the Perl Toolchain Summit as a last-minute replacement for Chad Granum, who couldn't make it for personal reasons. Needless to say, I was psyched. Tina and I got to work together in person on YAML for almost 2 weeks; first on the overall YAML language itself, and then on the specific Perl/YAML ecosystem.

Perl was the very first language to have YAML, but is also one of the languages where YAML needs the most attention.

YAML.pm is Old (and I mean ::Old)

The first YAML module in Perl (and the world) is YAML.pm. It's the first (and worst) module that Perl programmers try when they want to use YAML. Since it is 16+ years old, a lot of code depends on it.

I really wanted YAML.pm to be the module that had the latest/greatest best-of-breed YAML that Perl had to offer, while breaking as little existing code as possible.

That might not seem possible (unless your name is Ingy). We came up with this plan. In a nutshell, we moved all the old YAML.pm code to YAML::Old and made a plain "use YAML;" invoke that code. So nothing changes...

...unless you use YAML -new! That's not really the new syntax (but it sounded cool). The new usage looks like this:

use YAML 'yaml';

my $data = yaml->load($yaml);
$yaml = yaml->dump($data);

That's the basic usage. What does it do differently? Not much yet, but it gives us the ability to make the YAML I've always wanted:

  • Single entry point to all Perl YAML backend implementations
    • Simple sugar usage like above
    • Can also use 100% plain old OO usage
  • New capabilities
    • Keep/set key orders
    • Choose exact dumper styles for every node
    • Preserve or add comments to write
  • Highly configurable
    • Lexical options and configs
    • No more global variables for config
  • Access to a YAML::DOM
  • Streaming YAML usage
  • Much more
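
Going back to the compatibility plan above (old code parked in YAML::Old, a plain "use YAML;" still reaching it, and use YAML 'yaml' opting in to the new interface), here is a rough sketch of how an import-time switch like that could work. Everything in it is hypothetical: the My::YAML* packages and their stub methods are stand-ins made up for illustration, and the real YAML.pm dev release may well do this differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for YAML::Old: the legacy functional API.
package My::YAML::Old {
    sub Load { return "old Load($_[0])" }
    sub Dump { return "old Dump($_[0])" }
}

# Hypothetical stand-in for the object behind the new 'yaml' sugar.
package My::YAML::New {
    sub new  { return bless {}, shift }
    sub load { return "new load($_[1])" }
}

# Hypothetical front module: import() decides what the caller gets.
package My::YAML {
    sub import {
        my ($class, @args) = @_;
        my $caller = caller;
        no strict 'refs';
        if (grep { $_ eq 'yaml' } @args) {
            # New style: export a yaml() function that returns an object,
            # so the caller can write yaml->load(...).
            my $obj = My::YAML::New->new;
            *{"${caller}::yaml"} = sub { $obj };
        }
        else {
            # Old style: export the legacy functions unchanged.
            *{"${caller}::$_"} = \&{"My::YAML::Old::$_"} for qw(Load Dump);
        }
    }
}

# BEGIN blocks stand in for "use My::YAML;" and "use My::YAML 'yaml';",
# since all the packages live in one file here.
package Legacy::App {
    BEGIN { My::YAML->import }
    sub run { return Load('x') }
}

package Modern::App {
    BEGIN { My::YAML->import('yaml') }
    sub run { return yaml->load('x') }
}

package main;
print Legacy::App::run(), "\n";
print Modern::App::run(), "\n";
```

The point of the sketch is only that existing callers never have to change: they keep getting the old functions unless they explicitly ask for something else in the import list.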

So far we have gotten the new YAML.pm working (or at least very close to working) with libyaml, Tina's new pure Perl YAML::PP parser, and Ingy's new Pegex-based YAML::Pegex.

Tina and I have released YAML::Old to CPAN, and put out a developer release of the new YAML.pm. We've also made a dev release of YAML::Perl where much of the new common API code will go.

...and a 6 pack of YAML

At some point during the summit, Tina and I went out in search of some food and ran into Tony O'Dell, a Perl 6 hacker. We told him what we were up to, and a few hours (and whiskey drinks) later Tony had made a binding of libyaml to Perl 6.

In some other universe, at about the same time, Curt Tilmes did the exact same thing, but with a Dumper too. Long story short, Curt gave us his code to carry forth. Tina and I will be making this code stay perfectly in sync with our Perl 5 efforts.

A Shout Out and I'm Out

This was a really great month for me and for YAML. None of it would have happened at all without the generous support of the 2017 Perl Toolchain Summit Sponsors. Please support them like they supported us:

Booking.com, ActiveState, cPanel, FastMail, MaxMind, Perl Careers, MongoDB, SureVoIP, Campus Explorer, Bytemark, CAPSiDE, Charlie Gonzalez, Elastic, OpusVL, Perl Services, Procura, XS4ALL, Oetiker+Partner.

Finally, big hugs to all the great people who came to Lyon, and especially the organizers: Philippe, Laurent, Neil and Wendy!

Thank You and see you in TheFuture™, Ingy döt Net

Perl Regular Expression Awesomeness

This week at work I overheard some coworkers talking about a programming problem. The type that you might get in an interview. The idea was that if you had a string of words smushed together without spaces, how would you go about parsing the string into words again?

I thought about it for a bit and pretty quickly decided to load all of /usr/share/dict/words into some kind of regexp. The main difficulty is that you can't just be greedy or be nongreedy because either could fail. Imagine the inputs:

yougotmail          => you got mail
yougotmailed        => you got mailed
yougotmailman       => you got mailman (or: you got mail man)
yougotmailmanners   => you got mail manners

As you can see, regardless of greedy or nongreedy, you need backtracking. Hmm. Regular expressions have backtracking. Problem solved!

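# Build one big alternation from the system word list (one word per line).
$list = join '|', map { chomp; $_ } `cat /usr/share/dict/words`;
# Anchored at both ends, so the engine must backtrack until every character is covered.
$input =~ /^($list)*$/;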

That works! Only one little problem. How do I get the captured words? I thought I knew but I couldn't get it to work, so I asked Google. Google was not my friend, so I asked on #p5p. Fortunately the p5p regexp greats were around. Unfortunately they told me I couldn't really do that. At some point mauke++ suggested I could try putting code into a modern Perl regexp.
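
To make the capture problem concrete, here is a small sketch. It assumes a tiny hand-picked word list in place of /usr/share/dict/words, so the variable contents are illustrative only. A capture group under a * quantifier keeps just the text from its final iteration, so the earlier words are gone by the time the match returns.

```perl
use strict;
use warnings;

# Tiny stand-in word list (an assumption; the post uses /usr/share/dict/words).
my $list = join '|', qw(you got mail mailman man manners);

my $input = 'yougotmailmanners';

# One capture group means a list-context match returns one value:
# whatever the group matched on its last trip through the loop.
my @captured = $input =~ /^($list)*$/;

print scalar(@captured), " capture: @captured\n";
```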

Long story short I came up with this Perl regexp gem: https://gist.github.com/ingydotnet/94528c938ca94f684270

You can try it out like this:

$ echo yougotmailmanners | DEBUG=1 word-parse.pl
you
got
mailman
mail
manners
you got mail manners

My favorite part of this is the local @stack = (@stack, $^N);. After each match we "push" the matched word (in $^N) onto a stack array; but we also localize the stack. This causes it to get reset to what we want when backtracking happens. That means there is no need for code to determine when a pop is needed.
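
Here is a cut-down sketch of that trick, not the code from the gist. It assumes Perl 5.18 or later (so a literal code block can sit next to an interpolated variable without needing use re 'eval') and uses a small hypothetical word list. The names parse_words and @words are made up for this example.

```perl
use v5.18;
use strict;
use warnings;

# Hypothetical word list standing in for /usr/share/dict/words.
my $list = join '|', qw(you got mail mailman man manners);

# Both must be package variables: local does not work on lexicals.
our @stack;    # words matched so far on the current path
our @words;    # copy taken once the whole input has matched

sub parse_words {
    my ($input) = @_;
    local @stack = ();
    @words = ();

    # After each word, "push" it by re-localizing the stack. When the
    # engine backtracks past that point, Perl restores the previous
    # value, which acts as the pop. The final code block copies the
    # stack out, because the localizations are undone when the match ends.
    my $ok = $input =~ /
        ^
        (?:
            ($list)
            (?{ local @stack = (@stack, $^N) })
        )*
        \z
        (?{ @words = @stack })
    /x;

    return $ok ? @words : ();
}

print join(' ', parse_words('yougotmailmanners')), "\n";
```

With this word list the engine first tries "man" after "mail", fails on the leftover "ners", backtracks, and then takes "manners"; the discarded "man" never shows up in the result.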

I doubt this could be done much more elegantly in other languages. I'm sure that code invocation is supported in many newer languages' regexp engines, but the local call-stack semantics don't exist because they are deemed inferior. I've written more Bash code than any other language in the last couple years. Bash has the same local semantics. It actually works out pretty nice most of the time.

I suspect only Perl has such a modern regexp engine and the "inferior" local semantics! :)

Liquid Ingy Quality Berlin

I just finished up a fun and successful Perl QA Hackathon in Berlin. This was my first such event and I'd really like to thank my new employer Liquid Web (more at the bottom) for sponsoring the event and my attendance of it!

As usual I worked on a multitude of various things that were either very interesting or needed my personal attention. The highlights were:

  • Participated in all the toolchain consensus sessions (led by xdg++)
  • Wrote the first version of an interaction ...

Inline TPF Grant to be Finished by Christmas

OMG! That's now!!!

The Inline Grant is Finished!

Merry Christmas, Ingy and David

Tis the season to get Inline

Have you been naughty or nice?

David and I have been busy elves!

About Ingy döt Net

I am an Acmeist Hacker. I program in many languages to meet many people. Perl people are my favorite people. Currently I am working as a Distinguished Technologist for Hewlett Packard Enterprise; developing the future of cloud solutions.