Perl Regular Expression Awesomeness

This week at work I overheard some coworkers talking about a programming problem, the type you might get in an interview. The idea was: if you had a string of words smushed together without spaces, how would you go about parsing the string back into words?

I thought about it for a bit and pretty quickly decided to load all of /usr/share/dict/words into some kind of regexp. The main difficulty is that you can't just be greedy or just be nongreedy, because either could fail. Imagine the inputs:

yougotmail          => you got mail
yougotmailed        => you got mailed
yougotmailman       => you got mailman (or: you got mail man)
yougotmailmanners   => you got mail manners

As you can see, regardless of greedy or nongreedy, you need backtracking. Hmm. Regular expressions have backtracking. Problem solved!

$list = join '|', map { chomp; quotemeta } `cat /usr/share/dict/words`;
$input =~ /^($list)*$/;
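To make the greedy/nongreedy point concrete, here is a small sketch that peels words off the front of the string without ever revisiting a choice, once always taking the longest dictionary word and once always the shortest, next to the regexp approach. The seven-word dictionary and the `segment_no_backtrack` helper are hypothetical, made up just for this demo.

```perl
use strict;
use warnings;

# Hypothetical mini-dictionary standing in for /usr/share/dict/words.
my %dict = map { $_ => 1 } qw(you got mail mailed mailman man manners);

# Hypothetical helper: split $input by repeatedly removing one dictionary
# word from the front, never revisiting a choice. $pick chooses among the
# candidate prefix lengths (sorted ascending).
sub segment_no_backtrack {
    my ($input, $pick) = @_;
    my @words;
    while (length $input) {
        my @lens = grep { $dict{ substr($input, 0, $_) } } 1 .. length $input;
        return unless @lens;    # dead end, and there is nothing to undo
        # 4-arg substr removes the chosen prefix and returns it
        push @words, substr($input, 0, $pick->(@lens), '');
    }
    return join ' ', @words;
}

my $greedy = sub { $_[-1] };    # always take the longest matching word
my $lazy   = sub { $_[0] };     # always take the shortest matching word

# The regexp version: on failure the engine backtracks into earlier choices.
my $list = join '|', map { quotemeta } keys %dict;

for my $input (qw(yougotmailed yougotmailmanners)) {
    printf "%-18s greedy: %-15s lazy: %-15s regexp: %s\n",
        $input,
        segment_no_backtrack($input, $greedy) // 'FAIL',
        segment_no_backtrack($input, $lazy)   // 'FAIL',
        ($input =~ /^(?:$list)*$/ ? 'match' : 'no match');
}
```

With this word list, `yougotmailed` defeats the shortest-word strategy (it stops at "mail" and is left with "ed"), and `yougotmailmanners` defeats both strategies, while the regexp matches both inputs because the engine can back up into an earlier word choice.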

That works! Only one little problem: how do I get the captured words? I thought I knew, but I couldn't get it to work, so I asked Google. Google was not my friend, so I asked on #p5p. Fortunately the p5p regexp greats were around. Unfortunately they told me I couldn't really do that, because a capture group inside a quantifier only remembers its last match. At some point mauke++ suggested I could try putting code into the regexp itself, which modern Perl regexps allow.
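The limitation is easy to demonstrate. In this minimal sketch (the three-word list is just an assumption for brevity), the capture group matches once per word, yet only the last word survives:

```perl
use strict;
use warnings;

my $list = join '|', map { quotemeta } qw(you got mail);

# In list context a successful match returns its capture groups.
my ($last) = 'yougotmail' =~ /^($list)*$/;

# The group matched three times ("you", "got", "mail"), but a capture
# group under a quantifier only keeps the text from its final iteration.
print "captured: $last\n";
```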

Long story short, I came up with this Perl regexp gem:

You can try it out like this:

$ echo yougotmailmanners | DEBUG=1
you got mail manners

My favorite part of this is the statement local @stack = (@stack, $^N); inside the regexp. After each match we "push" the matched word ($^N holds the text of the most recently closed capture group) onto a stack array, but we also localize the stack. When the engine backtracks past that point, the localization is undone and the stack reverts to its previous contents. That means there is no need for code to determine when a pop is needed.
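The gem itself isn't reproduced on this page, so what follows is only a sketch of the same local @stack = (@stack, $^N) technique, reconstructed under assumptions: a tiny hypothetical word list instead of /usr/share/dict/words, and hypothetical names (`segment`, `$word`, `@found`). Two details matter: `@stack` has to be a package variable, because `local` does not work on `my` variables, and the finished stack has to be copied somewhere non-localized before the match ends, since every localization is unwound when the regexp exits (the same pattern perlre uses in its own `(?{ })` example).

```perl
use strict;
use warnings;

# Hypothetical stand-in for /usr/share/dict/words, kept tiny for the demo.
my @dict = qw(you got mail mailed mailman man manners);

# One alternation of all dictionary words, compiled once as a qr// object.
my $alt  = join '|', map { quotemeta } @dict;
my $word = qr/$alt/;

our @stack;    # package variable: local() cannot be applied to a my() variable
our @found;    # receives a copy of @stack once the whole input has matched

# Hypothetical helper: returns the space-separated words, or undef.
sub segment {
    my ($input) = @_;
    local @found;
    $input =~ m{
        ^
        (?:
            ($word)                               # try one dictionary word
            (?{ local @stack = (@stack, $^N) })   # push it, undone on backtrack
        )*
        $
        (?{ @found = @stack })                    # success, copy out before unwinding
    }x or return;
    return join ' ', @found;
}

for my $input (qw(yougotmail yougotmailed yougotmailman yougotmailmanners)) {
    printf "%-18s => %s\n", $input, segment($input) // 'no segmentation';
}
```

Because the alternation tries words in list order and `mail` comes before `mailman`, `yougotmailman` should come out as "you got mail man"; reordering the word list changes which of the valid splits is reported first.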

I doubt this could be done much more elegantly in other languages. I'm sure that code invocation is supported in many newer languages' regexp engines, but the local call-stack semantics (dynamic scoping) don't exist there because they are deemed inferior. I've written more Bash code than any other language in the last couple of years, and Bash has the same local semantics. It actually works out pretty nicely most of the time.

I suspect only Perl has such a modern regexp engine and the "inferior" local semantics! :)

Liquid Ingy Quality Berlin

I just finished up a fun and successful Perl QA Hackathon in Berlin. This was my first such event and I'd really like to thank my new employer Liquid Web (more at the bottom) for sponsoring the event and my attendance!

As usual, I worked on a variety of things that were either very interesting or needed my personal attention. The highlights were:

  • Participated in all the toolchain consensus sessions (led by xdg++)
  • Wrote the first version of an interaction /var/www/users/ingy_dot_net/index.html

Inline TPF Grant to be Finished by Christmas

OMG! That's now!!!

The Inline Grant is Finished!

Merry Christmas, Ingy and David

Tis the season to get Inline

Have you been naughty or nice?

David and I have been busy elves!

Inline Grant Nearing Completion

See our blog:

The Inline grant work is, um, working! Just a few things left to do, and we'll call it a wrap.

About Ingy döt Net

I am an Acmeist Hacker. I program in many languages to meet many people. Perl people are my favorite people. Currently I am working as a Distinguished Technologist for Hewlett Packard Enterprise, developing the future of cloud solutions.