Starting to Learn Regexes in Perl 6

By Aaron Baugher on August 19, 2015 1:25 PM

So I went looking for a script with a regex in it to convert, as a way to play with some of the new regex stuff. I picked one that I thought was laughably simple and small, and I'm glad I did, because it still took a while to convert.

The script is a little thing I wrote to play audio/video files for me. I usually play something for background noise while I'm working (lately a lot of Rifftrax), and I got tired of picking something, so I wrote a little script that would play the files in a directory in random order. Then I got tired of adjusting the volume, since it varied from file to file, so I added the ability to set a volume for each file. (I know I should just normalize them all, but as an A/V expert, I'm a pretty good Perl coder.) Then I ran across one video where the audio track I wanted wasn't the default, so I needed to give the player an extra argument, and I added a bit of code for that.

I wound up with this regex in the Perl 5 version, which isn't too complicated; I'd guess any experienced Perl coder can interpret it in a few seconds:

if( /(.+?\.\S\S\S)\s+(\d+)(\s+)?(.+)?$/ ){
    my($m, $v, $e) = ($1, $2, $4);
    # ... store the filename, volume, and extra arguments ...
}

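It captures the filename into $m; the filename may contain whitespace but must end in a dot followed by three non-space characters. Then it captures the volume into $v. If a line doesn't have both a filename and a volume, the match fails and the line is skipped. That way I can take a file out of the list for a while, if I tire of it, just by deleting its volume. Then, optionally, it looks for any extra arguments, which go into $e. (That's $4 rather than $3, because $3 only holds the whitespace in between.)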
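For comparison, here's a rough sketch of that same pattern written directly in Perl 6 syntax, before pulling it apart into tokens. The sample lines and the %kept hash are made up for illustration. Whitespace inside the pattern is ignored, [ ] groups without capturing (so the separating whitespace no longer needs a throwaway capture like $3), and the captures are numbered from $0.

```raku
use v6;

# Hypothetical sample lines in the same format as the player's data file.
my @lines = '300.avi 77',
            'Crystal Skull Rifftrax.avi 77 -aid 2',
            'Star Trek 7.avi';                  # no volume, so it should be skipped

my %kept;                                       # filename => [volume, extra]
for @lines -> $line {
    # Same shape as the Perl 5 pattern: filename, volume, optional extra.
    # [ ... ]? is an optional non-capturing group; captures start at $0.
    next unless $line ~~ / (.+? \. \S\S\S) \s+ (\d+) [ \s+ (.+) ]? $ /;
    %kept{~$0} = [ +$1, ~($2 // '') ];          # empty string when there's no extra
}
.say for %kept.keys.sort;                       # only the lines that had a volume
```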

At some point I probably should have switched to CSV or something, but it's just a dumb little script, you know? Anyway, it works fine. So here's the whole script, converted to Perl 6 and working, with some notes after:

#!/usr/bin/env perl6
use v6;

my %v;                                 # hash to hold data
my token filename    { .+? \.\S\S\S }; # filenames end in .???
my token volume      { \d+ };          # any digits for volume
my regex extra       { .+ \S };        # anything following that
my $mixer          = 'mixer';
my $player         = 'mplayer -vf dsize=600:-2 -geometry +200-10 ';
my $lockfile       = '/tmp/myplayer';

$lockfile.IO.spurt( $*PID );          # store the process ID so other process can kill this one
END { $lockfile.IO.unlink; }          # remove the lockfile at end

for $=finish.lines {                  # loop through the lines below '=begin finish'
    last if /STOP/;                   # stop at a STOP line
    if m/     (<filename>)
          \s+ (<volume>)
          \s* (<extra>?) / {          # use the regexes/tokens
        my ( $m, $v, $e ) = $/[0..2]; # get captured values from $/
        if $m and $v {                # if there's a filename and volume
            %v{$m}<v> = ~$v;          #   store it in the hash
            %v{$m}<e> = ~($e // '');  #   with any extra arguments
        }}} # lisp-y to save lines
for %v.keys.pick(*) -> $m {           # loop randomly through keys
    say "Playing $m";
    print qqx{ $mixer  %v{$m}<v>      };  # set the volume
    print qqx{ $player %v{$m}<e> "$m" };  # play the file
}

# the rest is like a Perl5 __DATA__ section
=begin finish
300.avi 77
Crystal Skull Rifftrax.avi 77 -aid 2
Star Trek 5.avi 77
Star Trek 7.avi
aeon-flux.avi 93

I'm not sure I'm satisfied with the tokens/regex and the way I put them together. It works, but I'm not nearly comfortable enough with them yet to be confident that I've done it the best way. I had to make <extra> a regex so the .+ could backtrack and let the \S match; a token is just a regex with :ratchet turned on, so it never backtracks. The others work fine as tokens, which I understand should be faster (not that speed matters here). But I feel like there's a lot to learn there.
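Here's a minimal sketch of that difference, using the made-up names extra-token and extra-regex for two copies of the same pattern. Because the token never backtracks, its .+ swallows the whole string and leaves nothing for \S, while the regex version can give back the last character.

```raku
use v6;

# token = regex with :ratchet, so .+ never gives characters back.
my token extra-token { .+ \S };

# regex allows backtracking, so .+ can give back one character for \S.
my regex extra-regex { .+ \S };

my $extra = '-aid 2';
say so $extra ~~ / ^ <extra-token> $ /;   # False: nothing left for \S
say so $extra ~~ / ^ <extra-regex> $ /;   # True
```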

I'm really liking the slurp() and spurt() routines. I used to do a lot of this kind of thing in non-production code:

my $text = `cat $filename`;     # slurp in a file
`echo $$ >$lockfile`;            # put process ID in a lockfile

That, of course, didn't even pretend to be portable, and it wasn't very security-conscious either. So it's nice to have these built into the core language, without needing to pull in a module.
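As a rough sketch of the Perl 6 replacements for those two backtick lines, this writes the process ID with spurt and reads it back with slurp. The file name myplayer-demo is hypothetical, and $*TMPDIR is only there so the example has somewhere writable to put it.

```raku
use v6;

# Hypothetical lockfile location, just for this example.
my $lockfile = $*TMPDIR.add('myplayer-demo');

$lockfile.spurt(~$*PID);         # instead of `echo $$ >$lockfile`
my $text = $lockfile.slurp;      # instead of `cat $filename`
say "lockfile holds process $text";

$lockfile.unlink;                # clean up
```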

I realize I'm still not being very portable with my lockfile path; I should probably be using a module to build that. The lockfile is there so that other programs, like another script I have that alerts me to certain things, can find this one's process ID and kill it.
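One possible way to avoid hard-coding /tmp, sketched here without any module: the core dynamic variable $*TMPDIR is an IO::Path for the system's temp directory. The second half is an illustrative guess at how another script might read the process ID back; actually sending the signal is left out.

```raku
use v6;

# $*TMPDIR points at the system temp directory, so '/tmp' isn't hard-coded.
my $lockfile = $*TMPDIR.add('myplayer');
say "lockfile path: $lockfile";

# The other side (hypothetical): another script reads the process ID back.
if $lockfile.e {
    my $pid = $lockfile.slurp.trim.Int;
    say "player is running as process $pid";
}
```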

I like putting the END {} block that unlinks the lockfile right next to the line that creates it. That just seems logical to me, though of course it could go later in the script.

Perl 5's __DATA__ has been replaced with the POD-like =begin finish, which makes everything after it available as a string in $=finish. Actually, the docs say it's been replaced with =begin data, which looks nice because, unlike __DATA__, you can have multiple sections, you can name them, and they can appear anywhere in the file, each one closed by an =end data line. But that's not supported yet, so I stuck with $=finish for now.

Although I started this conversion for the sake of the regex, I'm not sure there's much to explain there. The new approach looks more complicated, and may seem like overkill for something this small, but it makes it clearer what you're doing: I'm searching for a filename, a volume, and an optional 'extra'. You don't even need to understand the tokens/regexes themselves to understand that part.
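The names also pay off when reading the results. Here's a small sketch, reusing the script's three patterns on one sample line: calling a named token without wrapping it in parentheses captures it under its own name, so the match reads back as $<filename>, $<volume> and $<extra> instead of numbered captures.

```raku
use v6;

my token filename { .+? \.\S\S\S };
my token volume   { \d+ };
my regex extra    { .+ \S };

my $line = 'Crystal Skull Rifftrax.avi 77 -aid 2';
# <name> without parentheses makes a named capture.
if $line ~~ / <filename> \s+ <volume> [ \s+ <extra> ]? / {
    say "file:   $<filename>";
    say "volume: $<volume>";
    say "extra:  ", ~($<extra> // 'none');
}
```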

One small note: to make the Emacs Perl 6 mode happy with the syntax, I had to use m// instead of // in my if test. It's not quite as seamless as cperl-mode is with Perl 5, but it's already pretty good. I think I'll write about that next, because I'm very glad someone has already done so much with it.

One weirdness that'll take a while for long-time Perl 5 folks to get used to: captures start at $0 (or in the Match object at $/[0]), not $1.
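A quick sketch of the new numbering on one of the sample lines:

```raku
use v6;

my $line = 'Star Trek 5.avi 77';
if $line ~~ / (.+? \. \S\S\S) \s+ (\d+) / {
    say ~$0;       # Star Trek 5.avi  (the first capture is $0, not $1)
    say ~$/[1];    # 77  (the same capture as $1, reached through the Match object)
}
```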

In the for loop, I use .pick(*) on the filenames, which returns all of them in random order. The arrow (a pointy block) binds each one to $m in turn. That line used to be something like:

for my $m (sort {rand <=> .5} keys %v){

So I'd say the new method is clearer. It's also a real shuffle, which sorting with a random comparison isn't.
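Here's a small sketch of that loop on its own, with a made-up volume table standing in for %v. Since pick(*) picks without replacement, every key comes out exactly once.

```raku
use v6;

# Hypothetical volume table, keyed by filename like %v in the script.
my %v = '300.avi' => 77, 'aeon-flux.avi' => 93, 'Star Trek 5.avi' => 77;

my @played;
for %v.keys.pick(*) -> $m {      # every key once, in random order
    say "Playing $m";
    @played.push($m);
}
```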

Then I use qqx{} quoting to run the external programs. qqx{} interpolates variables the way backticks did in Perl 5. In production code, I'd turn these into run() calls, which return a Proc object, but it's just a dumb little script, right?
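For the curious, here's a rough sketch of what a run() version might look like. Passing the command and its arguments as a list means no shell is involved, so a filename like 'Star Trek 5.avi' arrives as a single argument without any quoting. To keep the sketch runnable anywhere, it assumes the Perl 6 interpreter itself ($*EXECUTABLE) running a one-liner as a stand-in for mplayer; the '--' only stops that interpreter from reading '-aid' as one of its own options.

```raku
use v6;

my $file  = 'Star Trek 5.avi';
my @extra = '-aid', '2';

# Stand-in "player" (an assumption for this sketch): prints each argument it gets.
my @player = $*EXECUTABLE, '-e', '.say for @*ARGS', '--';

# No shell: each list element becomes exactly one argument.
my $proc   = run @player, @extra, $file, :out;
my $output = $proc.out.slurp(:close);
print $output;
```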


Tagged as: Perl 6, regex, token

5 Comments

Brad Gilbert | August 19, 2015 11:26 PM

#! /usr/bin/env perl6
use v6;

my @mixer          = 'mixer';
my @player         = < mplayer -vf dsize=600:-2 -geometry +200-10 >;
my $lockfile       = '/tmp/myplayer';

$lockfile.IO.spurt( $*PID );
END { $lockfile.IO.unlink; }

my token filename    { .+? \.\S\S\S };
my token volume      { \d+ };
my regex extra       { .* };

my %song-data;
for $=finish.lines {
    last if /^ \s* STOP \s* $/;
    next unless m/ \s* <filename> \s+ <volume> <extra> /;
    # $<extra> is short for $/{'extra'}
    %song-data{~$<filename>}<v e> = +$<volume>, [$<extra>.words];
}

# uses sub-signature unpacking
for %song-data.pick(*) -> ( :key($m), :value($) (:$v,:@e)) {
    say "Playing $m";
    print run( @mixer,  $v,     :out ).out.slurp-rest.indent(4);
    print run( @player, @e, $m, :out ).out.slurp-rest.indent(4);
}

=begin finish
300.avi 77
Crystal Skull Rifftrax.avi 77 -aid 2
Star Trek 5.avi 77
Star Trek 7.avi
aeon-flux.avi 93

Mike Sanders replied to comment from Brad Gilbert | August 20, 2015 3:08 AM

%song-data{~$<filename>}<v e> = +$<volume>, [$<extra>.words];

Whew.

Way too many seemingly random dollar signs for me...

Aaron Baugher replied to comment from Brad Gilbert | August 20, 2015 7:14 AM

Thanks, Brad. Named captures are nice; I need to get in the habit of using them. And the way you're breaking the values out of the hash on two levels in your second for loop is really cool. I don't think I'd seen anything like that yet.

Aaron Baugher replied to comment from Mike Sanders | August 20, 2015 7:16 AM

Mike, I didn't even realize you could use a single dollar sign like that, except in the $.method format. Something to look into, thanks.

Brad Gilbert | August 20, 2015 7:19 PM

Actually the second for loop has 3 levels of signature unpacking. You don't see the first one which is implicit.


About Aaron Baugher

I'm a programmer and Unix sysadmin who uses Perl as much as possible, operating from the Midwest USA. To hire me for sysadmin or programming work, contact me at aaron.baugher @ gmail.com or as 'abaugher' on #perl6.


Aaron's Perl 6 Blog