August 2015 Archives

A Video-to-Song Converter in Perl 6

Here's my longest Perl 6 script yet. I'm trying to come up with some shorter ideas for future articles.

I have a directory full of music videos in MP4 format, and I wanted to convert the audio to MP3 files. I also wanted to insert the title and artist, which are generally found in the filenames, into the ID3 fields of each MP3 file. However, there was a sticking point: the filenames are inconsistent. The only pattern is that they all end in 12 junk characters and a ".mp4" extension. Other than that, some are "title - artist", some are "artist - title", some are one of those with no hyphen or anything else to separate the two, some have other bits of text stuck in here and there, a few don't have the artist at all, and so on.

So I knew I couldn't fully automate it. But I thought there had to be a better solution than renaming them all by hand and typing in all the correct data. What if I wrote a script that did a best guess of what the fields should be, showed them to me, let me edit them, and then used that to create the files? That led to the script below. Notes after.

#!/usr/bin/env perl6
use v6;

sub MAIN() {                                               #1
    my %s;
  SONG:
    for dir.grep: / . ** 12 \.mp4 $/ -> $l {               #2
        my $n = $l.substr(0,*-16);                         #3
        my ($a, $t) = $n.split( rx| \s* \- \s* |, 2 );
        unless $t and $a {
            ($a, $t) = $n.split( rx| \s+ |, 2);            #4
            $t = $a unless $t;
        }
        my %h = ( title   => $t,
                  artist  => $a,
                  album   => 'Downloads',
                  genre   => 'Rock',
                  comment => "Converted by $*PROGRAM-NAME",  #5
              );
        loop {
            %h<newfile> = "%h<artist> - %h<title>.mp3";      #6
            print-song($l, %h);
            my $p = prompt('Type a letter and new value: ');
            my ($c,$r) = $p.split( rx| \s+ |, 2);
            given lc $c {                                    #7
                when 'a' { %h<artist>       = $r }
                when 't' { %h<title>        = $r }
                when 'l' { %h<album>        = $r }
                when 'g' { %h<genre>        = $r }
                when 'c' { %h<comment>      = $r }
                when 'w' { %h<title artist> = %h<artist title> }
                when 'h' { show-help }
                when 'n' { next SONG }
                when 'x' { exit }
                when 's' { last }
                when 'p' {
                    process-saved-songs(%s);
                    next;
                }
            }
        }
        %s{$l} = %h;
    }
    process-saved-songs(%s);
}

my &process-saved-songs = sub (%s is rw){                      #8
    for %s.kv -> $k, %v {                                      #9
        my $p1 = run(< ffmpeg -i >, $k, < -ab 96k >, "new/%v<newfile>", :out );  #10
        say $p1.out.slurp-rest;
        my $p2 = run('id3v2',
                     '-a', %v<artist>,
                     '-t', %v<title>,
                     '-A', %v<album>,
                     '-g', %v<genre>,
                     '-c', %v<comment>,
                     "new/%v<newfile>", :out);
        say $p2.out.slurp-rest;
        rename( $k, "./done/$k" ) or die $!;             #11
        %s{$k}:delete;
    }
}

my &print-song = sub ($l, %h) {
    say qq:to/END/;                                      #12

    Old Filename: $l
    New Filename: %h<newfile>

    Artist:   %h<artist>
    Title:    %h<title>
    aLbum:    %h<album>
    Genre:    %h<genre>
    Comment:  %h<comment>

    w: sWap Title and Artist      s: Save song info         h: Help
    n: Next song without saving   p: Process saved songs    x: Exit      
    END
}

my &show-help = sub (){
    say q:to/HELP/;
===========================================================
This program tries to determine what the title and artist of a song should be
from its filename, lets you edit those and other ID3 fields, saves a list of
files to process, then processes them by converting them to MP3 format and
setting their ID3 fields.

Type one letter, then if it needs a value, type that after a space
and hit enter.

Examples:

t Dirty Deeds           -> Changes the song title to "Dirty Deeds"
a AC-DC                 -> Changes the artist to "AC-DC"
l Back in Black         -> Changes the album to "Back in Black"
g Rock                  -> Changes the genre to Rock
c Favorite song         -> Sets the comment field to "Favorite song"
w                       -> Swaps the artist and title fields        
h                       -> Displays this help
s                       -> Saves the song info for processing
p                       -> Processes all saved files
n                       -> Skips to the next song, ignoring this one
x                       -> Exit
===========================================================

HELP
    prompt("Hit enter to return to songs: ");
}

1) I don't really need a MAIN sub, since I don't take any command-line arguments. But it's always possible that I'll add some later, so this way I'm prepared. Also, it lets me put %s inside its scope, rather than having it exist in the entire file.

2) There's some interesting stuff going on here. First, dir() replaces all the opendir/readdir stuff we used to have to do. Then you can see how method calls like grep can use the "grep: argument" notation. I kinda like that; I may have to use that more often. The regex here is interesting because it uses .**12 where it would have been .{12} before. By the way, a big thanks to whoever put all the hints in the perl6 compiler that say, "It looks like you're trying to do [insert perl5 thing]; maybe you should try this." Those are a big time-saver.

3) substr() has changed in some ways. Now instead of using a negative number to count from the end of the string, you use *-n to say "end of string minus n". It makes sense. I did notice one problem: if the string isn't that long, you get an error, because then it ends up with a negative number after all, which it doesn't like. So I guess if there's a possibility of a shorter string, you have to adjust for that.

4) I expected to use the new words() method here, but couldn't, because it turns out words() and split() work differently when you give them a limit. split() stuffs all the remaining text in the last element, while words() drops whatever doesn't fit. For instance:

"The quick brown fox".split( rx| \s+ |, 2 ); #= ('The', 'quick brown fox')   
"The quick brown fox".words(2);              #= ('The', 'quick')

So split() it is this time, unless there's an option to words() that I didn't see.

5) Most of the old short-named special variables are gone. The old $0 has become $*PROGRAM-NAME. That's probably not a bad thing; I often had to look up the ones I didn't use regularly anyway. Now $0 is the first captured match from a regex.

6) I'm still getting used to the new look of hash variables. I still catch myself putting $ in front of them, but I'm getting better. The quoting when you don't want interpolation is nice and clear.

7) given/when makes for a nice, clear table of actions here.

8) In Perl 5, if you passed a hash (or array) to a sub, it got flattened out into a list and copies of the values passed, so if you wanted the actual thing passed, you passed a reference. Perl 6 pretty much passes a reference by default, and marks the parameter read-only. I specified rw (read-write) so my sub could clean out the hash after doing its work on it, though as far as I can tell the read-only default protects scalar parameters from assignment; changing the contents of a passed hash should work even without the trait.

9) I'm noticing that Perl 6 is even better than 5 at just doing what you mean. Here I'm pulling key and value pairs from a hash, where the values are hash references. In 5, I would have needed to dereference those, so I probably would have had something like $s->{$k}{newfile} all over the place. Here I just say, "give me that as a hash," and away I go. I suppose eventually I'll run into a situation where I wish it were more literal and less helpful, but I think that will be rare.

10) I'm not thrilled with the ugliness of this list of arguments. Guess that's why I went a different route two lines later.

11) I tried to use move() first, but it's not implemented yet, so rename() worked.

12) This is the new heredoc syntax. It has some nice features, one being that it's smart about indentation. Here, since END is indented four spaces, it'll strip four spaces from the beginning of the other lines as well. The qq:to/END/; syntax gives cperl-mode in emacs fits, though; I'll have to submit an issue on that.
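
Notes 3 and 12 are easy to try in isolation. Here's a minimal sketch, not part of the script above: strip-tail is a hypothetical helper name and the filename is made up. It guards the substr() call so a too-short string comes back unchanged instead of producing an error, then shows a heredoc losing the terminator's indentation.

```perl6
use v6;

# Hypothetical helper: drop the last $n characters, but only when the
# string is longer than $n, so substr() never sees a negative length.
sub strip-tail(Str $name, Int $n) {
    $name.chars > $n ?? $name.substr(0, *-$n) !! $name;
}

say strip-tail('Back in Black - AC-DC-abcdefghijk.mp4', 16);  # Back in Black - AC-DC
say strip-tail('short.mp4', 16);                              # short.mp4

# Heredoc: the terminator is indented four spaces, so four spaces are
# stripped from the start of every line of the text.
my $title = 'Dirty Deeds';
my $text = qq:to/END/;
    Title: $title
      (this line keeps two spaces of indentation)
    END
print $text;
```

Whether returning the short string untouched is the right fallback depends on the data; for this script, a file that doesn't fit the pattern never gets past the grep anyway.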

For details on how to actually use it, run it and hit 'h' for help. Basically, it takes a guess at the file's title and artist, and displays them for you along with some other fields that you can edit. One idea I had was to provide a sWap option that swaps the title and artist fields, since about 1/3 of the files had them backwards from the rest. Once I like a file's info, I hit 's' to save it. It doesn't go ahead and process the file right away, because that might take a while and be annoying. So it saves the info in %s, and doesn't process the files until you hit 'p' to process all saved ones, or when it runs out of files.

It's not really intended to be portable or anything yet, and it assumes a lot of things: that your files have a certain format and extension, that certain directories exist, that you have ffmpeg and id3v2 installed, etc. But feel free to use it, expand on it, laugh at it, curse it, whatever you like.

Starting to Learn Regexes in Perl 6

So I was looking for a script to convert that had a regex, to play with some of the new regex stuff. I picked one that I thought was laughably simple and small, and I'm glad I did, because it still took a while to convert.

The script is a little thing I wrote to play audio/video files for me. I usually play something for background noise while I'm working (lately a lot of Rifftrax), and I got tired of picking something, so I wrote a little script that'd play the files in a directory randomly. Then I got tired of adjusting the volume because that would vary between files, so I added the ability to provide a volume for each file. (I know I should just normalize them all, but as an A/V expert, I'm a pretty good Perl coder.) Then I ran across one video where the audio I wanted wasn't the default track, so I needed to give the player an extra argument. So I added a bit of code for that.

I wound up with this regex in the Perl 5 version, which isn't too complicated; I'd guess any experienced Perl coder can interpret it in a few seconds:

if( /(.+?\.\S\S\S)\s+(\d+)(\s+)?(.+)?$/ ){
    my($m, $v, $e) = ($1, $2, $4);

It gets the filename in $m, which may have whitespace but must have a dot-3-letter ending. Then it captures the volume into $v; if there isn't a filename and a volume, it skips the line. That way I can take a file out of the list for a while if I tire of it. Then, optionally, it looks for any extra stuff, which goes into $e.

At some point, I probably should have gone with CSV or something, but it's just a dumb little script, you know? Anyway, it works fine. So here's the whole script, converted to Perl 6 and working, with some notes after:

#!/usr/bin/env perl6
use v6;

my %v;                                 # hash to hold data
my token filename    { .+? \.\S\S\S }; # filenames end in .???
my token volume      { \d+ };          # any digits for volume
my regex extra       { .+ \S };        # anything following that
my $mixer          = 'mixer';
my $player         = 'mplayer -vf dsize=600:-2 -geometry +200-10 ';
my $lockfile       = '/tmp/myplayer';

$lockfile.IO.spurt( $*PID );          # store the process ID so other process can kill this one
END { $lockfile.IO.unlink; }          # remove the lockfile at end

for $=finish.lines {                  # loop through the lines below '=begin finish'
    last if /STOP/;                   # stop at a STOP line
    if m/     (<filename>)
          \s+ (<volume>)
          \s* (<extra>?) / {          # use the regexes/tokens
        my ( $m, $v, $e ) = $/[0..2]; # get captured values from $/
        if $m and $v {                # if there's a filename and volume
            %v{$m}<v> ~= $v;          #   store it in the hash
            %v{$m}<e> ~= $e // '';    #   with any extra arguments
        }}} # lisp-y to save lines
for %v.keys.pick(*) -> $m {           # loop randomly through keys
    say "Playing $m";
    print qqx{ $mixer  %v{$m}<v>      };  # set the volume
    print qqx{ $player %v{$m}<e> "$m" };  # play the file
}

# the rest is like a Perl5 __DATA__ section
=begin finish
300.avi 77
Crystal Skull Rifftrax.avi 77 -aid 2
Star Trek 5.avi 77
Star Trek 7.avi
aeon-flux.avi 93

I'm not sure I'm satisfied with the tokens/regex and the way I put them together. It works, but I'm not nearly comfortable enough with them yet to be confident that I've done it the best way. I had to make <extra> a regex so the .+ could backtrack and let the \S match. The others work fine as tokens, which I understand should be faster (not that speed is important here). But I feel like there's a lot to learn there.
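
To see why <extra> had to be a regex, here's a small sketch that declares the same pattern both ways and tries each on the extra-arguments text from the data section. The names extra-token and extra-regex are made up for the comparison, and I've anchored the match at both ends to make the difference obvious.

```perl6
use v6;

# Same pattern twice; only the declarator differs.
my token extra-token { .+ \S };   # token: no backtracking
my regex extra-regex { .+ \S };   # regex: backtracking allowed

my $text = '-aid 2';

# In the token, .+ swallows the whole string and never gives a character
# back, so the trailing \S has nothing left to match.
say so $text ~~ / ^ <extra-token> $ /;   # False

# In the regex, .+ backs off by one character and \S matches the '2'.
say so $text ~~ / ^ <extra-regex> $ /;   # True
```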

I'm really liking the slurp() and spurt() routines. I used to do a lot of this kind of thing in non-production code:

my $text = `cat $filename`;     # slurp in a file
`echo $$ >$lockfile`;            # put process ID in a lockfile

Which of course didn't even pretend to be portable, and wasn't very security-conscious either. So it's nice to have those in core, without needing to pull in a module.

I realize I'm still not being very portable with my lockfile path; I should probably be using a module to build that. That's there so that other programs can kill this one, like another script I have that alerts me to certain things.

I like putting the END {} block that unlinks the lockfile with the creation of it. Just seems logical to me, but of course it could be placed later.

The Perl 5 __DATA__ section has been replaced with the POD-like =begin finish, which places its lines in the $=finish object. Actually, the docs say it's been replaced with =begin data, which looks nice because, unlike __DATA__, you can have multiple sections, you can name them, and they can appear anywhere in the file, followed by a =end data line. But that's not supported yet, so I stuck with $=finish for now.

Although I started this conversion for the sake of the regex, I'm not sure there's anything to explain there. The new methods look more complicated, and may seem like overkill for something as small as this, but they do make it clearer what you're doing. I'm searching for a filename, a volume, and an optional 'extra'. You don't even have to be able to understand the tokens/regexes themselves to understand that part.

One small note: to make the emacs Perl 6 mode happy with the syntax, I had to use m// instead of // in my if test. It's not quite as seamless as cperl-mode is with Perl 5, but it's already pretty good. I think I'll write about that next, because I'm very glad someone's already done so much with it.

One weirdness that'll take a while for long-time Perl 5 folks to get used to: captures start at $0 (or in the Match object at $/[0]), not $1.
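
A quick illustration of the numbering, using a cut-down version of the pattern on one line from the data section (a sketch for this note only, not code from the script):

```perl6
use v6;

my $line = 'Crystal Skull Rifftrax.avi 77 -aid 2';

if $line ~~ / (.+? \.\S\S\S) \s+ (\d+) / {
    say ~$0;      # Crystal Skull Rifftrax.avi
    say ~$1;      # 77
    say ~$/[0];   # Crystal Skull Rifftrax.avi  (same capture via the Match object)
}
```

The ~ prefix turns each Match object into a plain string before printing.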

In the for loop, I use .pick(*) on the filenames, which returns all of them in random order. The arrow assigns each one to $m. That line used to be something like:

for my $m (sort {rand <=> .5} keys %v){

So I'd say the new method is clearer.

Then I use qqx{} quoting to run the external programs. qqx{} does variable interpolation like backticks did in Perl 5. In production code, I'd turn these into run() calls that pass each argument separately instead of going through the shell, but it's just a dumb little script, right?
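
For comparison, here's roughly what the run() form could look like. This is a sketch under a couple of assumptions: echo stands in for the real player so it runs without mplayer installed, and it uses slurp(:close), which I believe needs a reasonably current Rakudo (the scripts in these posts use the older slurp-rest).

```perl6
use v6;

# 'echo' stands in for the real player so the sketch runs anywhere
# with a POSIX-style echo on the PATH.
my $player = 'echo';
my $file   = 'Crystal Skull Rifftrax.avi';
my $extra  = '-aid 2';

# Each list element becomes exactly one argument; no shell is involved,
# so the spaces in the filename need no quoting.
my $proc = run $player, |$extra.words, $file, :out;

# Prints back the arguments echo received.
print $proc.out.slurp(:close);
```

The extra arguments are split with .words because they're stored as one string in the data section; the filename is passed whole.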

Mystery Line in Proc Input in Perl 6

Here's something odd; not sure if it's a bug or just something I don't understand.

I have a utility on my system called k8temp which reports the temperature of the CPUs. It reports it in Celsius, so I thought I'd write a little wrapper that converts the temps to Fahrenheit. k8temp outputs one line per core, so on my dual-core system, the output looks like this:

abaugher@bannor> k8temp
CPU 0 Core 0 Sensor 0: 38c
CPU 0 Core 1 Sensor 0: 38c
abaugher@bannor>

And piping it to a hex dump looks like this:

abaugher@bannor> k8temp|hd
00000000: 43 50 55 20, 30 20 43 6f, 72 65 20 30, 20 53 65 6e ;CPU 0 Core 0 Sen
00000010: 73 6f 72 20, 30 3a 20 33, 39 63 0a 43, 50 55 20 30 ;sor 0: 39c.CPU 0
00000020: 20 43 6f 72, 65 20 31 20, 53 65 6e 73, 6f 72 20 30 ; Core 1 Sensor 0
00000030: 3a 20 33 38, 63 0a                                 ;: 38c.

So there are definitely just two lines there, right? But when I run my script, I get three lines: the two expected lines and then a blank one. Here it is:

#!/usr/bin/env perl6
use v6;

sub MAIN() {          
    my $p = run 'k8temp', :out;
    for $p.out.lines {
        if / (.*?) (\d+) c / {
            say  [~] $0, round($1*9/5+32), "°";
        } else {
            say 'Mystery line: ',$_;
        }
    }
}

And the output:

abaugher@bannor> ./k8temp.p6
CPU 0 Core 0 Sensor 0: 108°
CPU 0 Core 1 Sensor 0: 106°
Mystery line: 
abaugher@bannor> ./k8temp.p6|hd
00000000: 43 50 55 20, 30 20 43 6f, 72 65 20 30, 20 53 65 6e ;CPU 0 Core 0 Sen
00000010: 73 6f 72 20, 30 3a 20 31, 30 39 c2 b0, 0a 43 50 55 ;sor 0: 109°.CPU
00000020: 20 30 20 43, 6f 72 65 20, 31 20 53 65, 6e 73 6f 72 ; 0 Core 1 Sensor
00000030: 20 30 3a 20, 31 30 34 c2, b0 0a 4d 79, 73 74 65 72 ; 0: 104°.Myster
00000040: 79 20 6c 69, 6e 65 3a 20, 0a                       ;y line: .

So the for loop is running three times, even though the subprogram only outputs two lines, and the value of $_ the third time through is the empty string. This will require more investigation. Am I missing something obvious here?

Update: I tried a couple things. First, slurping the output of the process shows no third line:

#!/usr/bin/env perl6
use v6;

sub MAIN() {          
    my $p = run 'k8temp', :out;
    print $p.out.slurp-rest;
}
# results
abaugher@bannor> ./k8.p6|hd 
00000000: 43 50 55 20, 30 20 43 6f, 72 65 20 30, 20 53 65 6e ;CPU 0 Core 0 Sen
00000010: 73 6f 72 20, 30 3a 20 34, 31 63 0a 43, 50 55 20 30 ;sor 0: 41c.CPU 0
00000020: 20 43 6f 72, 65 20 31 20, 53 65 6e 73, 6f 72 20 30 ; Core 1 Sensor 0
00000030: 3a 20 33 39, 63 0a                                 ;: 39c.

Then using .lines instead:

#!/usr/bin/env perl6
use v6;

sub MAIN() {          
    my $p = run 'k8temp', :out;
    my @l = $p.out.lines;
    @l.perl.say;
}
# results
abaugher@bannor> ./k8.p6
["CPU 0 Core 0 Sensor 0: 42c", "CPU 0 Core 1 Sensor 0: 40c", ""]<>

So .lines() appears to be adding a third element. Now to figure out why.
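
Until the cause turns up, the wrapper can simply skip empty lines. The sketch below doesn't explain the extra element; it only works around it. To keep it runnable without k8temp, the lines are a hard-coded stand-in that includes the empty third element, and c-to-f is a helper name I made up.

```perl6
use v6;

# Stand-in for the lines read from k8temp, including the unexplained
# empty third element seen above.
my @lines = 'CPU 0 Core 0 Sensor 0: 38c', 'CPU 0 Core 1 Sensor 0: 40c', '';

# Celsius to Fahrenheit, rounded to the nearest degree.
sub c-to-f($c) { round($c * 9 / 5 + 32) }

for @lines.grep(*.chars) {              # skip empty lines
    if / (.*?) (\d+) c / {
        say $0 ~ c-to-f($1) ~ '°';
    }
}
# CPU 0 Core 0 Sensor 0: 100°
# CPU 0 Core 1 Sensor 0: 104°
```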

Benchmarking index() and regex in Perl 6

I noticed Perl 6 has a Benchmark module already, so I was wanting to use it, and Liz's suggestion of using index() rather than a regex in my last script gave me an excuse. The results were striking.

Benchmark.pm6 doesn't have a cmpthese() routine, but timethese() does well enough. The script is below, followed by the average times required for one grep through the array of about 150 lines. (I ran the script five times; the bottom row of the table averages those runs.)

What did I learn?

Well, for starters, index() is about 7 times faster than the best regex solution (0.004 vs. 0.029 on average), and over 100 times faster than my first attempt (0.448). So that's the way to go, whenever possible.

Comparing the regexes was interesting too, though, so I ended up trying several things. Putting the regex in the grep with a bare variable (regex1) was terrible. Replacing the variable with a constant (regex2) was much faster, but that's not usually an option in a real program. The next thing I tried was actually regex4, creating a regex object outside the loop. I was a little surprised that that didn't gain anything over regex1. I guess since it can't know for sure that $string will never change, it still has to reinterpolate it every time.

So then I tried regex5, and wasn't surprised to see it fast again with the constant. Then I thought of regex6: putting quotes around the variable in the regex object, so it would go ahead and interpolate it. That sped it up a lot, though not as much as the constant. And if I print out $r6, it shows it just as it is there, so it hasn't forgotten it's a variable.

That led me to try regex3 and discover that quoting the variable inside the grep test gains the same thing.

So creating the rx// object in advance didn't gain anything; in fact regex{456} are slightly slower than regex{123}. What made the difference was putting quotes around $string. And I'm a little puzzled why that would be. Maybe it's time to read some more of the Synopsis on regexes and see if it enlightens me.

#!/usr/bin/env perl6
use v6;
use Benchmark;

my $p = run 'ps', 'auxww', :out; 
my $header = $p.out.get;
my @lines = $p.out.lines;

my $string = 'xterm';
my $r4 = rx{  $string  };
my $r5 = rx{   xterm   };
my $r6 = rx{ "$string" };

my %h = timethese 1000, {
    'regex1' => &regex1, 'regex2' => &regex2, 'regex3' => &regex3,
    'regex4' => &regex4, 'regex5' => &regex5, 'regex6' => &regex6,
    'index1' => &index1,
};
say map { $_ => %h{$_}[3] }, sort keys %h;

sub regex1 { my @new = grep { /  $string  /           }, @lines }
sub regex2 { my @new = grep { /   xterm   /           }, @lines }
sub regex3 { my @new = grep { / "$string" /           }, @lines }
sub regex4 { my @new = grep { $r4                     }, @lines }
sub regex5 { my @new = grep { $r5                     }, @lines }
sub regex6 { my @new = grep { $r6                     }, @lines }
sub index1 { my @new = grep { .index($string).defined }, @lines }

# results
| index | regex1 | regex2 | regex3 | regex4 | regex5 | regex6 |
|-------+--------+--------+--------+--------+--------+--------|
| 0.003 |  0.435 |  0.028 |  0.068 |  0.485 |  0.032 |  0.069 |
| 0.008 |  0.446 |  0.029 |  0.070 |  0.517 |  0.033 |  0.072 |
| 0.003 |  0.437 |  0.028 |  0.068 |  0.474 |  0.031 |  0.074 |
| 0.003 |  0.449 |  0.028 |  0.070 |  0.593 |  0.034 |  0.070 |
| 0.003 |  0.472 |  0.030 |  0.078 |  0.508 |  0.031 |  0.087 |
|-------+--------+--------+--------+--------+--------+--------|
| 0.004 |  0.448 |  0.029 |  0.071 |  0.515 |  0.032 |  0.074 |
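
If you want to try the comparison without installing Benchmark, a rough version can be done with now. This is only a sketch: the lines are synthetic rather than real ps output, time-it is a helper name I made up, and the numbers (and possibly even the ranking) will differ by machine and compiler version.

```perl6
use v6;

# Synthetic stand-in for the ps output: 150 lines, 15 of which mention xterm.
my @lines  = (^150).map: { $_ %% 10 ?? "user $_ /usr/bin/xterm" !! "user $_ /bin/bash" };
my $string = 'xterm';

# Hypothetical helper: average wall-clock seconds per call over $runs calls.
sub time-it(&code, Int $runs = 200) {
    my $start = now;
    code() for ^$runs;
    (now - $start) / $runs;
}

# .elems forces the lazy grep to actually walk the whole list.
my $t-index = time-it { @lines.grep({ .index($string).defined }).elems };
my $t-regex = time-it { @lines.grep(/  $string  /).elems };
my $t-quote = time-it { @lines.grep(/ "$string" /).elems };

printf "index:  %.6f\nregex:  %.6f\nquoted: %.6f\n", $t-index, $t-regex, $t-quote;
```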

Refactoring Very Old Perl 5 in Perl 6

Back when I was first learning Perl, I'd been doing Unix system administration for a couple years, and one command I ran a lot was this one:

ps auxww | grep something

(On some systems it was 'ps -ef'.) That would get a full listing of all running processes and grep them for "something." I soon got tired of typing all that, so I made a shell alias:

alias pst='ps auxww | grep '

Then I could just run pst something, so it saved typing. But it still wasn't great. It left out ps's header line that showed what all the columns were, and they'd vary from one OS to another, so it wasn't always easy to tell from the data. Also, the grep process itself would show up in the list, which was annoying. (I already knew it was running, because I ran it.) So one of the first Perl scripts I wrote was this one, which I've been using ever since because it worked, even though the code is embarrassingly bad now:

#!/usr/local/bin/perl

open(IN,"ps axuww |")||die("Unable to get process listing\n");

$header = <IN>;
print "$header";

while(<IN>){
    next if ($_ !~ /$ARGV[0]/);
    s/^\s+|\s+$//g;
    @v=split(/\s+/);
    next if $v[1] == $$;
    print;
    print "\n";
    $trs+=$v[4];
    $drs+=$v[5];
    $size+=$v[6];
    $swap+=$v[7];
    $rss+=$v[8];
    $shrd+=$v[9];
    $lib+=$v[10];
    $dt+=$v[11];
}

print("-"x80);
print("\n$header");
printf("Totals:%22d%6d%6d%6d%6d%6d%6d%6d\n",$trs,$drs,$size,
       $swap,$rss,$shrd,$lib,$dt);

So much ugly by my standards now: no warnings or strict, unnecessary parentheses all over the place, 2-arg open with global filehandle, unnecessary $_, giving split its default argument, no my on variables, and more. Worst of all, I was accumulating totals on some of the columns, and that doesn't even make sense! Maybe it did on an OS I was using back then, but it doesn't on any I have now.

So it was long overdue for an update, and I thought I might as well do it in Perl 6. Here is the result, with numbered comments below:

#!/usr/bin/env perl6
use v6;

sub MAIN( $string ){                   # 1
    my $p = run 'ps', 'auxww', :out;   # 2
    my $header = $p.out.get;           # 3
    say $header, '-' x 80;             # 4

    for $p.out.lines {                 # 5
        next unless m/ $string /;      # 6
        .trim;                         # 7
        my @v = .words;                # 8
        next if @v[1] == $*PID;        # 9
        .say;                          # 10
    }
    say '-' x 80, $header;             # 11
}

I feel better just looking at it. The accumulation of totals is gone entirely, and here are notes on the rest:

(1) The MAIN sub handily replaces @ARGV (though I think that may still be available). By specifying an argument for it, I automatically get in $string what would have been in $ARGV[0], plus perl6 automatically throws an error if I don't supply it. So no need for a "die unless" on command line arguments anymore.

(2) This opens a Proc object ($p) to a running process provided by IO.run(). The :out adverb says I want the output of the process. By the way, look at the Proc docs for examples of this, not IO.

(3) Calling get() on the output stream (an IO::Handle object) returns a single line from the stream. That grabs the line of column headers that ps provides.

(4) Print out the header (which still has its newline), then a separator line of 80 hyphens. The 'x' operator still works on strings like it did in Perl 5, but see the 'xx' operator for repeating lists.

(5) Using IO::Handle::lines() in a for loop is more-or-less the equivalent of while(<$fd>) in Perl 5: it gets one line at a time until exhausted. The line will be in $_, and that will be the default object for any methods called as .method, like .trim and .say. Oh, and it auto-chomps, which is nice.

(6) This looks like Perl 5, except whitespace is allowed in the regex. Also, the string in $string will not be interpolated as it would be in Perl 5, so whatever argument I provide will be searched for literally. If I wanted to be able to enter patterns with meta-characters and have them interpolated, I'd need to put angle brackets around it, like this: <$string>. As I understand it, without the angle brackets, scalar values in regexes are automatically quotemeta'd.

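(7) Str.trim() is the equivalent of that ugly regex in my original script, which trims off whitespace from both ends. One catch: .trim returns the trimmed string rather than changing $_, so calling it on its own line like this doesn't actually alter anything. It doesn't matter here, because .words ignores leading and trailing whitespace anyway. I probably don't need the trimming on my current systems, but I think I ran into some systems where ps didn't left-justify the first column.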

(8) Str.split() no longer has a default pattern of splitting on whitespace, but the new Str.words() does that now.

(9) $*PID replaces the old $$, containing the process ID of the script itself. I don't want that in the output, so I skip the line that has that in the PID (process ID) column.

(10) If it reached this point, print the line with newline.

(11) Reprint the separator and header at the bottom, so I can see them there if the top scrolls out of the terminal.
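
Notes 6 through 8 can be checked with a few one-liners. This is a sketch with made-up strings, not code from the script:

```perl6
use v6;

my $string = 'a.c';

# A bare scalar is matched literally: the dot must be a real dot.
say so 'abc' ~~ / $string /;     # False
say so 'a.c' ~~ / $string /;     # True

# In angle brackets the contents are treated as a pattern, so the dot
# matches any character.
say so 'abc' ~~ / <$string> /;   # True

# words() ignores leading and trailing whitespace; split() on a
# whitespace pattern reports them as empty strings.
my $line = '  root  1  init  ';
say $line.words.elems;           # 3
say $line.split(/\s+/).elems;    # 5

# trim() returns a new string and leaves the original alone.
my $trimmed = $line.trim;
say $trimmed.chars;              # 13
say $line.chars;                 # 17
```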

And that's it! Corrections, suggestions, and questions welcome.

Minor Issue with Perl 6 Install on CentOS 6

I had a little hiccup while installing Perl 6 on a CentOS system, and thought I'd leave the details here in case it happens to anyone else.

[Update: This has already been fixed by one of the Perl 6 devs, who isn't able to login here to comment. Panda installs without needing lsb_release. So my kludge is no longer needed.]

I used rakudobrew, and installed rakudo with moar just fine. But "rakudobrew build-panda" failed with "Unable to execute 'lsb_release -a 2> /dev/null'". That lsb_release program wasn't installed on this system, but yum said I could get it from the package redhat-lsb-core. Unfortunately, when I tried to install that, it came up with a list of dozens of dependencies to go with it, including a lot of X stuff like ghostscript and libGL, even some sound packages.

This is a lightweight headless system, and it needs to stay that way, so I didn't want to install all that stuff just to get this one little utility. So I did a little research and found out what it would probably report, and created my own lsb_release that would provide what rakudo needed to see:

#!/bin/sh
echo Distribution ID: CentOS
echo "Description: CentOS release 6.6 (Final)"
echo Release: 6.6

I created that in my ~/bin directory (which is in my $PATH), chmod'd the permissions to 755, and ran the build-panda again. Problem solved!

So if you found this page because you're having the same problem, you might give that a try. Look in /etc/*-release to see what the Description should be, and pluck the Release out of that. Or if you're working with some other Linux distro that doesn't have lsb_release or those files, look around in /etc to see what you can find, or look through the output of dmesg. I don't know exactly what rakudo needs the info for, or how accurate it needs to be, but it needs to be in that format.

Using Unicode in Emacs for Perl 6

I use vi/vim for quick edits and remote work, but I do most of my programming (and other work) in emacs. To enter Unicode characters in emacs, you run the "insert-char" command, which by default is tied to "C-x 8 [Enter]", then type in the hex code for the character or its name.

Typing at least 5 characters to get one got old very fast, now that there are some Unicode characters that can be used as Perl 6 operators. So I wrote a lisp function which asks for a single character and looks it up in an alist (kinda similar to a Perl hash). That way I can enter any Unicode character I've put in the alist by hitting two keys: one key to run the function, then whatever key I assigned to that character. Here's the lisp, which I put in my .emacs file:

(defun ajb/common-unicode-chars (ch)
  "Function to map single keys to some common unicode characters.
Run the function and then press a key.  Non-mapped keys will default to themselves."
  (interactive "cChar: ")
  (insert
   (or (cdr
        (assoc-string
         (char-to-string ch)
         '(("[" . ?«)
           ("]" . ?»)
           ("a" . ?Α)
           ("d" . ?Δ)
           ("g" . ?Γ)
           ("o" . ?Ω)
           ("p" . ?π)
           (">" . ?⊃)
           ("<" . ?⊂)
           ("|" . ?∪)
           ("&" . ?∩)
           ("1" . ?⚀)
           ("6" . ?⚅)))) ch)))
(global-set-key (kbd "<f6>") 'ajb/common-unicode-chars)

The first two characters in the list are guillemets, which Perl 6 uses as the "hyperoperators" (more on those another time), and they can also be used as double-quotes for strings. I keyed them to the square brackets, since they don't require the shift key. Then I put in a few Greek letters I might want to use, a few of the Set operators, and a couple of dice faces, just to show some possibilities. As I discover more Unicode characters that I want to use more than once in a great while, I'll pick a key for them and add them to the list.

If you put this in your .emacs, it will be bound to the <f6> key in all modes. So to get the left-pointing guillemet, you'd hit "<f6> [". If you know lisp, the code is pretty simple. If you don't, I think it's fairly obvious how to add more entries, but please leave a comment if you have any questions.

Thanks to all the Unicode characters that are available, along with some other new features like "pointy blocks," you can write Perl 6 code that looks pretty strange to a Perl 5 guy, like this:

#!/usr/bin/env perl6
use v6;

my &Α = -> {
    Set.new('a'..'n');
}

my &Ω = -> {
    Set.new('m'..'z');
}

say [∩] Α, Ω;

I'll save explaining what that does for next time.

Simple Game in Perl 6

Here's a little game which was a sub-game in the 1980s Commodore 64 game Legend of Blacksilver. It's a simple gambling game where you are dealt five cards in a row from a standard 52-card deck. The cards are then turned up one at a time. After each card is turned up, you have to guess whether the next card will be higher or lower in rank than the last one. Aces are high, and you lose all ties. If you guess all four right, you win.

[Update: There's a serious bug in this version, as pointed out by Brad in the comments, but I'm leaving it in the original so the comments make sense, since I'm writing these for learning purposes. A fixed version can be found at my GitLab account.]

#!/usr/bin/env perl6
use v6;

my %v = ( 2 => 2, 3 => 3, 4 => 4, 5 => 5, 6 => 6, ### 1
          7 => 7, 8 => 8, 9 => 9, T => 10,
          J => 11, Q => 12, K => 13, A => 14 );
my @deck := %v.keys X~ <♠ ♡ ♢ ♣>;                ### 2
my $card = @deck.pick;                            ### 3
my $show = "$card ";

for ^4 {                                          ### 4
    say $show;
    my $l; repeat {                               ### 5
        $l = prompt 'Hi or Lo? ';
    } until $l ~~ m:i/ ( h | l ) /;
    my $new = @deck.pick;
    $show ~= "$new ";
    my $nv = %v{ $new.substr(0,1)};               ### 6
    my $cv = %v{$card.substr(0,1)};
    if $nv == $cv or
        ( $nv < $cv and $l  ~~ m/:i h/ ) or       ### 7
        ( $nv > $cv and $l !~~ m/:i h/ ) {
            say $show;
            say "Sorry, you lose!";
            exit;
        }
    $card = $new;
}
say $show;
say 'You win!';

I inserted commented numbers on lines with interesting features.

(1) I create a hash of card ranks to values, to make it easy to tell whether a card is higher or lower than another.

(2) There's a lot of Perl 6 goodness here. First, I bind the array with := instead of assigning it with =. I don't completely understand this yet, but I think that binds it like a Perl 5 reference, rather than copying the values. The point of that, I think, is that it means the compiler doesn't have to execute the right side until more values are needed in @deck. I'm not sure that actually gains me anything in this case, since all the values are probably needed in the next line, but it seems like a good habit.

The X is the "cross" operator. It takes two lists as arguments, and returns all possible combinations between them. Following it with ~, the Perl 6 string concatenation operator, tells it to concatenate each combination as a string. So that combines each key from the hash with each of the four card suit symbols, resulting in 52 unique combos.
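To see the cross operator on a smaller scale, here's a quick sketch with just two ranks and two suits. The variable names are made up for the illustration and aren't part of the game script.

```perl6
my @ranks = <A K>;
my @suits = <♠ ♡>;

# X~ pairs every left-hand item with every right-hand item
# and concatenates each pair into a single string.
my @cards = (@ranks X~ @suits);
say @cards.join(' ');            # A♠ A♡ K♠ K♡

# Plain X, with no operator after it, returns the pairs themselves.
say (@ranks X @suits).elems;     # 4
```

The left-hand list varies slowest, so all the aces come out before the kings.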

%v.keys shows the way that functions can run as methods now with the dot (.) operator. I could say keys %v as I would have in Perl 5, but I think this makes things like precedence clearer.

The <> operator is no longer for reading lines from a filehandle. Now it's shorthand for the qw// quoting operator, so it returns the four card suits (Unicode characters) as a list.
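Here's a small sketch of those last two points: the angle-bracket word list next to the longhand list it stands for, and keys called both as a method and as a function. The two-entry hash is a cut-down stand-in for %v, just for illustration.

```perl6
# <...> splits on whitespace and quotes each word.
my @suits = <♠ ♡ ♢ ♣>;
my @same  = ('♠', '♡', '♢', '♣');
say @suits.elems;                          # 4
say @suits.join(',') eq @same.join(',');   # True

# keys works as a method or as a function; hash order is not
# guaranteed, so sort before comparing or printing.
my %v = T => 10, J => 11;
say %v.keys.sort.join(' ');                # J T
say (keys %v).sort.join(' ');              # J T
```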

The game doesn't actually care about the suits, by the way, so I could have left them out, but I wanted to test Unicode in my editor and terminals.

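(3) The .pick method randomly picks one item from the array. Be careful, though: it only avoids repeats within a single call. @deck.pick(5) returns five different cards, but the separate @deck.pick calls in this script are independent of each other and can turn up the same card twice. Still, there's no need to calculate random numbers and splice elements out of a separate array!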
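As a minimal sketch of dealing a whole hand with no repeated cards, one call to .pick with a count does the job, because a single call picks without replacement. The names @ranks and @hand are mine, for illustration only, and this isn't necessarily how the fixed version of the game does it.

```perl6
my @ranks = flat 2..9, <T J Q K A>;
my @deck  = (@ranks X~ <♠ ♡ ♢ ♣>);
say @deck.elems;                 # 52

# One call, five cards: no card can appear twice within this call.
my @hand = @deck.pick(5);
say @hand.join(' ');             # five random, distinct cards

# By contrast, two separate calls know nothing about each other,
# so $first and $second can occasionally be the same card.
my $first  = @deck.pick;
my $second = @deck.pick;
```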

(4) for loops have several new features, but this one doesn't use most of them. A couple things to note: you don't need parentheses around the arguments, and if you do use them, you must have whitespace between for and the opening parenthesis. I recommend not using them. Also, the ^n syntax means "integers from zero to n-1". It's a handy shortcut to say "loop this many times." If I needed the numbers 1-4 within the loop, I could still use the for 1..4 { ... } form I'd use in Perl 5, and get the number in $_.
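A short sketch of those loop forms side by side. The pointy-block version is one of the new features the game script doesn't use, and the @rounds array is only there for the demonstration.

```perl6
# ^4 is the range 0, 1, 2, 3.
say (^4).join(' ');              # 0 1 2 3

# A pointy block gives the loop variable a name.
my @rounds;
for 1..4 -> $n {
    @rounds.push("round $n");
}
say @rounds.join(', ');          # round 1, round 2, round 3, round 4

# Without a pointy block, the current value is still in $_.
for 1..4 { print $_ }            # prints 1234
print "\n";
```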

(5) The do/until loop is now repeat/until. Not much more to say about that, but check out the regex at the end of it. There are big changes in regexes in Perl 6, and that's one area where you can't do much until you learn some of them, because even fairly basic Perl 5 regexes probably won't work like you expect.

For one thing, whitespace is ignored by default, as if you used the /x modifier. For another thing, you don't put modifiers at the end anymore, and some of them have changed. Now the /i modifier goes at the beginning, as :i, and can go either inside or outside the delimiters. Other than that, the regex isn't too different: it just makes sure the response contains either h or l, case-insensitive.
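Here's a small sketch of those regex changes, using a made-up answer string rather than real input from prompt:

```perl6
my $answer = 'HI';

# Whitespace inside the regex is not part of the pattern.
say ($answer ~~ m/ H I /).so;    # True

# :i can go outside the delimiters or inside them.
say ($answer ~~ m:i/ h /).so;    # True
say ($answer ~~ m/ :i h /).so;   # True

# Without :i the match is case-sensitive.
say ($answer ~~ m/ h /).so;      # False

# !~~ is the negated smartmatch.
say $answer !~~ m:i/ l /;        # True
```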

One last thing there: there's now a built-in prompt routine that outputs a string and gets a response, so that replaces the old print 'question? '; my $answer = <STDIN>; combo.

(6) There's not anything drastically new here, but note again how substr() can be used postfix, where in Perl 5 it would have been substr($new,0,1).

One thing, though: you can't use whitespace in a few places where you could in Perl 5. One of those places is between an object and its postfix methods or subscript braces. So to line up those two lines, I had to put a space before $new; putting it after $new, whether before or after the dot, would have broken it. There's something called "unspace" you can use to get around that, but here it was simplest to put the space where I did.
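For the curious, here's a minimal sketch of what unspace looks like: a backslash followed by whitespace, which the compiler skips over. The two-entry hash and the sample card are stand-ins for the ones in the game.

```perl6
my %v   = T => 10, A => 14;
my $new = 'T♠';

# A bare space between $new and the dot would not parse;
# backslash-plus-whitespace is unspace and is ignored.
my $nv = %v{ $new\   .substr(0,1) };
say $nv;                         # 10
```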

(7) There's nothing really special here; it's just checking to see whether you selected 'h' as your guess, and checks that against the cards. One thing to notice, which I forgot to mention in #5: the matching operators have changed from =~ to ~~ and from !~ to !~~.

I think that's everything that's really new, so I'll stop there. There are a couple things about it that bug me, as if there's some refactoring that could be done, so I may come back to it later. If you have any questions or suggestions, please leave a comment.

Getting Started

I started programming in Perl in about 1995, a few years before design started on Perl 6. Over the years, I've taken a look at Perl 6 from time to time, but never got hooked. Sometimes it appeared too hard to get a working system running -- assuming you could at all. Other times the language looked so foreign that I wasn't sure what the point was: if it didn't even look like Perl, then it might as well be a different language, so why not go learn a different language that was ready?

Well, that finally changed this summer. The impetus was an interview with Larry Wall wherein, with his unique style, he talked about the language (and a variety of other things) in ways that intrigued me and made me want to see what he'd created. Hearing that a real release of Perl 6 was no more than several months away, and that a working compiler could be installed with a few commands, hooked me the rest of the way.

So I installed rakudobrew, ran a few commands, and fired up a perl6 interpreter. Now what? Off to look for documentation and code examples. There's quite a bit of good stuff at perl6.org. It's sparse in areas, but it's plenty to get started with and try to absorb. I miss perldoc, though; reading docs at the command line is so much faster than clicking around for them on the web.

The strange thing about Perl 6, I'm finding, is not having any idea what's the best way to do things. After 20 years of programming Perl, I've gathered a pretty complete set of idioms and practices. If you need to look up lines in a file, for instance, we know it's faster to load them into a hash than to loop through the file each time. Complex sorts can be sped up with the Schwartzian transform. Files should be opened with the 3-argument open() and followed with "or die()" error-checking. Lots and lots of little things that make you feel like your finished program is about as clean, reliable, and fast as you can make it.

With Perl 6, I don't have any of that yet. To compare lines in a file, should I read them into a hash like Perl 5, or put them in a Set and use the cool new Set operators? I have no idea. A lot more stuff is done under the hood, like error-checking, or can be left up to the compiler to work out the details -- but is that the best way? It's unsettling, not knowing, but kind of exciting at the same time, because there's a lot to explore here. And I get the feeling that, as we develop idioms to take advantage of some of these new operators, they're going to be very powerful.

For instance, say you want to find all the matching lines in two files, in no particular order. In Perl 5, you'd do something like this:

#!/usr/bin/env perl
use 5.010; use warnings; use strict;

my %filea = map { $_ => 1 } do { open my $fa, '<', 'filea' or die $!; <$fa> };
my %fileb = map { $_ => 1 } do { open my $fb, '<', 'fileb' or die $!; <$fb> };
for( keys %filea ){
    print if $fileb{$_};
}

After the standard shebang and use statements, the next two lines load the files into hashes as the keys. In addition to preparing a lookup hash from one file, that gets rid of duplicates in both. Then I loop through the keys of one hash and print them if they're found in the other. Now here's what I can do in Perl 6:

#!/usr/bin/env perl6
use v6;

my @a := "filea".IO.lines;
my @b := "fileb".IO.lines;
.say for keys( @a ∩ @b );

Much cleaner, isn't it? First of all, the use strict; use warnings; are gone, because they're the default in Perl 6. I put the use v6 line in because I think my editor needs it (more on the Emacs mode for Perl 6 in another post). The first two lines after that read each file into an array, since I don't need them to be in hashes. The file opening and error-checking are done under the hood, so I don't have to write them. Then the last line does a set intersection with that cool Unicode symbol (U+2229), which returns a Set of the lines found in both arrays; keys pulls those lines back out so they can be printed.
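Here's a self-contained sketch of the same idea that can be run without having filea and fileb lying around. The scratch file names and their contents are made up for the demonstration. I've also used plain = assignment instead of := here, on the assumption that it's the form most likely to behave the same across compiler versions.

```perl6
# Hypothetical sample data, written to two scratch files
# in the current directory.
my $file-a = 'demo-filea.txt';
my $file-b = 'demo-fileb.txt';
spurt $file-a, "apple\nbanana\ncherry\nbanana\n";
spurt $file-b, "banana\ncherry\ndate\n";

# Plain assignment copies the lines into ordinary arrays.
my @a = $file-a.IO.lines;
my @b = $file-b.IO.lines;

# The intersection is a Set, so the duplicate "banana" collapses.
# Sets are unordered, so sort the keys for predictable output.
my @common = (@a ∩ @b).keys.sort;
.say for @common;                # banana, then cherry

unlink $file-a, $file-b;         # clean up the scratch files
```

If your editor can't type ∩, the ASCII spelling (&) does the same thing.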

It's much cleaner and shows more clearly what it's doing. I suspect anyone who's studied set theory could guess what this does, even if he's never programmed a day in his life. And there's some cool stuff going on here -- or that could be going on. Because of the way Perl 6 does "lazy lists," the underlying implementation can split jobs up into different tasks that run in parallel, then come back to them when it needs their results. So in this case, the filling of @a and @b could be run at the same time. It may even be that the set comparison could start looking for matches before those finish, though that seems less likely. But the point is, if you have time-consuming operations that don't depend on each other, or if functionA() will be passing a list of items to functionB(), these operations may be able to run in parallel and speed things up, without you needing to do any threading or that kind of stuff. Very cool!

On the other hand, I don't know whether that cool stuff will happen in my compiler, or whether it's likely in the compilers and platforms today. So is it really the best way to do it? What's really happening under the hood in that Set comparison? Are my arrays being converted to Sets, and does that eat up time and memory? Could it be fast on one system and dog slow on another? Could it be slow today, but the fastest way at some future date when the compilers do more parallel processing? I don't know. There's a lot to learn here.

So as I learn, I'll share my discoveries, from the perspective of a long-time Perl 5 programmer who's gotten pretty set in his Perl 5 ways, but thinks some aspects of Perl 6 are just too cool to ignore. I haven't been this excited about a new language since, well, 1995. More to come soon.

About Aaron Baugher

I'm a programmer and Unix sysadmin who uses Perl as much as possible, operating from the Midwest USA. To hire me for sysadmin or programming work, contact me at aaron.baugher @ gmail.com or as 'abaugher' on #perl6.