Concurrency Weirdness

I'm still trying to get my head around how the concurrency stuff works. I had an idea for a simple script: sometimes I run a program that dumps a lot of output, so I'd like a script that would take that as input and just hold onto the most recent X lines of it, so that they could be read at any point through a named pipe.

So I figured I'd need one thread to read lines from STDIN ($*IN in Perl 6) and keep an array loaded with a certain number of the most recent ones, and another thread to open the fifo for writing and dump the current contents of the array to it whenever someone reads it. In Perl 5, I'd probably do it by forking off a child process for half of the work, or maybe use a single loop that does both things but has quite a few no-ops. I figure in Perl 6 I can do it with threads.

So I wrote the code below. As I understand it, Promise.start begins a separate thread running the code it contains, and here that code opens the fifo for writing. Because of the way named pipes work, that open completes only when someone opens the other end to read; then I write the lines in @l to it and close it. Then I sleep a second, to keep a single read of the fifo from being seen more than once, and loop around to open a write in preparation for the next reader.

So that's running in a separate thread, then the main code goes on to a loop that reads lines from standard input and loads the array, chopping lines from the beginning of the array when it gets longer than the max. It all works, except....it gets stuck. It gets to about line 35-40 or so from $*IN, and then stops. Then if I read from the fifo, the for loop runs through another 130 or so lines, then stops again.

By adding some debugging lines, I gradually figured out what's happening, if not why: as soon as the start/loop gets to the open function and stops there, the for loop stops as well -- but it always stops at the line that truncates the array. Then once I read a batch of lines from the fifo, the for loop continues and reads more lines until the start/loop gets back around to the open command, and then it stops.

I don't know why one thread blocking on a file open would make another thread stop, and I especially don't know why the main thread would always stop in that same place. Nothing in that array truncation line uses or has anything to do with the open line that seems to stop it.

If anyone has any ideas what I'm doing wrong or missing here, please leave a comment. In the meantime, if I figure out what's wrong, I'll be sure to update here. And if there's just plain a better way to do this with Perl 6's concurrency tools, I'd love to hear that too.

## Command line arguments:
#    $lines - number of most recent lines to keep saved
#    $pipefile - named pipe to write them to when it is read
##
sub MAIN( $lines, $pipefile ){
    my @l;
    my $p = run << mkfifo $pipefile >>;
    die 'Unable to create fifo' if $p.exitcode;
    start {
        loop {
        my $out = open( $pipefile, :w ) or die "Can't open $pipefile for writing"; # culprit
            say 'got a reader!';     # debug
            $out.say: @l.join("\n");
            $out.close;
            sleep 1;      # prevent double-reads
        }
    }
    for $*IN.lines {
        @l.push: $_;
        # stops here when 'culprit' line is reached
        @l = @l[@l-$lines .. *] if @l > $lines;
        $_.say;  # debug
    }
}
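One thing I am fairly sure of, whatever is causing the stall: @l is being pushed by the main thread and read by the start thread with no synchronization at all, and Perl 6 makes no promises about unguarded shared arrays. Guarding the shared accesses with a Lock seems prudent either way. Here's a minimal sketch of just that pattern, with no fifo involved (the line counts and contents are made up for the example):

```raku
my @l;
my $lock = Lock.new;
my $max  = 10;

# Writer thread: push lines, trimming to the most recent $max.
my $writer = start {
    for 1..100 -> $n {
        $lock.protect: {
            @l.push: "line $n";
            @l.shift while @l > $max;
        }
    }
}

# Reader (here just the main thread): take a consistent snapshot
# under the same lock, so we never see the array mid-update.
await $writer;
my @snapshot = $lock.protect: { @l.clone };
say @snapshot.elems;        # 10
say @snapshot[*-1];         # line 100
```

In the real script, the start block would take the same kind of snapshot under the lock before writing it to the fifo.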

My First Perl 6 Module: Net::SSH

I've written my first Perl 6 module! I wanted to start contributing modules, but it took a while to find one that wasn't already taken yet was simple enough to tackle as my first. I also wanted to make a proper distribution for it, something I'd never done even in Perl 5, so it took a while to get everything in order. I'm sure there are still mistakes, but I'm still working on it, and it should go much faster on future modules.

The distribution is at my GitLab repository. So far, it's just a simple wrapper around the 'ssh' command-line tool, which runs a command on a remote server and returns the output as an array of lines. I borrowed some of the code from the Perl 5 module Net::SSH, and Perl 6-ified it. It's not ready for panda or anything, but I hope to get it there once I make it more capable and add more error-checking and testing. A few observations:

I don't completely understand the module/class/package distinctions yet. I ended up with a class because I needed a few attributes; I gather a module is just a package of functions, and a package....I'm not sure. I also had to make it a 'unit class', and I'm not sure why. I started out with 'my class Net::SSH {', but that wouldn't work at all, I think because it made the class private to the file. I got that idea from looking at core modules like IO::Handle, where it seems to work, so I'm not sure why it didn't for me. Some research to do on those things.
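For my own notes, here are the two declaration forms as I currently understand them. This is a toy sketch, not my actual module; the attributes and method are made up:

```raku
# Block form: the class body is everything inside the braces,
# so other code can follow the class in the same file.
class Net::SSH {
    has $.host is required;
    has $.user = 'root';

    # Build the command line this wrapper would run (made-up method).
    method command-line($cmd) {
        "ssh {$!user}\@{$!host} $cmd"
    }
}

my $ssh = Net::SSH.new(:host<example.com>, :user<alice>);
say $ssh.command-line('uptime');    # ssh alice@example.com uptime

# The unit form, by contrast, starts a file with:
#     unit class Net::SSH;
# and everything after that line is the class body, which is why it
# wants to be in the module's own file.
```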

I put all my POD at the end of the file, because the '=begin pod' lines seem to confuse cperl-mode's syntax highlighting in emacs. Not surprising, since I think it's a fairly new change. I should go put an issue in for the maintainer of cperl-mode when I'm done writing this. Anyway, I originally had the POD sprinkled through the file next to the class and method definitions, but I don't know if I like that anyway -- more scrolling to get to different pieces of code. My preference would probably be to put the POD in a different file altogether, but I want to follow whatever standard there is.

As always, suggestions, corrections, and laughter are welcome.

The Old Becomes New Again: A Gopher Server in Perl 6

The web is slow. Between overloaded pages, overloaded servers, bloated browsers, and ISP throttling, it's not unusual for even simple pages to take a couple seconds to load. But you get used to it, so I was struck by how fast the Internet can actually be when I fired up a gopher client recently.

Yes, gopher still exists, barely. For those who are unfamiliar with it, gopher is a sort of text-based hypertext system that predates the web -- think the web minus graphics, fonts, and interactive scripting, just files and directories and links between them, though you can also offer a few other things like search services. I miss those specialized services like gopher, IRC, and Usenet, that did one thing very well before they were overwhelmed by the web.

Anyway, when I started it, it connected to the server so fast that I thought it had to be faking. Did it install a local dummy server or something, and I was actually connecting to that? Some clicking around showed that, no, I was really connecting to remote servers; it was just so fast it was virtually instantaneous.

That got me thinking I'd like to run a gopher server, just for fun and nostalgia. But I was surprised to find no gopherd in the FreeBSD ports. Net::Gopher has even been dropped from CPAN. There's code out there, but it tends to be old and is considered buggy. I found one recent Python version, but....Python. So I thought I'd write my own, and why not do it in Perl 6?

A very simple, early implementation is below. It handles only the main two gopher types: documents and directories. You give it a directory, and it offers everything under that directory as an information tree.

I continue to love the new IO functions like slurp and dir. They make dealing with the filesystem faster and more intuitive. Actual writing time on this was less than an hour. The filesystem maps to the gopher protocol so well that it almost seems too easy. (I spent at least three hours trying to figure out why the gopher client in the FreeBSD ports, net/gopher, didn't like its output, before discovering that lynx likes it fine; I'm still not sure what was going on there.)

A couple things I added along the way: If a directory contains a .gophermap file, that is sent in place of the directory info, so it will need to be in valid gophermap format. If a directory contains a .gopherhead file, that is sent ahead of the directory listing, so it's a way to put an intro at the top.
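For reference, a gophermap is just lines of four tab-separated fields: a type character fused to the display name, then the selector, host, and port. A small example of what a valid .gophermap might look like (the hostname and port here are made up; the filler values on the 'i' info line are the common convention, since clients ignore those fields):

```
0About this server	/about.txt	gopher.example.com	7070
1Software	/software	gopher.example.com	7070
iWelcome to my server.	fake	(NULL)	0
```

The separators are literal tab characters, which is easy to get wrong in an editor that expands tabs.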

Future additions to make it worth distributing:

  • Multi-threading to handle multiple connections
  • Make path handling cross-platform
  • Allow files to be hidden/blocked, maybe .gitignore style
  • Security -- make sure it's not possible to get outside the directory
  • Documentation
  • Installer/Tests
  • Error handling
  • Modularize it. Net::Gopher?
  • Logging by some standard or other
  • Binary file downloads
  • Additional gopher types

And now the code:

#!/usr/bin/env perl6
use v6;

my $TITLE = 'My Gopher Server';
my $CRLF = "\r\n";
my $SERVER;
my $PORT;
my $DIR;

sub MAIN($host = 'localhost', $port = 7070, $dir = '.'){
    $SERVER = $host;
    $PORT = $port;
    $DIR = $dir;
    $DIR ~~ s/ \/$ //;              # remove any trailing slash
    die "Directory '$DIR' not found" unless $DIR.IO.d;

    my $listen = IO::Socket::INET.new(:listen,
                                      :localhost($host),
                                      :localport($port));
    say "Listening...";
    loop {
        my $connection = $listen.accept;
        say "Answering...";
        my $line = $connection.recv;
        handle-request($connection, $line);
        $connection.close;
        say "Closed...";
    }
}

sub handle-request( $c, $l is copy ){
    $l ~~ s/ <[\t\r\n]> .* //;             # input line ends at first tab or newline
    say "Fetching '$l' for 'remote-addr'"; # looks like the remote address is not available yet
    $l = "$DIR/$l";
    $l ~~ s/ \/$ //;                       # remove any trailing slash
    if $l.IO.d {                           # handle directory
        if "$l/.gophermap".IO.e {
            $c.print( "$l/.gophermap".IO.slurp );
        } else {
            if "$l/.gopherhead".IO.e {
                $c.print( "$l/.gopherhead".IO.slurp );
            }
            for dir $l -> $n {
                next if $n.basename.starts-with('.');
                if $n.d {
                    $c.print: 1;       # directory entry
                } elsif $n.f {
                    $c.print: 0;       # plain-text file entry
                } else {
                    next;              # skip symlinks, pipes, and other oddities
                }
                # selector is relative to $DIR, since handle-request prepends it
                $c.print: join("\t", $n.basename, $n.relative($DIR), $SERVER, $PORT) ~ $CRLF;
            }
        }
    } elsif $l.IO.f {                    # handle file
        $c.print: $l.IO.slurp;
    } else {                             # send error
        $c.print: "3Error which should be expanded on.$CRLF";
    }
    $c.print: ".$CRLF";
}

Concurrency and Channels in Perl 6

I've been working on a Bayesian spam filter, but it keeps running out of memory, so I moved to something else for a while. The new concurrency stuff looks really interesting, but I don't understand it well yet. As a project, I came up with the idea of a password cracker, which would check a crypt-style hash against a word list. (This probably isn't a CPU-intensive enough task to be worth threading, but it was simple.) Here's the code, with details below:

#!/usr/bin/env perl6
use v6;
use Crypt::Libcrypt;

sub MAIN( $encrypted, $wordfile, $units=5 ){
    my $salt = $encrypted.substr: 0, 2;
    my $stream = Channel.new;
    start {                                      # 1
        for $wordfile.IO.slurp.words -> $w {
            $stream.send($w);
        }
        $stream.close;
    };

    my $match;
    await do for ^$units -> $u {                 # 2
        say "Starting unit $u";
        start {
            loop {
                if $stream.closed.not and        # 3
                   $stream.receive -> $w {       
                    say "Trying $w in unit $u";
                    if crypt( $w, $salt ) eq $encrypted {
                        $match = $w;
                        $stream.close;
                    }
                    sleep 1;
                } else {
                    last;
                }
            }
        };
    };
    say "Found it!: $match" if $match;
}

I call it with: ./crackpass.p6 abPRdpgdfUoM. wordlist 5

That encrypted string will match the word "george" which is in the "wordlist" file.

1) Starts a thread which opens the word file and starts sending the words to the Channel $stream.

2) Starts 5 (or however many specified on the command line) threads. Each one loops, getting the next word in $stream and checking to see if it matches the encrypted string.

3) Checks to see if $stream is closed, then gets a word from it if it's not. There's a race condition here: if I remove the sleep line, I sometimes get "Cannot receive a message on a closed channel". I think what's happening is that one thread gets to the first half of the if statement and sees that $stream isn't closed, but in the meantime another thread takes the last word from $stream and lets it be closed, so the first thread's $stream.receive errors. There must be a way to handle that, but I haven't figured it out yet. I know I could wrap it in a try/catch, but that seems like a kludge. Maybe I should be using earliest or something like that.
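One pattern I've since run across that sidesteps the check-then-receive gap entirely: iterate the Channel instead of polling it. Channel.list gives a Seq that blocks on each item and ends cleanly when the channel is closed and drained, so a worker never calls receive on a closed channel. A rough sketch of how the worker loop could look (simplified, with made-up words instead of the crypt call):

```raku
my $stream = Channel.new;

# Producer, as in the original script.
start {
    $stream.send($_) for <alpha beta gamma george delta>;
    $stream.close;
}

my @seen;
my $lock = Lock.new;
await do for ^3 {
    start {
        # Each worker iterates the channel; the Seq from .list ends
        # when the channel is closed and empty, so there's no
        # closed?-then-receive race to lose.
        for $stream.list -> $w {
            $lock.protect: { @seen.push: $w };
        }
    }
}
say @seen.sort.join(' ');   # alpha beta delta gamma george
```

In the real script, the worker that finds a match could still call $stream.close to shut the others down; their loops would simply end after the remaining queued words drain.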

Except for that race condition, which doesn't always happen, it does work. I have no idea how much actual concurrency is going on on my system (FreeBSD on dual-core amd64), but it's fun anyway. I'm trying to think of something more interesting to do with these tools, that would really show off what they can do -- once I understand it myself.

A Video-to-Song Converter in Perl 6

Here's my longest Perl 6 script yet. I'm trying to come up with some shorter ideas for future articles.

I have a directory full of music videos in MP4 format, and wanted to convert the audio to MP3 files. I also wanted to insert the title and artist, which are generally found in the filenames, into the ID3 fields in the MP3 file. However, there was a sticking point: the filenames are inconsistent. The only pattern is that they all end in 12 junk characters and a ".mp4" extension. Other than that, some are "title - artist", some are "artist - title", some are one of those without the hyphen or anything to separate the two, some have other bits of text stuck in here and there, a few don't have the artist at all, and so on.

So I knew I couldn't fully automate it. But I thought there had to be a better solution than renaming them all by hand and typing in all the correct data. What if I wrote a script that did a best guess of what the fields should be, showed them to me, let me edit them, and then used that to create the files? That led to the script below. Notes after.

#!/usr/bin/env perl6
use v6;

sub MAIN() {                                               #1
    my %s;
  SONG:
    for dir.grep: / . ** 12 \.mp4 $/ -> $l {               #2
        my $n = $l.substr(0,*-16);                         #3
        my ($a, $t) = $n.split( rx| \s* \- \s* |, 2 );
        unless $t and $a {
            ($a, $t) = $n.split( rx| \s+ |, 2);            #4
            $t = $a unless $t;
        }
        my %h = ( title   => $t,
                  artist  => $a,
                  album   => 'Downloads',
                  genre   => 'Rock',
                  comment => "Converted by $*PROGRAM-NAME",  #5
              );
        loop {
            %h<newfile> = "%h<artist> - %h<title>.mp3";      #6
            print-song($l, %h);
            my $p = prompt('Type a letter and new value: ');
            my ($c,$r) = $p.split( rx| \s+ |, 2);
            given lc $c {                                    #7
                when 'a' { %h<artist>       = $r }
                when 't' { %h<title>        = $r }
                when 'l' { %h<album>        = $r }
                when 'g' { %h<genre>        = $r }
                when 'c' { %h<comment>      = $r }
                when 'w' { %h<title artist> = %h<artist title> }
                when 'h' { show-help }
                when 'n' { next SONG }
                when 'x' { exit }
                when 's' { last }
                when 'p' {
                    process-saved-songs(%s);
                    next;
                }
            }
        }
        %s{$l} = %h;
    }
    process-saved-songs(%s);
}

my &process-saved-songs = sub (%s is rw){                      #8
    for %s.kv -> $k, %v {                                      #9
        my $p1 = run(< ffmpeg -i >, $k, < -ab 96k >, "new/%v<newfile>", :out );  #10
        say $p1.out.slurp-rest;
        my $p2 = run('id3v2',
                     '-a', %v<artist>,
                     '-t', %v<title>,
                     '-A', %v<album>,
                     '-g', %v<genre>,
                     '-c', %v<comment>,
                     "new/%v<newfile>", :out);
        say $p2.out.slurp-rest;
        rename( $k, "./done/$k" );                       #11 (throws on failure)
        %s{$k}:delete;
    }
}

my &print-song = sub ($l, %h) {
    say qq:to/END/;                                      #12

    Old Filename: $l
    New Filename: %h<newfile>

    Artist:   %h<artist>
    Title:    %h<title>
    aLbum:    %h<album>
    Genre:    %h<genre>
    Comment:  %h<comment>

    w: sWap Title and Artist      s: Save song info         h: Help
    n: Next song without saving   p: Process saved songs    x: Exit      
    END
}

my &show-help = sub (){
    say q:to/HELP/;
===========================================================
This program tries to determine what the title and artist of a song should be
from its filename, lets you edit those and other ID3 fields, saves a list of
files to process, then processes them by converting them to MP3 format and
setting their ID3 fields.

Type one letter, then if it needs a value, type that after a space
and hit enter.

Examples:

t Dirty Deeds           -> Changes the song title to "Dirty Deeds"
a AC-DC                 -> Changes the artist to "AC-DC"
l Back in Black         -> Changes the album to "Back in Black"
g Rock                  -> Changes the genre to Rock
c Favorite song         -> Sets the comment field to "Favorite song"
w                       -> Swaps the artist and title fields        
h                       -> Displays this help
s                       -> Saves the song info for processing
p                       -> Processes all saved files
n                       -> Skips to the next song, ignoring this one
x                       -> Exit
===========================================================

HELP
    prompt("Hit enter to return to songs: ");
}

1) I don't really need a MAIN sub, since I don't take any command-line arguments. But it's always possible that I'll add some later, so this way I'm prepared. Also, it lets me put %s inside its scope, rather than having it exist for the entire file.

2) There's some interesting stuff going on here. First, dir() replaces all the opendir/readdir stuff we used to have to do. Then you can see how method calls like grep can use the "grep: argument" notation. I kinda like that; I may have to use that more often. The regex here is interesting because it uses .**12 where it would have been .{12} before. By the way, a big thanks to whoever put all the hints in the perl6 compiler that say, "It looks like you're trying to do [insert perl5 thing]; maybe you should try this." Those are a big time-saver.

3) substr() has changed in some ways. Now instead of using a negative number to count from the end of the string, you use *-n to say "end of string minus n". It makes sense. I did notice one problem: if the string isn't that long, you get an error, because then it ends up with a negative number after all, which it doesn't like. So I guess if there's a possibility of a shorter string, you have to adjust for that.
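For example (the long filename here is made up to match the pattern of 12 junk characters plus ".mp4"):

```raku
my $long = 'My Song - Artistabcdefghijkl.mp4';
say $long.substr(0, *-16);          # My Song - Artist

# On a string shorter than 16 characters, *-16 works out to a
# negative index and substr complains, so clamp it first:
my $short = 'tiny.mp4';
say $short.chars > 16 ?? $short.substr(0, *-16) !! $short;   # tiny.mp4
```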

4) I expected to use the new words() method here, but couldn't, because it turns out words() and split() work differently when you give them a limit. split() stuffs all the remaining text in the last element, while words() drops whatever doesn't fit. For instance:

"The quick brown fox".split( rx| \s+ |, 2 ); #= ('The', 'quick brown fox')   
"The quick brown fox".words(2);              #= ('The', 'quick')

So split() it is this time, unless there's an option to words() that I didn't see.

5) Most of the old short-named special variables are gone. The old $0 has become $*PROGRAM-NAME. That's probably not a bad thing; I often had to look up the ones I didn't use regularly anyway. Now $0 is the first captured match from a regex.
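A quick contrast of the two, with a made-up filename:

```raku
say $*PROGRAM-NAME;      # path of the running script

'track07.mp4' ~~ / (\d+) /;
say $0;                  # 「07」: $0 is now the first regex capture
say ~$0;                 # 07, stringified
```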

6) I'm still getting used to the new look of hash variables. I still catch myself putting $ in front of them, but I'm getting better. The quoting when you don't want interpolation is nice and clear.

7) given/when makes for a nice, clear table of actions here.

8) In Perl 5, if you passed a hash (or array) to a sub, it got flattened out into a list and copies of the values passed, so if you wanted the actual thing passed, you passed a reference. Perl 6 pretty much passes a reference by default, but it also makes it read-only. So to let my sub clean out the hash after doing its work on it, I needed to specify it as rw (read-write).

9) I'm noticing that Perl 6 is even better than 5 at just doing what you mean. Here I'm pulling key and value pairs from a hash, where the values are hash references. In 5, I would have needed to deference those, so I probably would have had something like $s->{$k}{newfile} all over the place. Here I just say, "give me that as a hash," and away I go. I suppose eventually I'll run into a situation where I wish it were more literal and less helpful, but I think that will be rare.

10) I'm not thrilled with the ugliness of this list of arguments. Guess that's why I went a different route two lines later.

11) I tried to use move() first, but it's not implemented yet, so I fell back on rename(), which worked.

12) This is the new heredoc syntax. It has some nice features, one being that it's smart about indentation. Here, since END is indented four spaces, it'll strip four spaces from the beginning of the other lines as well. The qq:to/END/; syntax gives cperl-mode in emacs fits, though; I'll have to submit an issue on that.

For details on how to actually use it, run it and hit 'h' for help. Basically, it takes a guess at the file's title and artist, and displays them for you along with some other fields that you can edit. One idea I had was to provide a sWap option that swaps the title and artist fields, since about 1/3 of the files had them backwards from the rest. Once I like a file's info, I hit 's' to save it. It doesn't go ahead and process the file right away, because that might take a while and be annoying. So it saves the info in %s, and doesn't process the files until you hit 'p' to process all saved ones, or when it runs out of files.

It's not really intended to be portable or anything yet, and it assumes a lot of things: that your files have a certain format and extension, that certain directories exist, that you have ffmpeg and id3v2 installed, etc. But feel free to use it, expand on it, laugh at it, curse it, whatever you like.