May 2010 Archives

Almost Reinventing XPath for YAML

I think I started to reinvent XPath. I hate when that happens.

It started simply enough. I had a directory with 12,000 YAML files in it. I wanted to grep them, but I can't do that from the command line:

 -bash: /usr/bin/grep: Argument list too long

Although slightly annoying, this isn't a big deal. I can write a Perl script to do the job and get the files through opendir. I hardcoded the bits that I wanted to get out, and when I needed something else, I just changed the source.
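For illustration, here is a minimal sketch of that kind of script. It is not the original quick script; the subroutine name grep_dir and its arguments are invented for the example, and it assumes the goal is just to find matching lines in every .yml file in one directory:

```perl
use strict;
use warnings;

# Hypothetical helper: return "file:line" strings for the lines that
# match $regex in every .yml file in $dir. Reading the directory with
# opendir sidesteps the shell's argument-list limit.
sub grep_dir {
    my( $dir, $regex ) = @_;

    opendir my $dh, $dir or die "Could not open directory $dir: $!";
    my @files = sort grep { /\.yml\z/ } readdir $dh;
    closedir $dh;

    my @matches;
    foreach my $file ( @files ) {
        open my $fh, '<', "$dir/$file" or die "Could not open $file: $!";
        while( my $line = <$fh> ) {
            chomp $line;
            push @matches, "$file:$line" if $line =~ $regex;
        }
        close $fh;
    }

    return @matches;
}

# Example call: the directory and the pattern come from the command line
if( @ARGV == 2 ) {
    my( $dir, $pattern ) = @ARGV;
    print "$_\n" for grep_dir( $dir, qr/$pattern/ );
}
```

Because the file names never pass through the shell, the number of files in the directory doesn't matter.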

I made a couple of these quick scripts before I realized I was going to be doing this a lot. I refactored it so I could specify on the command line the path to the YAML thingy I wanted, and called it, well, it was called extract, but I renamed it ypath to put it on Github.

 % ypath dist_info/dist_file *.yml

I could do it for multiple values:

 % ypath dist_info/dist_file,dist_info/module_list *.yml

This was all fine for a while. But then I had a case where I needed to extract an array element, so I needed to handle array indices:

 % ypath dist_info/dist_file,dist_info/module_list/2/md5 *.yml

Then, I wanted to handle all the elements of an array if I ran into one, so I wanted to throw in an @ globby sort of thing:

 % ypath dist_info/dist_file,dist_info/module_list/@/md5 *.yml

Or maybe all of the keys of a hash:

 % ypath dist_info/build_file/% *.yml

But, at that point I realized what was happening and didn't go on. I didn't really need those bits even though I thought they would be cool. Unless I really, really, really need it to do something else, that's where I'm stopping with it.

Not only that, surely someone else must have already made a much better tool to do the same thing, even if I can't find it.
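To make the path syntax concrete, here is a rough sketch of how such a path could be walked over an already-loaded data structure. This is not ypath's actual code: walk_path is a hypothetical name, the sample data is made up, and treating % as "return the hash's keys" is just one reading of the idea above (it only makes sense as the last step):

```perl
use strict;
use warnings;

# Hypothetical walker. Steps are separated by "/": a number indexes an
# array, "@" fans out over every array element, "%" yields the sorted
# keys of a hash, and anything else is taken as a hash key.
sub walk_path {
    my( $data, $path ) = @_;

    my @nodes = ( $data );
    foreach my $step ( split m{/}, $path ) {
        my @next;
        foreach my $node ( @nodes ) {
            if( ref $node eq 'ARRAY' ) {
                if( $step eq '@' ) {
                    push @next, @$node;
                }
                elsif( $step =~ /\A\d+\z/ && $step < @$node ) {
                    push @next, $node->[$step];
                }
            }
            elsif( ref $node eq 'HASH' ) {
                if( $step eq '%' ) {
                    push @next, sort keys %$node;
                }
                elsif( exists $node->{$step} ) {
                    push @next, $node->{$step};
                }
            }
        }
        @nodes = @next;
    }

    return @nodes;
}

# Made-up data shaped like the command-line examples
my $data = {
    dist_info => {
        dist_file   => 'Foo-1.0.tar.gz',
        module_list => [
            { md5 => 'aaa' },
            { md5 => 'bbb' },
            { md5 => 'ccc' },
        ],
    },
};

print join( ',', walk_path( $data, 'dist_info/module_list/2/md5' ) ), "\n";  # ccc
print join( ',', walk_path( $data, 'dist_info/module_list/@/md5' ) ), "\n";  # aaa,bbb,ccc
```

Keeping a list of current nodes at each step is what lets a single @ turn one path into many results.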

Dan Meyer on math, Alan Kay on physics, and the Pharo tutorial

Dan Meyer has a nice TED talk about math pedagogy.

I find it very similar to Alan Kay's TED talk where he shows off various Smalltalk things to illustrate physics concepts.

The best introductory programming book I have ever seen (and I've seen a lot in many languages) is Squeak: Learn Programming with Robots, which many people might recognize as a very LOGO-like introduction.

Along with that, I was quite intrigued with Pharo's introduction to Smalltalk. This world has a tutorial class. When you first open the world (which is just double-clicking the icon of the single file you downloaded), the big window tells you exactly what to do first, then leads you through a 24-step tutorial of the entire language and some of the tools.

If you haven't run into Smalltalk before, its world is what many other languages strive to be but ignore. Smalltalk is where the idea of a refactoring browser started, and what a lot of people want for Perl. I think a lot of the work going into Padre is really an attempt to force a Smalltalk-like world on Perl. I even tried to make my own Perl system browser, which I've now put on Github although I don't use it anymore.

Extracting and accessing the release history in perlhist

I needed to make some charts showing the sequence of perl's releases, so I wrote some code to extract all that from perlhist. I thought I might turn it into a module or add it to Module::CoreList, but I have a lot of other things I need to do. If someone else wants to do that, however, here's my script:

#!perl

use 5.010;

use File::Basename;
use File::Which qw(which);
use File::Spec::Functions;

my %Months = qw(
    Jan 1
    Feb 2
    Mar 3
    Apr 4
    May 5
    Jun 6
    Jul 7
    Aug 8
    Sep 9
    Oct 10
    Nov 11
    Dec 12
    );

my $perldoc = find_perldoc();

open my( $doc ), "$perldoc -m perlhist |" or die "Could not run $perldoc: $!\n";
1 until( <$doc> =~ /=head1 THE RECORDS/ );
1 until( <$doc> =~ /==================/ ); 

my $previous_maintainer;
until( (my $line = <$doc>) =~ /=head2 SELECTED RELEASE/ )
    {
    next if $line =~ /^\s+$/;
    my( $maintainer, $version, $date ) = unpack( 'A9 A14 A12', $line );
    next unless $version;

    $maintainer =~ s/^\s+|\s+$//g;

    $maintainer ||= $previous_maintainer;
    $previous_maintainer = $maintainer;

    $version =~ s/^\s*|\s*$//g;
    $date =~ s/^\s*|\s*$//g;

    my( $year, $month, $day ) = split /-/, $date;
    next if $month eq '???' || $day eq '??';
    warn "Month is undefined: $line\n" unless defined $month;

    # perlhist spells the month as a name, so look up its number
    say join "\t", $maintainer, $version,
        sprintf( "%4d%02d%02d", $year, $Months{$month}, $day );

    }

sub find_perldoc
    {
    # need to find the right perldoc for the perl we are running
    my $dirname    = dirname $^X; $dirname = '' if $dirname eq '.';
    my $basename   = basename $^X;
    my( $suffix  ) = $basename =~ m/\Aperl(.*)/;

    my $perldoc = catfile( $dirname ? $dirname : (), 'perldoc' . $suffix );
    print "perldoc is $perldoc\n";

    print "x: $^X d: $dirname b: $basename s: $suffix\n";

    my $path = which( $perldoc );
    print "$path\n";

    die "$path is not executable\n" unless -x $path;

    return $path;
    }
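As a small, separate illustration of the date handling, here is a sketch of the conversion on its own. It assumes the dates look like 1987-Dec-18, which is what the month table and the '???' check in the script suggest; compact_date and %month_number are names invented for this example:

```perl
use strict;
use warnings;

my %month_number = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: turn "1987-Dec-18" into "19871218". It returns
# nothing if the date is incomplete or the month name is not recognized.
sub compact_date {
    my( $date ) = @_;

    my( $year, $month, $day ) = split /-/, $date;
    return unless defined $day && exists $month_number{$month};
    return unless $year =~ /\A\d{4}\z/ && $day =~ /\A\d{1,2}\z/;

    return sprintf '%04d%02d%02d', $year, $month_number{$month}, $day;
}

print compact_date( '1987-Dec-18' ), "\n";   # 19871218
```

Dates with unknown parts, such as a month of ???, fall through to the empty return instead of producing a bogus number.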

I want Perl Testing Best Practices

[I actually wrote this a long time ago and it's been stuck in the draft status. I don't have answers for these yet.]

I've been swamped with work lately, and despite perl5-porters giving me and everyone else plenty of time to update all of our modules for the next major release, I basically ignored Perl 5.11. Life sucks sometimes, then they release anyway. This isn't really a big deal because all the CPAN Testers FAILs go to a folder that I look at all at once. It's a big deal for other people when they try to install a borken dependency and cpan(1) blows up.

However, my negligence in updating my CPAN modules reminded me of a possible best practice that has been on my mind for a long time, and which I've casually brought up at a couple Perl QA workshops since I've written several Test modules. Don't rush to say Test::Class just yet.

In a nutshell, Perl's testing framework is a stew of real tests and implied tests, and we can't tell the difference. Some of those tests use Test::Builder-based functions that generate test output:

 ok( $some_value, 'Some value is true' );
 like( $var, $regex, 'Hey, it matches!' );

Some things that we don't normally think of as "tests" actually are:

 use Test::File;

By using my Test::File module, you are asserting that it passes all of its tests too. If you don't have it installed, cpan(1) will, by default, try to fetch that distribution and run its tests (cpanminus decidedly won't).

The problem with a Test module is that whether its own tests pass or fail has nothing to do with the code that you are trying to test. A failure to load my module is probably completely my fault, but in the mess of output from a test failure, I usually don't get the blame.

So far, we don't have a strong practice for capturing problems in tests. In fact, despite the Perl community's otherwise good practices and coding standards, we don't pay attention to test script quality. In particular, we let our test scripts fail for all sorts of reasons that have nothing to do with the target code. Maybe I need to do something like this instead:

 eval {
      require Test::File;   # unlike use, require runs inside the eval
      Test::File->import;
      1;
      } or plan( skip_all => ... );
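A fuller sketch of the idea, under some assumptions: the load has to happen at run time (a use statement runs at compile time, before any eval block can trap it), can_load is a made-up helper name, and this version uses a SKIP block instead of skip_all so that the rest of the test file could still run:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: try to load a module at run time and say
# whether that worked, instead of letting a failed "use" kill the
# whole test file at compile time.
sub can_load {
    my( $module ) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

SKIP: {
    skip 'Test::File is not available', 1 unless can_load( 'Test::File' );

    Test::File->import;

    # If the module loaded, the file it was loaded from must exist
    file_exists_ok( $INC{'Test/File.pm'}, 'Test::File was loaded from a real file' );
}

done_testing();
```

A missing Test::File then shows up as a skipped test with a reason attached, not as a compile error that looks like a failure of the target code.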

Okay, that's one thing that bothers me about my tests. I also frequently micro-manage tests. Let's say that I want to test a method that needs to open a file. I'll have several checks in the setup because I want to ensure that I'm doing the right thing before I get to testing my method. That is, I want to test my test setup:

ok( -e $filename, "$filename is there" );
is( md5( $filename ), $expected_md5, "$filename looks like it has the right stuff" );
is( -s $filename, $expected_size, "$filename has the right size" );

Test::Class (and maybe some other frameworks) has setup and teardown methods that mitigate this, but that's not really my problem. If one of these setup methods fails, it's a failure of my test suite but not necessarily my module. I'd like to report that differently.

I've thought that TAP's binary ok / not ok was a bit limited. I'd actually like to have ok / not ok / unable to test / unknown. "Unable to test" is different than "skip". Consider architecture-dependent tests that won't run; those are skip tests. If Test::Pod is not installed, however, I'd like to see a report that explicitly says "unable to test". It's the undef of the testing world. I'm not actually proposing a change to TAP. Maybe there's some other practice that can do the same thing. TAP isn't the point, really, I don't think. As a community, we just haven't paid that much attention to what each call to a Test::Builder-y function is really testing and in which column we should put the result. I've been thinking that maybe I should only call a Test::Builder-y thing when I want to report a result that directly relates to the code that I am testing.
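One way to approximate that with what Test::More already offers is to check the setup with plain Perl and emit TAP only for the target code. This is a sketch, not an established practice: setup_problems and first_line are hypothetical names, the fixture is a made-up temporary file, and skip_all with an "Unable to test" reason stands in for the missing third state:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More;

# Hypothetical helper: check the fixture with plain Perl and return a
# list of problems. It emits no TAP, so a bad setup never shows up as
# a failure of the code under test.
sub setup_problems {
    my( $filename, $expected_size ) = @_;
    return "$filename does not exist" unless -e $filename;
    my $size = ( -s $filename ) || 0;
    return "$filename is $size bytes, expected $expected_size"
        unless $size == $expected_size;
    return;
}

# Stand-in for the method under test: read the first line of a file
sub first_line {
    my( $filename ) = @_;
    open my $in, '<', $filename or die "Could not open $filename: $!";
    my $line = <$in>;
    return $line;
}

# Made-up fixture: a temporary file with known contents
my( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} 'Hello';
close $fh;

if( my @problems = setup_problems( $filename, 5 ) ) {
    # The closest existing approximation of "unable to test"
    plan skip_all => "Unable to test: @problems";
}

# Only the target code gets a Test::Builder result
is( first_line( $filename ), 'Hello', 'first_line() reads the file' );

done_testing();
```

The test count then reflects only assertions about the module, and a broken fixture is reported as a reason for not testing, not as a failure.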

Finally, I don't have a habit of documenting my tests. Sure, I put in code comments and the like, but I'm talking about full-on embedded pod that ties together some notional spec with what the particular test file is going to do with it. I feel guilty for about five seconds before I move on to something else.

A lot of people think about the underpinnings of our test system, but we've spent very little time at Perl QA workshops thinking about what a programmer should type out as they write a test file. To solve this, I think the first step is probably just to collect a bunch of stories from people about the practices they use and what nags at them at this level.

Sebastian Wernicke finds the formula for good TED talks.

Sebastian Wernicke gives a wonderful TED talk on constructing TED talks. He looked at all available TED talks with their popularity, and he was able to extract certain elements that made each talk either popular or unpopular. These factors might be the particular words used, the color of the slides, or what the speaker wore.

Talking Django smack at the Billy Goat.

Some of the Chicago Perl Mongers got together for drinks last week. We started at one place, decided it was too crowded, went to another, and finally ended up at the Billy Goat Tavern, along the way talking about all sorts of different things, some of them Perl.

That's the best sort of Perl Mongers meeting, and was always my intent with the idea. Put a bunch of smart people together in an unstructured environment and let the conversation go where it will, often in unexpected directions. Although some people might call this "networking", I prefer the no-label version. It's not something that you get out of a group of people watching someone go through a slide deck.

I had just watched Cal Henderson's "Why I hate Django" talk, and it makes for a lot of interesting conversation. The hook is the normal geek bonding over hating something (anything), but despite the comedy Cal has a lot of interesting things to say about web application architecture. Indeed, most web frameworks have these same problems (including all the Perl ones I think).

The talk mostly speaks for itself.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).