January 2010 Archives

Now's the chance for North Americans to easily get to a Nordic Perl Workshop

The bridge connecting two continents

Iceland is literally where North America and Europe meet, and that's where the 2010 Nordic Perl Workshop will be: it's going to be May 1-2 in Reykjavík. I'd like to get many more North Americans to show up to NPW this year, so start thinking about how you can participate in this special NPW that is likely to be the only one ever held in Iceland.

I started this idea in a use.perl post about holding a YAPC in Iceland. After YAPC::EU 2008 in Copenhagen, Josh McAdams, Adam Kennedy, and I hopped over to Iceland to meet with Ævar Arnfjörð Bjarmason and Tryggvi Björgvinsson to convince them to try a Nordic Perl Workshop in Iceland. I'd like to complete the set of all of the Nordic countries, and Iceland and Finland are still missing (and I don't think we are going to count the Faroes). Tryggvi is already involved with FSFÍ, the local free software advocacy group. Since then, the usual NPW organizers have backed the idea and it looks like it is going to happen. They're putting together the details now and a website should be available soon.

The glacial run-off at Gullfoss, Iceland


Reasons for North Americans to go to NPW 2010 in Iceland

  • Meet a segment of the Perl community you probably won't meet in North America.
  • Attend your first NPW.
  • Icelandair allows "stopover" trips without all the extra fees or charges. Fly out of the US, stop in Iceland for several days, then continue on to major points in Europe. Plan that vacation to visit London.pm but stop in Iceland for a conference.
  • Iceland really isn't that far away. It's five hours from Boston and six from New York. That's the life of a modern laptop battery.
  • The US dollar is very strong compared to the Icelandic króna. It's an expensive country (as are all of the Nordic countries), but you're not going to get it any cheaper than now. In 2008 I got about 75 ISK to 1 USD, and now it's 130 ISK to 1 USD.
  • There's a special time suck for Schwern and Adam Kennedy, and it's Iceland's fault. Find out what that is.
  • Iceland is the most naturally beautiful place I've ever been.
  • People at parties seem to be impressed that I've gone there. It's not a place that a lot of people go although it seems a lot of people want to go.
  • You can go to Geysir to see the original. This picture doesn't do it justice, but the geysir is about to erupt as the surface of the pool starts to blister under the pressure:
The original geysir, at Geysir, Iceland


Making Module::Starter easier to subclass

I started with what I thought would be a simple Module::Starter subclass. I wanted to modify Module::Starter::Smart, which knows how to add a new module to an existing distribution, to properly handle an existing MANIFEST and MANIFEST.SKIP. I thought I could just override create_MANIFEST (RT 53339). Once I overrode that, however, and wanted to make a couple of additional tweaks, things weren't so easy.

Although I'm using Module::Starter as an example in this post, I could just as easily use one of my own modules to illustrate the same thing. I've just been hacking on Module::Starter for the past week so I can get it into shape for use in the next edition of Intermediate Perl. I'm not a particular fan of the module, but it's the best place to start if you don't know you want to use something else.

My problem started when I also wanted to handle some DWIMery in the --dist command-line parameter:

% module-starter --module=Pig --dist=DISTNAME

Since I wanted to add a module to an existing distribution, I wanted to handle the common case of being in the distribution directory as I work. I shouldn't have to specify --dist because I can figure it out by looking at META.yml. If I don't specify it, the current module-starter creates a new distribution inside my distribution. I tried to get around that by using ., which is just stupid:

% module-starter --module=Pig --dist=.

That doesn't work, and it shouldn't. The --dist is for the distribution name, not its location. The value in --dist shows up as an interpolated string in the module source for things such as the RT Queue addresses. That's fine. I don't really want to have to type it anyway, so I started looking at how I could hook into the configuration to fix it up before the magic happens.
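The META.yml lookup I have in mind could be sketched roughly like this. The helper names dist_name_from_meta and guess_dist are made up for illustration, and the regex is only a stand-in for a real YAML parser: it assumes the simple top-level "name: Foo-Bar" line that build tools usually write.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the top-level "name" field out of META.yml text.
# This is not a YAML parser; it only handles a plain "name: Foo-Bar" line.
sub dist_name_from_meta {
    my( $meta_text ) = @_;
    return unless defined $meta_text;
    return $1 if $meta_text =~ m/^name:\s*['"]?([\w-]+)['"]?\s*$/m;
    return;
}

# Hypothetical helper: prefer an explicit --dist value, otherwise
# fall back to whatever META.yml in the current directory says.
sub guess_dist {
    my( $dist_arg, $meta_file ) = @_;
    return $dist_arg if defined $dist_arg and length $dist_arg;
    return unless defined $meta_file and -e $meta_file;

    open my $fh, '<', $meta_file or die "Could not open $meta_file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;

    return dist_name_from_meta( $text );
}

# Made-up sample data, just to exercise the parser
my $sample = "---\nabstract: 'Raise farm animals'\nname: Animal-Farm\nversion: 0.01\n";
print dist_name_from_meta( $sample ), "\n";
```

With something like that available, a missing --dist could be filled in from guess_dist( undef, 'META.yml' ) before any templates are processed.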

However, the current setup (1.54) for Module::Starter is not flexible enough for this because it does a couple of things that are typically no-no's in object-oriented design. As I said before, that's not a big deal. It's open source and I can submit patches, monkeypatch, or whatever I like.

The first problem shows up in the implementation of Module::Starter::App, which is the code behind module-starter. The first step in the process is a call to a fixed class:

# module-starter
Module::Starter::App->run;

That's a tough nut because you have to start somewhere. module-starter could do some gymnastics to dynamically rebless it into a user-defined class, but it doesn't need to. I have a better way that you'll see by the end of this post. The real problem is that run does the same thing again:

# Module/Starter/App.pm
# lots of munging of %config
$config{class}->create_distro( %config );

That $config{class} is a class name, Module::Starter::Simple by default, or whatever the last class in your plugin chain is (a different mess that I'll ignore for this post).

Now, the next problem is create_distro itself. It's a class method, and once it runs, it constructs an object then works with that object:

sub create_distro {
    my $class = shift;

    my $self = $class->new( @_ );

    # a lot more code
}

That's the only entry point that Module::Starter::App gives me. It's not that much of a problem, but it is messy. I have three options now:

  1. extend create_distro like most subclasses would do, and call ->SUPER::create_distro

  2. extend new, but hope that create_distro handles whatever I change

  3. override create_distro completely

None of these are attractive in this case, and I tried all three before I decided they were all wrong and require too much work:

  1. Extending methods is fragile in Module::Starter because you have to live with a subclass chain that you don't control and might work at cross-purposes. The create_distro method does way too much work anyway, so proper sequencing of events and changes is difficult.

  2. Overriding create_distro means I have to duplicate a lot of code even if I only want a small change.

The design issue with almost any subclass revolves around three questions: how much can I control, when can I get control, and how much work do I have to do?

Well, I can control everything, but in the useless sense of control. In my subclass I can override create_distro and never call any superclasses. That's not very useful because it fails on the economy of work I have to do. I have the same amount of control as if I start from scratch. That's not interesting control. I want organic control that comes from what is already there and requires the least change possible to meet my need. I'm content with most of what Module::Starter does, so I want to reuse as much of it as possible. I don't have much control there.

The next problem is "When?". I have to wait until Module::Starter::App calls create_distro. By the method name alone, that's too late for what I want to do. I want to do some work before I create the distribution. I can override create_distro to provide a suitable entry point, but it's the only method that module-starter will call, so I have to turn create_distro into the method that does everything as if the process were a single step. It's misnamed as such and actually useless since the extra layer of indirection adds no benefit.

The process isn't a single step. There are five major steps in Module::Starter that a reasonable subclass might want to control:

  1. Read the configuration file to get the plugin chain

  2. Read the command-line arguments

  3. Adjust the configuration

  4. Cook the templates and create the "hard" distribution files

  5. Generate the "soft" files (the auto-generatable ones)

I want to hook in at Step 1 so I can use the configuration file, but create_distro starts me at Step 4 (that's not entirely true because create_distro conflates Steps 3, 4, and 5, but Step 3 is also partially handled by Module::Starter::App).

Once I gave up on Module::Starter's current implementation, things got much easier. I changed Module::Starter::App to give me more control while (I think) preserving the same high-level behavior so old plugins still work. I make it call new and then a series of new hooks. create_distro is now an instance method and I can handle setup tasks before I get to it. This is the meat of the patch I submitted as RT 53539:

# my new Module/Starter/App.pm
sub run {
    my $class = shift;

    my %config = $class->_config_read;

    %config = $class->_process_command_line( %config );

    eval "require $config{class};";
    print "Class is $config{class}\n";
    croak "Could not load starter class $config{class}: $@" if $@;

    $config{class}->import( @{$config{plugins}} );

    my $starter = $config{class}->new( %config );
    $starter->postprocess_config;
    $starter->pre_create_distro;
    $starter->create_distro;
    $starter->post_create_distro;
    $starter->pre_exit;

    return 1;
}

A lot of the run implementation disappeared into either _config_read or _process_command_line. I should probably refactor the subclass discovery into its own method too. This is close to what any run subroutine should look like: almost every decision is made in some other method. I don't have to change run to customize it. More on that in a moment.

I made a corresponding small change in Module::Starter::Simple to handle its change from a class method to an instance method, although it still works as a class method. If I call it with the old, class method interface, I create an object, just as before. If I call it with the new instance method interface, it just skips the call to new:

sub create_distro {
    my $either = shift;

    $either = $either->new( @_ ) unless( ref $either );
    my $self = $either;

    ...
}

I should probably add some sort of deprecation warning in there too.
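A deprecation warning might look something like this sketch. The package name Local::Starter::Either is a made-up stand-in so the example runs without Module::Starter, the warning text is my own wording, and returning the object is only there so the example has something to show.

```perl
use strict;
use warnings;

package Local::Starter::Either;    # hypothetical stand-in for the real class

use Carp qw( carp );

sub new {
    my( $class, %config ) = @_;
    return bless { %config }, $class;
}

sub create_distro {
    my $either = shift;

    # Old interface: called on the class name, so warn and build the object
    unless( ref $either ) {
        carp "Calling create_distro as a class method is deprecated";
        $either = $either->new( @_ );
    }
    my $self = $either;

    # ... the real work would go here ...
    return $self;
}

package main;

# New style: no warning
my $starter = Local::Starter::Either->new( distro => 'Animal-Farm' );
$starter->create_distro;

# Old style: still works, but carps
my $old_style = Local::Starter::Either->create_distro( distro => 'Animal-Farm' );
print ref( $old_style ), "\n";
```

Using carp instead of warn reports the problem from the caller's perspective, which is where the fix has to happen.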

I added several more method calls, but in Module::Starter::Simple they are just stubs available to subclasses that want to use them:

sub postprocess_config { 1 }
sub pre_create_distro  { 1 }
sub post_create_distro { 1 }
sub pre_exit           { 1 }
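As a rough sketch of how a subclass might use one of these hooks: everything named Local::* here is invented, the base class is only a stand-in so the example runs without Module::Starter installed, and the distro and fallback_distro keys are assumptions rather than the module's documented configuration.

```perl
use strict;
use warnings;

# Stand-in for Module::Starter::Simple with the stub hooks (hypothetical)
package Local::Starter::Base;

sub new {
    my( $class, %config ) = @_;
    return bless { %config }, $class;
}

sub postprocess_config { 1 }
sub pre_create_distro  { 1 }
sub create_distro      { my $self = shift; print "Creating $self->{distro}\n"; 1 }
sub post_create_distro { 1 }
sub pre_exit           { 1 }

# A subclass that only cares about one hook
package Local::Starter::AddModule;
our @ISA = ( 'Local::Starter::Base' );

sub postprocess_config {
    my $self = shift;

    # Fix up a missing or useless --dist value before anything is created.
    # fallback_distro is a made-up key standing in for a META.yml lookup.
    if( ! defined $self->{distro} or $self->{distro} eq '.' ) {
        $self->{distro} = $self->{fallback_distro};
    }

    return 1;
}

package main;

my $starter = Local::Starter::AddModule->new(
    distro          => '.',
    fallback_distro => 'Animal-Farm',
);

$starter->$_() for qw(
    postprocess_config pre_create_distro create_distro
    post_create_distro pre_exit
);
```

The subclass overrides a single stub and inherits everything else, which is the kind of control I was missing when create_distro was the only entry point.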

I can go one step farther though. Now that I have a process that has distinct steps, I can decompose create_distro into its parts. A lot of the current code deals with the configuration it wants to adjust. I can move that into postprocess_config. Currently, create_distro also makes the MANIFEST file, but that's not really its job either since it's a "soft" file that I can generate from the build system. I can move that into post_create_distro. The final "Distribution created" message from run moves into pre_exit.

I promised more about run. It's a change that I haven't made to Module::Starter::App because I think it's probably one step too far, but I can give myself even more flexibility by letting the subclass determine the steps that I run so I can have even more hooks if I like:

# my new Module/Starter/App.pm
sub run {
    my $class = shift;

    my %config = $class->_config_read;

    %config = $class->_process_command_line( %config );

    eval "require $config{class};";
    print "Class is $config{class}\n";
    croak "Could not load starter class $config{class}: $@" if $@;

    $config{class}->import( @{$config{plugins}} );

    my $starter = $config{class}->new( %config );

    foreach my $step ( $starter->run_steps ) {
        eval { $starter->$step() } or ...;
    }

    return 1;
}

In Module::Starter::Simple I'd have a method to return the default steps to run:

sub run_steps {
    qw(
        postprocess_config
        pre_create_distro
        create_distro
        post_create_distro
        pre_exit
    )
}

Now, here's the really cool part of that design. The steps can come from outside the code:

# ~/.module-starter/config

run_steps: postprocess_config pre_create_distro create_distro post_create_distro pre_exit

Now I'm customizing the behavior from configuration instead of code. If I wanted to add another step, I could just put it in there (as long as the subclass implements it):

run_steps: ... commit_to_vcs pre_exit
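Here's a minimal sketch of how run_steps might honor that configuration value. The class Local::Starter::Configurable, its log attribute, and the generated stub steps are all invented for the example. I'm also assuming the configuration value arrives as a single whitespace-separated string under a run_steps key.

```perl
use strict;
use warnings;

package Local::Starter::Configurable;    # hypothetical stand-in class

my @DEFAULT_STEPS = qw(
    postprocess_config pre_create_distro create_distro
    post_create_distro pre_exit
);

sub new {
    my( $class, %config ) = @_;
    return bless { %config, log => [] }, $class;
}

# Generate stub steps that only record that they ran
for my $step ( @DEFAULT_STEPS, 'commit_to_vcs' ) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$step" } = sub { push @{ $_[0]{log} }, $step; 1 };
}

sub run_steps {
    my $self = shift;

    # Use the configured list if there is one, the defaults otherwise
    my @steps = defined $self->{run_steps}
        ? split( ' ', $self->{run_steps} )
        : @DEFAULT_STEPS;

    # Refuse step names that this class doesn't implement
    my @missing = grep { ! $self->can( $_ ) } @steps;
    die "Unknown run steps: @missing\n" if @missing;

    return @steps;
}

package main;

my $starter = Local::Starter::Configurable->new(
    run_steps => 'postprocess_config create_distro commit_to_vcs pre_exit',
);

foreach my $step ( $starter->run_steps ) {
    # the trailing 1 keeps a false return value from looking like a failure
    eval { $starter->$step(); 1 } or die "Step $step failed: $@";
}

print join( ' ', @{ $starter->{log} } ), "\n";
```

Checking the names with can() up front means a typo in the configuration file fails before any step has touched the disk.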

If I wanted to go even farther, which I don't particularly care to do right now, I could shift responsibility for the configuration processing out of Module::Starter::App. It only handles it now because it needs to know the subclass name. The other work it handles, such as the inheritance (instead of roles) architecture, can move up into my subclass so I can have more control there too:

sub run {
    my $class = shift;

    my $starter_subclass = $class->_config_read;

    my $starter = $starter_subclass->new( %config );        

    foreach my $step ( $starter->run_steps ) {
        eval { $starter->$step() } or ...;
    }

    return 1;
}

This design pushes as much of the decision making and irreversible work as late as possible. The only thing that run decides for me is the starting subclass name. It's just a way to kick off the process, which is the only thing it should do. new merely creates the new object. After that, each step does only its little job, minutely controllable, infinitely extendable, and quite malleable.

Integrating Module::Starter into Intermediate Perl, with fixes.

Working on a book about something is one of the best ways to discover issues with interfaces. When I have to explain some process and think about all the ways that things can go wrong so I can make the explanation as bulletproof as possible, all sorts of issues pop up.

I'm working on the distributions part of the next edition of Intermediate Perl. All of the h2xs stuff is being shoved into a couple of paragraphs and everything else now uses Module::Starter. I'm not a particular fan of the module (I have my own: Distribution::Cooker), but I do think it's the best thing to use if you don't know what you want to use.

I also want to integrate distributions more widely into the discussion of everything else, and take a very mild, test-driven-development-like approach. That way, distributions aren't some afterthought chapter at the end of the book like they are now. Part of that means that as we build up a distribution, I need to use Module::Starter to add a new module to an existing distribution.

Module::Starter::Smart can handle this mostly. It uses the Module::Starter templates to create the new module file and fix up the other files. Most of it works quite nicely. The problem shows up because Module::Starter::Smart relies on Module::Starter::Simple to handle recreating MANIFEST. In its create_MANIFEST, it takes in a list of files that it thinks it created and uses that to make MANIFEST.

For a working distribution, this doesn't quite work. I might have added files or excluded files with MANIFEST.SKIP. Since create_MANIFEST only considers the files that Module::Starter creates, it recreates MANIFEST with only the files it thinks it created. If I used a Module::Starter plug-in that changed the build file template so it changed the manifest target, none of that happens.

I could just re-build the MANIFEST, but I know to do that because I already know how all this works. For the readers of Intermediate Perl just beginning their education about Perl distributions, it's another step to explain and a lot to explain at that.

The real fix for any system like this is to never make more files than you need to create yourself. Once any build system gets to the point where it can generate files, like MANIFEST or META.yml, it's time to trust the system to make them for you.

This isn't a big deal. I've filed RT ticket 53330 for the problem and people are working on it.

In the meantime, I've created Module::Starter::AddModule as a Module::Starter::Smart subclass to provide a proper create_MANIFEST. I don't intend for Module::Starter::AddModule to stick around, but its create_MANIFEST can work its way upstream if people like. Until then, it's what I'm using for Intermediate Perl.
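To give a flavor of what "trust the system" could mean for MANIFEST, here's a sketch that builds the file list from what is actually on disk and honors MANIFEST.SKIP. It is not the code in Module::Starter::AddModule. It assumes the manifind and maniskip functions from ExtUtils::Manifest, and manifest_files is a made-up helper name.

```perl
use strict;
use warnings;

use ExtUtils::Manifest qw( manifind maniskip );

# Hypothetical helper: list the files under the current directory that
# belong in MANIFEST, leaving out anything that MANIFEST.SKIP excludes.
sub manifest_files {
    my $skip  = maniskip();    # returns a code ref built from MANIFEST.SKIP
    my $found = manifind();    # hash ref whose keys are relative file names

    my @files = sort grep { ! $skip->( $_ ) } keys %$found;
    return @files;
}

# Typical use, run from the distribution directory:
#   print "$_\n" for manifest_files();
```

The list comes from the directory and the skip file rather than from what the starter thinks it created, so files I added by hand show up and files I excluded stay out.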

Submit a Perl talk for OSCON before February 1

The Call for Proposals for OSCON ends on February 1.

As part of the Perl track committee, I gave some guidance on what you might propose in "How to tell your Perl story at OSCON". There are many interesting things you could talk about, even if you don't think they're interesting.

I mentioned several categories your talk might fall into:

  • The Perl language itself, and how it works
  • Using Perl features to provide programmer capabilities
  • Using other technologies from Perl
  • The process of using Perl to get work done
  • A 5 minute lightning talk

Every proposal is judged by a committee of subject matter experts as well as by the entire OSCON program committee and the organizers. Take the time to let OSCON know why your talk is the best, and remember that some of the people judging it might not know who you are or why your really cool thing is important. You're also in competition with other Perl talks, so we need a reason to pick yours over the many other Perl talks for the limited space each track gets.

Good luck!

http://twitter.com/briandfoy_perl

I'll tweet at (http://twitter.com/briandfoy_perl) when I have something to say in 140 characters. I'll almost always use "Perl" when I post.

Putting does() in Intermediate Perl

I'm thinking about what a second edition of Intermediate Perl would look like. Of course, I would update it for Perl 5.12 (the first edition targeted 5.8). Part of that is the new Perl 5.10 does() feature that replaces most uses of isa(). I want to come up with some good examples to show off does() and roles for the next edition. It's not as easy as I thought it would be.

I won't go into the history of does(): look for chromatic's various writings on that. In short, does() asks something if it has a set of behaviors. It's almost that interface stuff that Java has.

There's a problem constructing working examples though. Who gets to define the names of the behaviors? There is still the false cognate problem that does() was supposed to ameliorate. Two different interfaces give themselves the same name. It doesn't have to be for any good reason. Let's just leave it at "people suck at naming".

How does Java solve this? There's an authority mechanism to distinguish class names when it matters. Perl has a different authority mechanism, called PAUSE, but not really. You upload something and if you use that package name first, it effectively belongs to you, but only in terms of the filename. A programmer can easily replace a class, add or remove methods, or just plain use the same name for a completely different purpose. Is that CGI class for web stuff or animation, even if they both have the save method I'm interested in?

So, as I'm sitting here trying to come up with a good example to motivate does(), I'm back to isa(), the default behavior of does(), because the only authority we have to lock a set of behavior to a string is PAUSE, and even that isn't very authoritative.

Now, this doesn't matter if we are making our own system of classes and objects because we make ourselves the authority and we can give the roles any names that we like. We get to control all the dimensions for our own application.

Let's suppose, however, that as an attentive CPAN author I'm thinking how I can tell other code what roles my modules handle. This isn't application development. I make this little piece over here, someone makes some other little piece over there, and someone else puts our two little pieces together. Maybe I have some code that delegates some Log4perl method names to an internal Log4perl object. That would be quite a handy bit of knowledge at the higher level when it comes to debugging time. I could choose "Log::Log4perl" as the role name, but that's not quite right because I don't handle everything: just the logging methods (debug, warn, etc). I might use init_and_watch for something else, for example. What if my code could also dispatch to Log::Dispatch depending on the user configuration? Then I might claim I have the "Log::Dispatch" role.

But, does it matter if I have the Log4perl or Log::Dispatch role? Not really. I don't want the higher level to know how I do it, just that I can do it. I want them to know I have a "Logging" role, but not in the lumber sense, which also needs a warn method. What about the *::Any modules? Should they be the canonical role name? I made my little piece and claimed it handled some role, but another CPAN author supports the same set of behaviors yet gave it another name.

I think, to be really useful, you don't want to check that an object does a role. You really want to check that it does the part of the role that you are interested in. You know that eventually roles will conflict, and one of them has to win. Even if the combined roles can dispatch to both, you're going to want to choose. Checking does() and can() leaves room for error:

 if( $object->does( $role ) and
      $object->can( $method ) ) # it can, but is $method in $role?

You need to check both, in this fictional interface:

 $object->does( $role, $method );

Or maybe return a Role object that you can query:

 $object->does( $role )->can( $method );

In that case, if the role provided only part of its interface, you get the right answer when another role provided the same method. Not that we want to be able to have that, but you know it's going to happen eventually.
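Here's a sketch of the two-argument idea under some loud assumptions. The class name, the %ROLES table, and the does_method name are all invented. Core Perl spells the hook DOES, so that's what I override, falling back to the default isa()-style answer.

```perl
use strict;
use warnings;

package Local::Delegating::Logger;    # hypothetical class

# Hypothetical table: role name => the methods this class supplies for it
my %ROLES = (
    Logging => [ qw( debug error ) ],
);

sub new   { return bless {}, shift }
sub debug { return "debug: $_[1]" }
sub error { return "error: $_[1]" }
sub save  { return 1 }                # a method that belongs to no role

# Claim the roles in the table, otherwise defer to the default behavior
sub DOES {
    my( $self, $role ) = @_;
    return 1 if exists $ROLES{$role};
    return $self->SUPER::DOES( $role );
}

# Hypothetical two-argument check: is $method part of $role here?
sub does_method {
    my( $self, $role, $method ) = @_;
    return 0 unless $self->DOES( $role );
    my $count = grep { $_ eq $method } @{ $ROLES{$role} || [] };
    return $count ? 1 : 0;
}

package main;

my $object = Local::Delegating::Logger->new;

for my $method ( qw( debug save ) ) {
    my $answer = $object->does_method( 'Logging', $method ) ? 'yes' : 'no';
    print "Logging provides $method? $answer\n";
}
```

The object can() save, and it DOES Logging, but only the combined check says that save has nothing to do with the Logging role.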

Indeed, does() has many advantages because it does a lot more than isa(), but it's merely the foundation of a good practice that we have yet to develop. It's going in the book as at least an isa() replacement, but other than its default behavior, we're missing a lot that would make this useful beyond that.

Effective Perl Programming master class at Frozen Perl

At Frozen Perl 2010 in Minneapolis, I'm teaching a new master class based on my latest book, Effective Perl Programming, 2nd Edition. Perl has changed quite a bit since Joseph Hall wrote the first edition over 10 years ago. Josh McAdams and I have added a lot of new information as well as updated the existing material. In the one-day class for intermediate Perl programmers, I'll cover selected topics from the book, including:

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).