Making Module::Starter easier to subclass

By brian d foy on January 10, 2010 6:00 PM

I started with what I thought would be a simple Module::Starter subclass. I wanted to modify Module::Starter::Smart, which knows how to add a new module to an existing distribution, to properly handle an existing MANIFEST and MANIFEST.SKIP. I thought I could just override create_MANIFEST (RT 53339). Once I overrode that, however, I wanted to make a couple of additional tweaks, things weren't so easy.

Although I'm using Module::Starter as an example in this post, I could just as easily use one of my own modules to illustrate the same thing. I've just been hacking on Module::Starter for the past week so I can get it into shape for use in the next edition of Intermediate Perl. I'm not a particular fan of the module, but its the best place to start if you don't know you want to use something else.

My problem started when I also wanted to handle some DWIMery in the --dist command-line parameter:

% module-starter --module=Pig --dist=DISTNAME

Since I wanted to add a module to an existing distribution, I wanted to handle the common case of being in the the distribution directory as I work. I shouldn't have to specify --dist because I can figure it out by looking at META.yml. If I don't specify it, the current module-starter creates a new distribution inside my distribution. I tried to get around that by using ., which is just stupid:

% module-starter --module=Pig --dist=.

That doesn't work, and it shouldn't. The --dist is for the distribution name, not its location. The value in --dist shows up as an interpolated string in the module source for things such as the RT Queue addresses. That's fine. I don't really want to have to type it anyway, so I started looking how I could hook into the configuration to fix it up before the magic happens.

However, the current setup (1.54) for Module::Starter is not flexible enough for this because it does a couple of things that are typically no-no's in object-oriented design. As I said before, that's not a big deal. It's open source and I can submit patches, monkeypatch, or whatever I like.

The first problem shows up in the implementation of Module::Starter::App, which is the code behind module-starter. The first step in the process is a call to a fixed class:

# module-starter
Module::Starter::App->run;

That's a tough nut because you have to start somewhere. module-starter could do some gymnastics to dynamically rebless it into a user-defined class, but it doesn't need to. I have a better way that you'll see by the end of this post. The real problem is that run does the same thing again:

# Module/Starter/App.pm
# lots of munging of %config
$config{class}->create_distro( %config );

That $config{class} is a class name, Module::Simple::Starter by default, or whatever the last class in your plugin chain is (a different mess that I'll ignore for this post).

Now, the next problem is create_distro itself. It's a class method, and once it runs, it constructs an object then works with that object:

sub create_distro {
    my $class = shift;

    my $self = $class->new( @_ );

    # a lot more code
}

That's the only entry point that Module::Simple::App gives me. It's not that much of a problem, but it is messy. I have three options now:

extend create_distro like most subclasses would do, and call ->SUPER::create_distro
extend new, but hope that create_distro handles whatever I change
override create_distro completely

None of these are attractive in this case, and I tried all three before I decided they were all wrong and require too much work:

Extending methods is fragile in Module::Starter because you have to live with a subclass chain that you don't control and might work at cross-purposes. The create_distro method does way too much work anyway, so proper sequencing of events and changes is difficult.
Overriding create_distro means I have to duplicate a lot of code even if I only want a small change.

The design issue with almost any subclass revolves around three issues: how much can I control, when can I get control, and how much work do I have to do?

Well, I can control everything, but in the useless sense of control. In my subclass I can override create_distro and never call any superclasses. That's not very useful because it fails on the economy of work I have to do. I have the same amount of control as if I start from scratch. That's not interesting control. I want organic control that comes from what is already there and requires the least change possible to meet my need. I'm content with most of what Module::Starter does, so I want to reuse as much of it as possible. I don't have much control there.

The next problem is "When?". I have to wait until Module::Simple::App calls create_distro. By the method name alone, that's too late for what I want to do. I want to do some work before I create the distribution. I can override create_distro to provide a suitable entry point, but it's the only method that module-starter will call, so I have to turn create_distro into the method that does everything as if the process was a single step. It's misnamed as such and actually useless since the extra layer of indirection adds no benefit.

The process isn't a single step. There are five major steps in Module::Starter that a resonable subclass might want to control:

Read the configuration file to get the plugin chain
Read the command-line arguments
Adjust the configuration
Cook the templates and create the "hard" distribution files
Generate the "soft" files (the auto-generatable ones)

I want to hook into that Step 1 so I can use the configuration file, but create_distro starts me at step 4 (that's not entirely true because create_distro conflates Steps 3, 4, and 5, but Step 3 is also partially handled by Module::Starter::App).

Once I gave up on Module::Starter's current implementation, things got much easier. I changed Module::Starter::App to give me more control, but also with (I think) preserving the same high-level behavior so old plugins still work. I make it call new then a series of new hooks. create_distro is now an instance method and I can handle setup tasks before I get to it. This is the meat of the patch I submitted as RT 53539:

# my new Module/Starter/App.pm
sub run {
    my $class = shift;

    my %config = $class->_config_read;

    %config = $class->_process_command_line( %config );

    eval "require $config{class};";
    print "Class is $config{class}\n";
    croak "Could not load starter class $config{class}: $@" if $@;

    $config{class}->import( @{$config{plugins}} );

    my $starter = $config{class}->new( %config );
    $starter->postprocess_config;
    $starter->pre_create_distro;
    $starter->create_distro;
    $starter->post_create_distro;
    $starter->pre_exit;

    return 1;
}

A lot of the run implementation disappeared into either _config_read or _process_command_line. I should probably refactor the subclass discovery into its own method too. This is close to what any run subroutine should look like: almost every decision is made in some other method. I don't have to change run to customize it. More on that in a moment.

I made a corresponding small change in Module::Starter::Simple to handle its change from a class to instance method, although still work as a class method. If I call it with the old, class method interface, I create an object, just as before. If I call it with the new instance method interface, it just skips the call to new:

sub create_distro {
    my $either = shift;

    $either = $either->new( @_ ) unless( ref $either );
    my $self = $either;

    ...
}

I should probably add some sort of deprecation warning in there too.

I added several more method calls, but in Module::Starter::Simple they are just stubs available to subclass that want to use them:

sub postprocess_config { 1 }
sub pre_create_distro  { 1 }
sub post_create_distro { 1 }
sub pre_exit           { 1 }

I can go one step farther though. Now that I have a process that has distinct steps, I can decompose create_distro into it's parts. A lot of the current code deals with the configuration it wants to adjust. I can move that into postprocess_config. Currently, create_distro also makes the MANIFEST file, but that's not really its job either since its a "soft" file that I can generate from the build system. I can move the into post_create_distro. The final "Distribution created" message from run moves into pre_exit.

I promised more about run. It's a change that I haven't made to Module::Starter::App because I think it's probably one step too far, but I can give myself even more flexibility by letting the subclass determine the steps that I run so I can have even more hooks if I like:

# my new Module/Starter/App.pm
sub run {
    my $class = shift;

    my %config = $class->_config_read;

    %config = $class->_process_command_line( %config );

    eval "require $config{class};";
    print "Class is $config{class}\n";
    croak "Could not load starter class $config{class}: $@" if $@;

    $config{class}->import( @{$config{plugins}} );

    my $starter = $config{class}->new( %config );

    foreach my $step ( $starter->run_steps ) {
        eval { $starter->$step() } or ...;
    }

    return 1;
}

In Module::Starter::Simple I'd have a method to return the default steps to run:

sub run_steps { qw( 
    postprocess_config 
    pre_create_distro 
    create_distro 
    post_create_distro 
    pre_exit
    )
    };

Now, here's the really cool part of that design. The steps can come from outside the code:

# ~/.module-starter/config

run_steps: postprocess_config pre_create_distro create_distro post_create_distro pre_exit

Now I'm customizing the behavior from configuration instead of code. If I wanted to add another step, I could just put it in there (as long as the subclass implements it):

run_steps: ... commit_to_vcs pre_exit

If I wanted to go even farther, which I don't particularly care to do right now, I could shift responsibility for the configuration processing out of Module::Starter::App. It only handles it now because it needs to know the subclass name. The other work it handles, such as the inheritance (instead of roles) architecture, can move up into my subclass so I can have more control there too:

sub run {
    my $class = shift;

    my $starter_subclass = $class->_config_read;

    my $starter = $starter_subclass->new( %config );        

    foreach my $step ( $starter->run_steps ) {
        eval { $starter->$step() } or ...;
    }

    return 1;
}

This design pushes as much of the decision making and irreversible work to happen as late as possible. The only thing that run decides for me is the starting subclass name. It's just a way to kick off the process, which is the only thing it should do. new merely creates the new object. After that, each step does only their little job, minutely controllable, infinitely extendable, and quite malleable.

2 comments

Tagged as:

design, module-starter, perl, subclassing

2 Comments

Aristotle | January 11, 2010 5:42 AM | Reply

my $either = shift;
$either = $either->new( @_ ) unless( ref $either );
my $self = $either;

First overwriting the variable, but only in some circumstances, then copying the result – that looks icky to me. It seems like the perfect occasion for a ternary:

my $either = shift;
my $self = ref $either ? $either : $either->new( @_ );

Now, here’s the really cool part of that design. The steps can come from outside the code:

Hey cool, you’re reinventing make. :-)

Anyway – very nice work. I tried to hack on Module::Starter quite a while ago and found it too hard to achieve what I wanted. I don’t remember what it was, but it was something small enough that it wasn’t worth the bother, so I never did.

Sawyer X | January 11, 2010 9:24 AM | Reply

Hey Brian!

Excellent post. While reading the issues in the beginning I was thinking "how about a hook system to do the..." - and kept reading that it's exactly what you thought of.

I'm dedicating this week to mostly Module::Starter stuff. I hope to get to your patch today even and push it in the trunk.

I like the idea of the steps system as well, and I'm game on adding it too. The only thing I'm worried about is breaking other modules but I don't think that will be the case.

Either way, I think being able to add a module to an existing distribution should be possible in Module::Starter, and I want to add it. Also, an attribute for controlling which folder it works in, so you could do module-starter --dir . --dist Dist::Name or something like that.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).

More info »

brian d foy