Making Module::Starter easier to subclass
I started with what I thought would be a simple
Module::Starter
subclass. I wanted to modify Module::Starter::Smart
, which knows how
to add a new module to an existing distribution, to properly handle an
existing MANIFEST and MANIFEST.SKIP. I thought I could just
override create_MANIFEST
(RT
53339). Once I
overrode that, however, I wanted to make a couple of additional
tweaks, things weren't so easy.
Although I'm using Module::Starter
as an example in this post, I
could just as easily use one of my own modules to illustrate the same
thing. I've just been hacking on Module::Starter
for the past week
so I can get it into shape for use in the next edition of
Intermediate Perl. I'm not a particular fan of the module, but its
the best place to start if you don't know you want to use something
else.
My problem started when I also wanted to handle some DWIMery in the
--dist
command-line parameter:
% module-starter --module=Pig --dist=DISTNAME
Since I wanted to add a module to an existing distribution, I
wanted to handle the common case of being in the the distribution
directory as I work. I shouldn't have to specify --dist
because
I can figure it out by looking at META.yml. If I don't specify
it, the current module-starter
creates a new distribution inside
my distribution. I tried to get around that by using .
, which is
just stupid:
% module-starter --module=Pig --dist=.
That doesn't work, and it shouldn't. The --dist
is for the
distribution name, not its location. The value in --dist
shows up as
an interpolated string in the module source for things such as the RT
Queue addresses. That's fine. I don't really want to have to type it
anyway, so I started looking how I could hook into the configuration
to fix it up before the magic happens.
However, the current setup (1.54) for Module::Starter
is not
flexible enough for this because it does a couple of things that are
typically no-no's in object-oriented design. As I said before, that's
not a big deal. It's open source and I can submit patches,
monkeypatch, or whatever I like.
The first problem shows up in the implementation of
Module::Starter::App
, which is the code behind module-starter
. The
first step in the process is a call to a fixed class:
# module-starter
Module::Starter::App->run;
That's a tough nut because you have to start somewhere. module-starter
could do some gymnastics to dynamically rebless it into a user-defined
class, but it doesn't need to. I have a better way that you'll see by
the end of this post. The real
problem is that run
does the same thing again:
# Module/Starter/App.pm
# lots of munging of %config
$config{class}->create_distro( %config );
That $config{class}
is a class name, Module::Simple::Starter
by default, or
whatever the last class in your plugin chain is (a different mess that
I'll ignore for this post).
Now, the next problem is create_distro
itself. It's a class method,
and once it runs, it constructs an object then works with that object:
sub create_distro {
my $class = shift;
my $self = $class->new( @_ );
# a lot more code
}
That's the only entry point that Module::Simple::App
gives me. It's
not that much of a problem, but it is messy. I have three options now:
extend
create_distro
like most subclasses would do, and call->SUPER::create_distro
extend
new
, but hope thatcreate_distro
handles whatever I changeoverride
create_distro
completely
None of these are attractive in this case, and I tried all three before I decided they were all wrong and require too much work:
Extending methods is fragile in
Module::Starter
because you have to live with a subclass chain that you don't control and might work at cross-purposes. Thecreate_distro
method does way too much work anyway, so proper sequencing of events and changes is difficult.Overriding
create_distro
means I have to duplicate a lot of code even if I only want a small change.
The design issue with almost any subclass revolves around three issues: how much can I control, when can I get control, and how much work do I have to do?
Well, I can control everything, but in the useless sense of control.
In my subclass I can override create_distro
and never call any
superclasses. That's not very useful because it fails on the economy
of work I have to do. I have the same amount of control as if I start
from scratch. That's not interesting control. I want organic control
that comes from what is already there and requires the least change
possible to meet my need. I'm content with most of what
Module::Starter
does, so I want to reuse as much of it as possible.
I don't have much control there.
The next problem is "When?". I have to wait until
Module::Simple::App
calls create_distro
. By the method name alone,
that's too late for what I want to do. I want to do some work before I
create the distribution. I can override create_distro
to provide a
suitable entry point, but it's the only method that module-starter
will call, so I have to turn create_distro
into the method that does
everything as if the process was a single step. It's misnamed as such
and actually useless since the extra layer of indirection adds no
benefit.
The process isn't a single step. There are five major steps in
Module::Starter
that a resonable subclass might want to control:
Read the configuration file to get the plugin chain
Read the command-line arguments
Adjust the configuration
Cook the templates and create the "hard" distribution files
Generate the "soft" files (the auto-generatable ones)
I want to hook into that Step 1 so I can use the configuration file,
but create_distro
starts me at step 4 (that's not entirely true
because create_distro
conflates Steps 3, 4, and 5, but Step 3 is
also partially handled by Module::Starter::App
).
Once I gave up on Module::Starter
's current implementation, things
got much easier. I changed Module::Starter::App
to give me more
control, but also with (I think) preserving the same high-level
behavior so old plugins still work. I make it call new
then a series
of new hooks. create_distro
is now an instance method and I can
handle setup tasks before I get to it. This is the meat of the patch I
submitted as RT
53539:
# my new Module/Starter/App.pm
sub run {
my $class = shift;
my %config = $class->_config_read;
%config = $class->_process_command_line( %config );
eval "require $config{class};";
print "Class is $config{class}\n";
croak "Could not load starter class $config{class}: $@" if $@;
$config{class}->import( @{$config{plugins}} );
my $starter = $config{class}->new( %config );
$starter->postprocess_config;
$starter->pre_create_distro;
$starter->create_distro;
$starter->post_create_distro;
$starter->pre_exit;
return 1;
}
A lot of the run
implementation disappeared into either
_config_read
or _process_command_line
. I should probably refactor
the subclass discovery into its own method too. This is close to what
any run
subroutine should look like: almost every decision is made
in some other method. I don't have to change run
to customize it.
More on that in a moment.
I made a corresponding small change in Module::Starter::Simple
to
handle its change from a class to instance method, although still work
as a class method. If I call it with the old, class method interface,
I create an object, just as before. If I call it with the new instance
method interface, it just skips the call to new
:
sub create_distro {
my $either = shift;
$either = $either->new( @_ ) unless( ref $either );
my $self = $either;
...
}
I should probably add some sort of deprecation warning in there too.
I added several more method calls, but in Module::Starter::Simple
they are just stubs available to subclass that want to use them:
sub postprocess_config { 1 }
sub pre_create_distro { 1 }
sub post_create_distro { 1 }
sub pre_exit { 1 }
I can go one step farther though. Now that I have a process that has
distinct steps, I can decompose create_distro
into it's parts. A lot
of the current code deals with the configuration it wants to adjust. I
can move that into postprocess_config
. Currently, create_distro
also makes the MANIFEST file, but that's not really its job either
since its a "soft" file that I can generate from the build system. I
can move the into post_create_distro
. The final "Distribution created"
message from run
moves into pre_exit
.
I promised more about run
. It's a change that I haven't made to
Module::Starter::App
because I think it's probably one step too far,
but I can give myself even more flexibility by letting the subclass
determine the steps that I run so I can have even more hooks if I like:
# my new Module/Starter/App.pm
sub run {
my $class = shift;
my %config = $class->_config_read;
%config = $class->_process_command_line( %config );
eval "require $config{class};";
print "Class is $config{class}\n";
croak "Could not load starter class $config{class}: $@" if $@;
$config{class}->import( @{$config{plugins}} );
my $starter = $config{class}->new( %config );
foreach my $step ( $starter->run_steps ) {
eval { $starter->$step() } or ...;
}
return 1;
}
In Module::Starter::Simple
I'd have a method to return the default
steps to run:
sub run_steps { qw(
postprocess_config
pre_create_distro
create_distro
post_create_distro
pre_exit
)
};
Now, here's the really cool part of that design. The steps can come from outside the code:
# ~/.module-starter/config
run_steps: postprocess_config pre_create_distro create_distro post_create_distro pre_exit
Now I'm customizing the behavior from configuration instead of code. If I wanted to add another step, I could just put it in there (as long as the subclass implements it):
run_steps: ... commit_to_vcs pre_exit
If I wanted to go even farther, which I don't particularly care to do right
now, I could shift responsibility for the configuration processing out of
Module::Starter::App
. It only handles it now because it needs to know the
subclass name. The other work it handles, such as the inheritance (instead
of roles) architecture, can move up into my subclass so I can have more control
there too:
sub run {
my $class = shift;
my $starter_subclass = $class->_config_read;
my $starter = $starter_subclass->new( %config );
foreach my $step ( $starter->run_steps ) {
eval { $starter->$step() } or ...;
}
return 1;
}
This design pushes as much of the decision making and irreversible
work to happen as late as possible. The only thing that run
decides
for me is the starting subclass name. It's just a way to kick off the
process, which is the only thing it should do. new
merely creates
the new object. After that, each step does only their little job,
minutely controllable, infinitely extendable, and quite malleable.
First overwriting the variable, but only in some circumstances, then copying the result – that looks icky to me. It seems like the perfect occasion for a ternary:
Hey cool, you’re reinventing
make
. :-)Anyway – very nice work. I tried to hack on Module::Starter quite a while ago and found it too hard to achieve what I wanted. I don’t remember what it was, but it was something small enough that it wasn’t worth the bother, so I never did.
Hey Brian!
Excellent post. While reading the issues in the beginning I was thinking "how about a hook system to do the..." - and kept reading that it's exactly what you thought of.
I'm dedicating this week to mostly Module::Starter stuff. I hope to get to your patch today even and push it in the trunk.
I like the idea of the steps system as well, and I'm game on adding it too. The only thing I'm worried about is breaking other modules but I don't think that will be the case.
Either way, I think being able to add a module to an existing distribution should be possible in Module::Starter, and I want to add it. Also, an attribute for controlling which folder it works in, so you could do
module-starter --dir . --dist Dist::Name
or something like that.