Moo 2.0

The next stable release of Moo will be version 2.0, and will include some incompatible changes. These changes should affect a pretty small number of modules, and may help point out flaws in the existing code.

The most important change in Moo 2.0 is that it will no longer be applying fatal warnings to classes using it. As Moo has grown to be more widely used on CPAN, it has become obvious that applying fatal warnings is usually unexpected or undesired by other authors, resulting in things like Moo::Lax, or people just avoiding Moo entirely. And authors who prefer fatal warnings can easily apply them to their own code.
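As a minimal sketch of what opting back in looks like, the class below enables fatal warnings itself. The class name and method are made up for illustration, and `use Moo;` is left out so the example runs with core perl only; in a real Moo class the `use warnings FATAL => 'all';` line would simply follow `use Moo;`.

```perl
use strict;
use warnings;

package My::Class {
    # In a real class, `use Moo;` would come first.
    # Fatal warnings are lexically scoped, so they only affect this block.
    use warnings FATAL => 'all';

    sub add_one {
        my ($value) = @_;
        return $value + 1;    # an undef $value warns, and the warning is fatal
    }
}

# The "uninitialized value" warning is turned into an exception.
my $ok = eval { My::Class::add_one(undef); 1 };
print $ok ? "no error\n" : "died: $@";
```

Because the pragma is lexical, code outside the block keeps ordinary non-fatal warnings.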

Moo 2.0 will also detect a number of cases where people apply modifiers to constructors, or attempt to add attributes to classes that have already been instantiated. These cases never worked correctly, but they will now issue errors.

Another smaller change concerns classes without attributes, which previously stored all parameters passed to ->new in the object. This was a bug, but fixing it raised backward compatibility concerns, so the fix is being made as part of this release.

I've tested all of the modules that depend on Moo, and found only a small number affected by these changes. In most cases, the failure indicated a bug in the module that hadn't been caught yet.

There is a beta version of Moo 2.0 available on CPAN now as Moo 1.999_001. If you have code that depends on Moo, please test it with this new version.

Travis-CI Helpers for Perl

I deal with a lot of modules that promise backwards compatibility with older versions of perl, usually back to perl 5.8.1. Since I don't regularly use perl versions that old when developing, accidentally introducing incompatibilities is always a risk. Having a continuous integration system check this for me makes it much easier to catch mistakes like this before they get released into the wild.

Travis CI is a very useful continuous integration service that is free for any public repository on GitHub. There are issues with using Travis CI for the kind of testing I need though. First, it only provides the last revision of each perl series. Especially in the perl 5.8 and 5.10 series, there are substantial enough differences between revisions that testing only the latest isn't adequate. Additionally, some of the testing needs to be done on perls built with threading, which isn't included on most of the versions available on Travis. It is also sometimes useful to test without the additional modules that Travis pre-installs.

There is a solution for this though. Perl can be built directly on the Travis test boxes before running the tests. Any arbitrary perl version can be built, including blead (perl from git) or new stable releases that haven't been included on Travis yet (as was the case with 5.20 for a few months).

Building new perl versions was what originally inspired me to begin work on my Travis helper scripts. Since then, they have expanded to include a number of other functions to simplify testing perl modules on Travis.

The Simple Version

The helpers can be used individually to customize the building and testing process, but for most distributions the automatic mode will work. A simple .travis.yml using my helper scripts would look like this:

language: perl
perl:
  - "5.8"                     # normal pre-installed perl
  - "5.8.4"                   # installs perl 5.8.4
  - "5.8.4-thr"               # installs perl 5.8.4 with threading
  - "5.20"                    # installs latest perl 5.20 (if not already available)
  - "blead"                   # install perl from git
before_install:
  - git clone git://github.com/travis-perl/helpers ~/travis-perl-helpers
  - source ~/travis-perl-helpers/init --auto

This covers most of the helpers' features and will work for most distributions. It builds perl where needed, installs prerequisites, and works with dists built using Dist::Zilla, ExtUtils::MakeMaker, Module::Build, or Module::Install.

The --auto flag means that the testing process is roughly equivalent to the following Travis config.

language: perl
perl:
  - "5.8"                     # normal pre-installed perl
  - "5.8.4"                   # installs perl 5.8.4
  - "5.8.4-thr"               # installs perl 5.8.4 with threading
  - "5.20"                    # installs latest perl 5.20 (if not already available)
  - "blead"                   # install perl from git
before_install:
  - git clone git://github.com/travis-perl/helpers ~/travis-perl-helpers
  - source ~/travis-perl-helpers/init
  - build-perl
  - perl -V
  - build-dist
  - cd $BUILD_DIR             # $BUILD_DIR is set by the build-dist command
install:
  - cpan-install --deps       # installs prereqs, including recommends
  - cpan-install --coverage   # installs coverage prereqs, if enabled
before_script:
  - coverage-setup
script:
  - perl Makefile.PL          # or Build.PL if it exists
  - make                      # or ./Build
  - prove -l -s -j$(test-jobs) $(test-files)
after_success:
  - coverage-report

While the automatic mode supports most of the features the helpers provide, it isn't meant to be used with custom build steps. If any customization of the build steps is needed, the automatic mode shouldn't be used.

Perl Building - build-perl

The first important helper function is build-perl. It takes the requested perl version from the build matrix and either downloads or builds it for you if it doesn't exist. So for example, if 5.16 is requested, Travis will already have it available and nothing will be done. But if 5.16.0 is requested, a fresh version of perl will be built. If 5.8.8 is requested, a pre-built copy of perl 5.8.8 will be downloaded, as it's a commonly tested version so I've pre-built it. Building perl generally takes around 4 minutes on Travis, so these pre-built copies can significantly speed up small test suites.

Build flags can also be added to the versions. 5.8.5-thr will build a version of perl including support for threads. 5.8.5-dbg will include debugging support. And 5.16-thr will build the latest 5.16 release and include support for threads.

If blead is requested, perl will be built from git. This is helpful to see if your module will be impacted by future changes to perl, but as blead is not guaranteed stable it should usually be included in Travis's allow_failures section.

Pre-installed Modules - local-lib

When the helper scripts build or download a perl version, they don't have any extra modules pre-installed. The default Travis builds all include a set of prerequisites pre-installed. Both cases can be useful in different situations. In some cases, you want to check that your prerequisite installation works properly, or that your module works with an older version of a core module. But installing all of the prerequisites every time can delay testing by a significant amount.

To help with this, each pre-built copy of perl also has a set of pre-built local::lib directories that can be switched to. These can be used by adding them directly to the build matrix, attaching them to the perl version like 5.10.1@moose. The moose pre-built includes Moose and Moo. If not using a pre-built perl, the modules in the named local::lib will be installed.

The full list of pre-built local::libs and the libraries in them can be seen in the local-libs.txt file.

Distribution Building - build-dist

There are a variety of tools used for distribution building. Manually writing a Makefile.PL is one, but other options include Module::Build, Module::Install, or Dist::Zilla. While tests can often be performed directly against the files in the repository without building, this won't include any of the extra checks done by or generated by the dist building tool. It also can complicate the process of finding prerequisites.

The approach the helpers recommend is first generating a full dist, as would be uploaded to CPAN, then testing against that. Because the distribution building tool often won't work on all of the perl versions you wish to test against, it's helpful to build the dist with a different (newer) version of perl than the one the tests are run with.

This is what the build-dist helper does. It uses the latest pre-built version of perl to generate a distribution directory, automatically installing any modules needed. It then sets the BUILD_DIR environment variable to the location of the built distribution.

Prerequisite Installation - cpan-install

For most cases, prerequisite installation could be handled by cpanm, but the cpan-install helper provides a few niceties. It provides more helpful output than cpanm in the event of a failure, but is still concise in the common case. It also tweaks the set of modules to be installed. The developer prerequisites and recommended modules of the distribution being tested will be installed, but not those of its prerequisites.

It also includes better compatibility with ancient versions of perl.

Coverage Reporting

Setting up coverage reporting in Travis is relatively simple. You just need to install the Devel::Cover module and run the cover command appropriately. But coverage reporting slows down testing substantially and can also prevent some tests from running (such as those using threads). So it's useful to limit coverage testing to only some of the perls you are testing with. With that in mind, the helper scripts include several coverage related commands that are no-ops unless the COVERAGE environment variable is set.

Running the Tests

For running the actual tests, the helpers do very little. It's recommended to use the standard prove command, with whatever options are wanted.

There are a few helpers that can be used with prove though. If you want to run tests in parallel, the test-jobs helper returns a recommended number of processes to use. The number is one more than the number of CPUs available. It will also always return 1 if COVERAGE is enabled, since Devel::Cover is currently buggy when used with parallel testing.
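The rule described above is small enough to write down directly. This is only a sketch of the stated behavior, not the helper's actual code; the function name `recommended_jobs` and its parameters are hypothetical, and the CPU count is passed in rather than detected.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rule test-jobs is described as following.
sub recommended_jobs {
    my ($cpu_count, $coverage) = @_;
    return 1 if $coverage;     # no parallel testing under Devel::Cover
    return $cpu_count + 1;     # otherwise one more than the CPU count
}

print recommended_jobs(2, 0), "\n";    # prints 3
print recommended_jobs(2, 1), "\n";    # prints 1
```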

The test-files helper returns all of the test scripts to run. This list is generated by searching for .t files recursively in the t and xt directories. However, if the AUTHOR_TESTING environment variable is set to 0, it will only return files in t. It can also help with very slow test runs. If the TEST_PARTITION and TEST_PARTITIONS environment variables are set, it will return only a subset of the tests. This allows you to split the tests across multiple Travis builds in parallel, making the full test run take less time.
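To make the selection rules concrete, here is a rough sketch working on a plain list of file names. The function `select_tests` and its option names are hypothetical. It also assumes that TEST_PARTITION is a 1-based index and that files are dealt out round-robin; neither is stated above, and the real helper may split the files differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: choose the test files for one build.
sub select_tests {
    my ($files, %opt) = @_;
    my @selected = sort @$files;

    # AUTHOR_TESTING=0 keeps only the files under t/
    if (defined $opt{author_testing} && $opt{author_testing} eq '0') {
        @selected = grep { m{^t/} } @selected;
    }

    # Assumed strategy: round-robin split with a 1-based partition number
    if ($opt{partition} && $opt{partitions}) {
        my $index = $opt{partition} - 1;
        my @keep  = grep { $_ % $opt{partitions} == $index } 0 .. $#selected;
        @selected = @selected[@keep];
    }

    return @selected;
}

my @files = qw(t/basic.t t/attrs.t xt/pod.t t/roles.t);
my @first = select_tests(\@files,
    author_testing => 0, partition => 1, partitions => 2);
print "@first\n";    # prints "t/attrs.t t/roles.t"
```

Each partition gets a disjoint subset, so running every partition number once covers the full test suite.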

Bits and Pieces

An important feature of the helpers is that they can all be used independently of each other. So if perl building is the only feature needed, the rest of the helpers can be ignored.

Overall, having these helpers has allowed me to set up testing more easily for a variety of different projects, and to expand the range of perl versions tested. They have been used to add perl 5.8 and blead testing to Moose, and perl 5.6 testing to Moo and ExtUtils::MakeMaker.

Devel::Confess - Include stack traces with all errors/warnings

Edit: Since writing this, I've decided on a proper name. Devel::Confess is the name for this module going forward. Carp::Always::EvenObjects exists now only as a wrapper around Devel::Confess.

Carp::Always is a very useful module that will force all errors and warnings to include a full stack trace to help with debugging. However, it has some limitations. If an exception object is thrown rather than a string, the stack trace can't simply be appended to it. die, Carp, and Carp::Always just pass the object through unmodified. Some exception systems include stack traces in their objects, but for those that don't, this hurts the ability to debug. As more libraries use exception objects (e.g. autodie), this becomes more problematic.

With that in mind, I've written Carp::Always::EvenObjects. It works similarly to Carp::Always, but will also attach stack traces to objects. This is done by re-blessing the object into a subclass that knows how to include the stack trace when the object is stringified. Most exception systems will treat the re-blessed objects the same as the originals.

It will also attach stack traces to plain non-object refs, although their use as exceptions is rather rare.
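The re-blessing idea can be sketched with core modules only. This is an illustration of the technique, not the module's actual implementation: the classes `My::Error` and `My::Error::WithTrace` and the `attach_trace` function are invented for the example, and the subclass is written by hand rather than generated for whatever class is thrown.

```perl
use strict;
use warnings;
use Carp ();
use Scalar::Util qw(blessed refaddr);

# A plain exception class with no stack trace support of its own.
package My::Error {
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub message { return $_[0]{message} }
}

# Subclass that appends a stored trace whenever the object is stringified.
package My::Error::WithTrace {
    our @ISA = ('My::Error');
    our %TRACE;    # traces keyed by object address, kept outside the object
    use overload
        '""' => sub {
            $_[0]->message . $TRACE{ Scalar::Util::refaddr($_[0]) };
        },
        fallback => 1;
    sub DESTROY { delete $TRACE{ Scalar::Util::refaddr($_[0]) } }
}

# Capture a trace and re-bless the exception into the tracing subclass.
sub attach_trace {
    my ($error) = @_;
    $My::Error::WithTrace::TRACE{ refaddr $error } = Carp::longmess('');
    return bless $error, 'My::Error::WithTrace';
}

# Hook die: objects are re-blessed in place, then die carries on as usual.
local $SIG{__DIE__} = sub {
    my ($error) = @_;
    attach_trace($error) if blessed($error) && $error->isa('My::Error');
};

sub risky { die My::Error->new(message => 'something broke') }

my $ok    = eval { risky(); 1 };
my $error = $@;
print "$error\n" unless $ok;    # message followed by the call stack
```

Because the new class inherits from the original, code that checks `$error->isa('My::Error')` or calls its methods keeps working; only stringification changes.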

Normally you would use the module on the command line, as:

perl -MCarp::Always::EvenObjects script.pl

As a bonus, since the name is rather long, the dist includes the module Devel::Confess as an alias, allowing you to use the shorter:

perl -MDevel::Confess script.pl

or even

perl -d:Confess script.pl

Using system or exec safely on Windows

Passing a list of arguments to another program on Windows in perl is much more complicated than it should be. Several different issues combine to cause this.

(mostly copied from a post I made on PerlMonks)

First is that argument lists are always passed as a single string in Windows, as opposed to arrays like on other systems. This is less of a problem than it appears, because 95% of programs use the same rules for parsing that string into an array. Roughly speaking, the rules are that arguments can be quoted with double quotes, and backslashes can escape any character.

The second issue is that cmd.exe uses different quoting rules than the normal parsing routine. It uses a caret as the escape character instead of backslash.
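A small illustration of the two rule sets applied to the same argument may help. The argument is made up, and the backslash step is deliberately simplified: it is only sufficient here because no backslash comes directly before a quote in the input.

```perl
use strict;
use warnings;

my $arg = 'say "hi" & exit';

# Standard argument parsing: wrap in double quotes and backslash-escape
# the embedded quotes. (Simplified; see the note above.)
(my $standard = $arg) =~ s/"/\\"/g;
$standard = qq{"$standard"};

# cmd.exe: prefix each of its special characters with a caret.
(my $for_cmd = $standard) =~ s/([()%!^"<>&|])/^$1/g;

print "$standard\n";    # "say \"hi\" & exit"
print "$for_cmd\n";     # ^"say \^"hi\^" ^& exit^"
```

The two strings differ, which is why a quoted command line has to be built for one specific consumer.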

The result of this is that you can't create a string that will be treated the same in both of these cases. This becomes a larger problem, because perl switches between using cmd.exe and calling the program directly based on whether the command contains shell metacharacters. And that check involves a third, different set of quoting rules. There isn't any good way to tell which way perl is going to treat a command without reimplementing the detection code that exists inside perl. So here is a routine that will quote arguments correctly to use with system on Windows:

sub quote_list {
    my (@args) = @_;

    my $args = join ' ', map { quote_literal($_) } @args;

    if (_has_shell_metachars($args)) {
        # cmd.exe treats quotes differently from standard
        # argument parsing. just escape everything using ^.
        $args =~ s/([()%!^"<>&|])/^$1/g;
    }
    return $args;
}

sub quote_literal {
    my ($text) = @_;

    # basic argument quoting.  uses backslashes and quotes to escape
    # everything.
    if ($text ne '' && $text !~ /[ \t\n\v"]/) {
        # no quoting needed
    }
    else {
        my @text = split '', $text;
        $text = q{"};
        for (my $i = 0; ; $i++) {
            my $bs_count = 0;
            while ( $i < @text && $text[$i] eq "\\" ) {
                $i++;
                $bs_count++;
            }
            if ($i > $#text) {
                $text .= "\\" x ($bs_count * 2);
                last;
            }
            elsif ($text[$i] eq q{"}) {
                $text .= "\\" x ($bs_count * 2 + 1);
            }
            else {
                $text .= "\\" x $bs_count;
            }
            $text .= $text[$i];
        }
        $text .= q{"};
    }

    return $text;
}

# direct port of code from win32.c
sub _has_shell_metachars {
    my $string = shift;
    my $inquote = 0;
    my $quote = '';

    my @string = split '', $string;
    for my $char (@string) {
        if ($char eq q{%}) {
            return 1;
        }
        elsif ($char eq q{'} || $char eq q{"}) {
            if ($inquote) {
                if ($char eq $quote) {
                    $inquote = 0;
                    $quote = '';
                }
            }
            else {
                $quote = $char;
                $inquote++;
            }
        }
        elsif ($char eq q{<} || $char eq q{>} || $char eq q{|}) {
            if ( ! $inquote) {
                return 1;
            }
        }
    }
    return;
}

The information about the quoting rules on Windows is from the article Everyone quotes command line arguments the wrong way. I attempted to use this to improve ExtUtils::MakeMaker's quoting, but that also has to deal with Makefile quoting rules. Additionally, other parts of the code (or at least tests) assume that you can generate a string and have it work both when passed to system and when placed in a Makefile. I almost never use perl on Windows, so I eventually gave up on the effort.

Converting Complex SVN Repositories to Git - Part 4

Cleaning up and simplifying merges

After the previous steps, the git repository has an accurate history of what was done to the SVN repository. It is a direct translation though, and shows more the process and tools that were used, rather than developer intent. I proceeded to simplify how the merges were recorded to eliminate the convoluted mess that existed and make the history usable.

Two main classes of these problems existed. There were branches that were merged one commit at a time, as that was one way of preserving the history in SVN. The other case was trunk being merged into a branch, and that branch immediately being merged back into trunk. Some other issues match up with those two merge styles and the same cleanup will apply to them.

Here is a section of the history of the 'DBIx-Class-resultset' branch being merged, one commit at a time. Obviously not ideal, but you can mostly tell what is happening.

resultset-ugly.png

The merge of the 'DBIx-Class-current' branch was somewhat less straightforward.

current-ugly-end.png

...

current-ugly-middle.png

...

current-ugly-start.png

This smaller example of the 'resultset_cleanup' branch helps show how these can be dealt with.

resultset_cleanup-before.png

If we search for merges, starting from the earliest point in the repository history, we will find the commit noted as 4. We don't want to remove the record of this branch being merged, so initially we will leave it alone. The next merge we find however, 1, makes the first redundant. There is no need to maintain the first merge now that we know that this one exists. This process continues forward, eventually resulting in a single merge commit for the branch.

The code for this is in 43.graft-merges-simplified.

# get a list of all of the merge commits and their parent commits, space separated
my @merges = `git log --all --merges --pretty=format:'%H %P'`;
# to record all of the commits we intend to alter
my %altered;
# to record all of the merges we've seen so far
my %merges;
# start at the earliest point
for my $merge ( reverse @merges ) {
    chomp $merge;
    my ($commit, @parents) = split / /, $merge;
    $merges{$commit} = \@parents;
    # checking our merge [1]
    # this repo only contains merges with two parents
    my ( $left_parent, $right_parent ) = @parents;
    # check if our first parent [3] is a merge
    if ( my $left_grandparents = $merges{ $left_parent } ) {
        # find the grandparent [4] on the opposite side of the merge [2]
        my $right_grandparent
            = `git show -s --pretty='format:%P' $right_parent | cut -d' ' -f1`;
        chomp $right_grandparent;
        # if it is the same as the grandparent ([4] again) on the left side
        if ($right_grandparent eq $left_grandparents->[1]) {
            # we know we want to simplify this merge
            $altered{$commit}++;
            # switch the left parent (was [2]) to the left grandparent [5]
            $parents[0] = $left_grandparents->[0];
            # our left parent shouldn't be part of the history anymore,
            #   so we don't want to match it
            delete $merges{ $left_parent };
            # nor do we need to change it
            delete $altered{ $left_parent };
        }
    }
}

# many of these merges exist only because they were calculated in previous steps
# we don't want duplicate grafts, so we simply comment out the old ones.
my $regex = '(?:' . (join '|', keys %altered) . ')';
system "perl -i -pe's/^($regex )/# \$1/' $GIT_DIR/info/grafts";

# record the grafts
open my $fh, '>>', "$GIT_DIR/info/grafts"
    or die "Cannot append to $GIT_DIR/info/grafts: $!";
print { $fh } "# Simplified merges\n";
for my $commit ( keys %altered ) {
    print { $fh } join(q{ }, $commit, @{ $merges{$commit} }) . "\n";
}
close $fh;

# we're modifying these merge commits.  whatever their commit
# messages were initially won't be accurate anymore.
# later, when we rewrite the commit messages, we want to just
# record these as branch merges.
# this just keeps track of which commits we want to simplify the
# commit messages in this manner.

use Data::Dumper;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Sortkeys = 1;

@altered{ keys %$simplified_merges } = values %$simplified_merges;
open $fh, '>', "$BASE_DIR/cache/simplified-merges.pl"
    or die "Cannot write $BASE_DIR/cache/simplified-merges.pl: $!";
print { $fh } Dumper(\%altered);
close $fh;

The end result is obviously much nicer.

resultset_cleanup-after.png
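To see the rule from the simplification script in isolation, here is a toy version that runs on an in-memory history instead of calling git. The commit names and the `%parents_of` table are made up: a branch (B1, B2) is merged into trunk one commit at a time by the merges M1 and M2.

```perl
use strict;
use warnings;

# Toy history: commit => [parents]. All names are invented.
my %parents_of = (
    T0 => [],
    B1 => ['T0'],
    B2 => ['B1'],
    M1 => [ 'T0', 'B1' ],    # first piecemeal merge of the branch
    M2 => [ 'M1', 'B2' ],    # second merge, makes M1 redundant
);
my @merges_oldest_first = qw(M1 M2);

my %merges;     # merges seen so far
my %altered;    # merges whose parents we rewrote
for my $commit (@merges_oldest_first) {
    my $parents = $parents_of{$commit};
    $merges{$commit} = $parents;
    my ($left_parent, $right_parent) = @$parents;

    # only interesting if the first parent is itself a merge
    my $left_grandparents = $merges{$left_parent} or next;

    # does the merged branch continue from what the earlier merge pulled in?
    my $right_grandparent = $parents_of{$right_parent}[0];
    if (defined $right_grandparent
        && $right_grandparent eq $left_grandparents->[1]) {
        $altered{$commit}++;
        $parents->[0] = $left_grandparents->[0];    # skip over the old merge
        delete $merges{$left_parent};
        delete $altered{$left_parent};
    }
}

# graft-style lines: commit followed by its new parents
print "$_ @{ $merges{$_} }\n" for sort keys %altered;    # prints "M2 T0 B2"
```

After the rewrite, M2 merges the whole branch straight into T0 and M1 drops out of the history.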

It turned out that while these calculations caught the majority of the cases, a couple complex, ugly cases were missed. The 'DBIx-Class-current' case was one of these. Rather than spend the extra effort to find an additional strategy to automatically detect such cases (if it was even possible), I manually figured out the best way to record the merges and put them in the 42.graft-merges-simplified-manual file.

Here we see a merge into a branch, followed immediately by a merge into trunk.

rsrc_in_storage-before.png

This is another case that makes the history harder to follow. And while this example is relatively straightforward, cleaning up this type of merge helps in much uglier cases as well. The process for simplifying these merges may eliminate the commits our branches are referring to, but we don't have any need to maintain the branches that have been merged, so we delete them here (46.delete-merged-branches, the same script as 60.delete-merged-branches).

The 47.graft-merges-redundant script simplifies these. It follows a similar structure to the previous simplification script.

my @merges = `git log --all --merges --pretty=format:'%H %P'`;
my %altered;
my %merges;
for my $merge ( reverse @merges ) {
    chomp $merge;
    my ($commit, @parents) = split / /, $merge;
    my $f;
    # for each merge [1]
    $merges{$commit} = \@parents;
    # check each parent [2] in turn ([3] will be checked first, but fail
    #   a later test)
    PARENT: for my $p ( 0 .. 1 ) {
        my $parent = $parents[ $p ];
        # check against the other parent [3]
        my $check_ancest = $parents[ 1 - $p ];
        # we only care if it is a merge
        my $ancest = $merges{ $check_ancest } || next;

        ANCEST: for my $c ( 0 .. 1 ) {
            # if the first parent [3] is also a parent of the second parent [2]
            if ($parent eq $ancest->[ $c ]) {
                $altered{$commit}++;
                # we don't need the current second parent [2], so switch
                # it to that commit's other parent [4]
                $parents[1 - $p] = $ancest->[1 - $c];
                # don't match or change the commit we are clipping out
                delete $merges{ $check_ancest };
                delete $altered{ $check_ancest };
                # and skip to the next commit
                last PARENT;
            }
        }
    }
}

The redundant merge is now gone.

rsrc_in_storage-after.png

The history simplification is now basically complete. Instead of the convoluted mess that resulted from a direct translation of the SVN repository, it now has a mostly understandable history showing what the developers intended, rather than the exact method they used to do so. All that is left to do is clean up the commit messages and attribution, fix the tags, and a few other minor cleanups.

Next: Commit message and other final cleanups, and baking in grafts