Travis-CI Helpers for Perl

I deal with a lot of modules that promise backwards compatibility with older versions of perl, usually back to perl 5.8.1. Since I don't regularly use perl versions that old when developing, accidentally introducing incompatibilities is always a risk. Having a continuous integration system check this for me makes it much easier to catch mistakes like this before they get released into the wild.

Travis CI is a very useful continuous integration service that is free for any public repository on GitHub. There are issues with using it for the kind of testing I need though. First, it only provides the latest revision of each perl series. Especially in the 5.8 and 5.10 series, there are substantial differences between releases, so testing only the latest isn't adequate. Additionally, some of the testing needs to be done on perls built with threading, which isn't enabled on most of the versions available on Travis. It is also sometimes useful to test without the extra modules that Travis pre-installs.

There is a solution for this though. Perl can be built directly on the Travis test boxes before running the tests. Any arbitrary perl version can be built, including blead (perl from git) or new stable releases that haven't been included on Travis yet (as was the case with 5.20 for a few months).

Building new perl versions was what originally inspired me to begin work on my Travis helper scripts. Since then, they have expanded to include a number of other functions to simplify testing perl modules on Travis.

The Simple Version

The helpers can be used individually to customize the building and testing process, but for most distributions the automatic mode will work. A simple .travis.yml using my helper scripts would look like this:

language: perl
perl:
  - "5.8"                     # normal pre-installed perl
  - "5.8.4"                   # installs perl 5.8.4
  - "5.8.4-thr"               # installs perl 5.8.4 with threading
  - "5.20"                    # installs latest perl 5.20 (if not already available)
  - "blead"                   # install perl from git
before_install:
  - git clone git://github.com/travis-perl/helpers ~/travis-perl-helpers
  - source ~/travis-perl-helpers/init --auto

This includes most of the features and will work for most distributions. It includes building perl where needed, installing prerequisites, and will work with dists built using Dist::Zilla, ExtUtils::MakeMaker, Module::Build, or Module::Install.

The --auto flag means that the testing process is roughly equivalent to the following Travis config.

language: perl
perl:
  - "5.8"                     # normal pre-installed perl
  - "5.8.4"                   # installs perl 5.8.4
  - "5.8.4-thr"               # installs perl 5.8.4 with threading
  - "5.20"                    # installs latest perl 5.20 (if not already available)
  - "blead"                   # install perl from git
before_install:
  - git clone git://github.com/travis-perl/helpers ~/travis-perl-helpers
  - source ~/travis-perl-helpers/init
  - build-perl
  - perl -V
  - build-dist
  - cd $BUILD_DIR             # $BUILD_DIR is set by the build-dist command
install:
  - cpan-install --deps       # installs prereqs, including recommends
  - cpan-install --coverage   # installs coverage prereqs, if enabled
before_script:
  - coverage-setup
script:
  - perl Makefile.PL          # or Build.PL if it exists
  - make                      # or ./Build
  - prove -l -s -j$(test-jobs) $(test-files)
after_success:
  - coverage-report

While the automatic mode supports most of the features the helpers provide, it isn't designed to accommodate custom build steps. If any customization of the build process is needed, the helpers should be used individually instead.

Perl Building - build-perl

The first important helper function is build-perl. It takes the requested perl version from the build matrix and either downloads or builds it for you if it doesn't exist. So for example, if 5.16 is requested, Travis will already have it available and nothing will be done. But if 5.16.0 is requested, a fresh version of perl will be built. If 5.8.8 is requested, a pre-built copy of perl 5.8.8 will be downloaded, as it's a commonly tested version so I've pre-built it. Building perl generally takes around 4 minutes on Travis, so these pre-built copies can significantly speed up small test suites.

Build flags can also be added to the versions. 5.8.5-thr will build a version of perl including support for threads. 5.8.5-dbg will include debugging support. And 5.16-thr will build the latest 5.16 release and include support for threads.

If blead is requested, perl will be built from git. This is helpful to see if your module will be impacted by future changes to perl, but as blead is not guaranteed stable it should usually be included in Travis's allow_failures section.
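Allowing blead to fail without failing the whole build looks something like this (a sketch; the matrix section is standard Travis configuration, not part of the helpers):

```yaml
matrix:
  allow_failures:
    - perl: "blead"   # blead is not guaranteed stable; don't fail the build on it
```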

Pre-installed Modules - local-lib

When the helper scripts build or download a perl version, it doesn't have any extra modules pre-installed. The default Travis builds, on the other hand, all include a set of prerequisites pre-installed. Both cases can be useful in different situations. Sometimes you want to verify that your prerequisite installation works properly, or that your module works with an older version of a core module. But installing all of the prerequisites every time can delay testing by a significant amount.

To help with this, each pre-built copy of perl also has a set of pre-built local::lib directories that can be switched to. These can be used by adding them directly to the build matrix, attaching them to the perl version like 5.10.1@moose. The moose pre-built includes Moose and Moo. If not using a pre-built perl, the modules in the named local::lib will be installed.
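In the build matrix this looks like the following (a sketch; the moose local::lib is the only one named here, and the plain entry is shown just for contrast):

```yaml
perl:
  - "5.10.1@moose"   # perl 5.10.1 with the pre-built 'moose' local::lib (Moose, Moo)
  - "5.10.1"         # the same perl without any pre-installed modules
```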

The full list of pre-built local::libs and the libraries in them can be seen in the local-libs.txt file.

Distribution Building - build-dist

There are a variety of tools used for distribution building. Manually writing a Makefile.PL is one option, but others include Module::Build, Module::Install, and Dist::Zilla. While tests can often be run directly against the files in the repository without building, this skips any extra checks done by, or files generated by, the dist building tool. It can also complicate the process of finding prerequisites.

The approach the helpers recommend is to first generate a full dist, as would be uploaded to CPAN, then test against that. Because the distribution building tool often won't work on all of the perl versions you wish to test against, it's helpful to do the build with a different (newer) version of perl than the tests are run with.

This is what the build-dist helper does. It uses the latest pre-built version of perl to generate a distribution directory, automatically installing any modules needed. It then sets the BUILD_DIR environment variable to the location of the built distribution.

Prerequisite Installation - cpan-install

For most cases, prerequisite installation could be handled by cpanm, but the cpan-install helper provides a few niceties. It provides more helpful output than cpanm in the event of a failure, but is still concise in the common case. It also tweaks the set of modules to be installed. The developer prerequisites and recommended modules of the distribution being tested will be installed, but not those of its prerequisites.

It also includes better compatibility with ancient versions of perl.

Coverage Reporting

Setting up coverage reporting in Travis is relatively simple. You just need to install the Devel::Cover module and run the cover command appropriately. But coverage reporting slows down testing substantially and can also prevent some tests from running (such as those using threads). So it's useful to limit coverage testing to only some of the perls you are testing with. With that in mind, the helper scripts include several coverage related commands that are no-ops unless the COVERAGE environment variable is set.
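One way to limit coverage to a single perl in the matrix is an explicit include entry. This is a sketch using standard Travis matrix configuration; COVERAGE is the variable the helpers check:

```yaml
matrix:
  include:
    - perl: "5.20"
      env: COVERAGE=1   # coverage-setup and coverage-report only act when this is set
```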

Running the Tests

For running the actual tests, the helpers do very little. It's recommended to use the standard prove command, with whatever options are wanted.

There are a few helpers that can be used with prove though. If you want to run tests in parallel, the test-jobs command returns a recommended number of processes to use: one more than the number of CPUs available. It will always return 1 if COVERAGE is enabled, since Devel::Cover is currently buggy when used with parallel testing.

The test-files command returns all of the test scripts to run. This list is generated by searching recursively for .t files in the t and xt directories. However, if the AUTHOR_TESTING environment variable is set to 0, it will only return files in t. It can also help with very slow test runs: if the TEST_PARTITION and TEST_PARTITIONS environment variables are set, it will return only a subset of the tests. This allows you to split the tests across multiple Travis builds in parallel, making the full test run take less time.
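Partitioned test runs could then be configured like this (a sketch; each env entry becomes a separate Travis build, and test-files selects the matching subset):

```yaml
env:
  - TEST_PARTITION=1 TEST_PARTITIONS=2
  - TEST_PARTITION=2 TEST_PARTITIONS=2
```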

Bits and Pieces

An important feature of the helpers is that they can all be used independently of each other. So if perl building is the only feature needed, the rest of the helpers can be ignored.

Overall, having these helpers has made it easier for me to set up testing for a variety of different projects, and allowed me to expand the range of perl versions tested. They have been used to add perl 5.8 and blead testing to Moose, and perl 5.6 testing to Moo and ExtUtils::MakeMaker.

Devel::Confess - Include stack traces with all errors/warnings

Edit: Since writing this, I've decided on a proper name. Devel::Confess is the name for this module going forward. Carp::Always::EvenObjects exists now only as a wrapper around Devel::Confess.

Carp::Always is a very useful module that will force all errors and warnings to include a full stack trace to help with debugging. However, it has some limitations. If an exception object is thrown rather than a string, the stack trace can't simply be appended to it. die, Carp, and Carp::Always just pass the object through unmodified. Some exception systems include stack traces in their objects, but for those that don't, this hurts the ability to debug. As more libraries use exception objects, this becomes more problematic (e.g. autodie).

With that in mind, I've written Carp::Always::EvenObjects. It works similarly to Carp::Always, but will also attach stack traces to objects. This is done by re-blessing the object into a subclass that knows how to include the stack trace when the object is stringified. Most exception systems treat the re-blessed objects the same as the originals.

It will also attach stack traces to plain non-object refs, although their use as exceptions is rather rare.

Normally you would use the module on the command line, as:

perl -MCarp::Always::EvenObjects script.pl

As a bonus, since the name is rather long, the dist includes the module Devel::Confess as an alias, allowing you to use the shorter:

perl -MDevel::Confess script.pl

or even

perl -d:Confess script.pl

Using system or exec safely on Windows

Passing a list of arguments to another program on Windows in perl is much more complicated than it should be. Several different issues combine to cause this.

(mostly copied from a post I made on PerlMonks)

First is that argument lists are always passed as a single string in Windows, as opposed to arrays like on other systems. This is less of a problem than it appears, because 95% of programs use the same rules for parsing that string into an array. Roughly speaking, the rules are that arguments can be quoted with double quotes, and backslashes can escape any character.

The second issue is that cmd.exe uses different quoting rules than the normal parsing routine. It uses a caret as the escape character instead of backslash.

The result of this is that you can't create a string that will be treated the same in both of these cases. This becomes a larger problem because perl switches between using cmd.exe and calling the program directly based on whether the command string contains shell meta-characters. And that involves a third, different set of quoting rules. There isn't any good way to check which way perl will treat a command without reimplementing the detection code that exists inside perl. So here is a routine that will quote arguments correctly for use with system on Windows:

sub quote_list {
    my (@args) = @_;

    my $args = join ' ', map { quote_literal($_) } @args;

    if (_has_shell_metachars($args)) {
        # cmd.exe treats quotes differently from standard
        # argument parsing. just escape everything using ^.
        $args =~ s/([()%!^"<>&|])/^$1/g;
    }
    return $args;
}

sub quote_literal {
    my ($text) = @_;

    # basic argument quoting.  uses backslashes and quotes to escape
    # everything.
    if ($text ne '' && $text !~ /[ \t\n\v"]/) {
        # no quoting needed
    }
    else {
        my @text = split '', $text;
        $text = q{"};
        for (my $i = 0; ; $i++) {
            my $bs_count = 0;
            while ( $i < @text && $text[$i] eq "\\" ) {
                $i++;
                $bs_count++;
            }
            if ($i > $#text) {
                $text .= "\\" x ($bs_count * 2);
                last;
            }
            elsif ($text[$i] eq q{"}) {
                $text .= "\\" x ($bs_count * 2 + 1);
            }
            else {
                $text .= "\\" x $bs_count;
            }
            $text .= $text[$i];
        }
        $text .= q{"};
    }

    return $text;
}

# direct port of code from win32.c
sub _has_shell_metachars {
    my $string = shift;
    my $inquote = 0;
    my $quote = '';

    my @string = split '', $string;
    for my $char (@string) {
        if ($char eq q{%}) {
            return 1;
        }
        elsif ($char eq q{'} || $char eq q{"}) {
            if ($inquote) {
                if ($char eq $quote) {
                    $inquote = 0;
                    $quote = '';
                }
            }
            else {
                $quote = $char;
                $inquote++;
            }
        }
        elsif ($char eq q{<} || $char eq q{>} || $char eq q{|}) {
            if ( ! $inquote) {
                return 1;
            }
        }
    }
    return;
}
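A quick sanity check of the quoting (this assumes the three subs above are in scope; the output shown is what those rules produce, though it only matters when actually running commands on Windows):

```perl
# the embedded quote forces quote_literal to quote and backslash-escape,
# and no cmd.exe meta-characters are present, so no caret escaping happens
my $cmd = quote_list('prog.exe', 'a b', 'c"d');
print "$cmd\n";    # prog.exe "a b" "c\"d"
```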

The information about the quoting rules on Windows is from the article Everyone quotes command line arguments the wrong way. I attempted to use this to improve ExtUtils::MakeMaker's quoting, but that also has to deal with Makefile quoting rules. Additionally, other parts of the code (or at least tests) assume that you can generate a string and have it work both when passed to system and when placed in a Makefile. I almost never use perl on Windows, so I eventually gave up on the effort.

Converting Complex SVN Repositories to Git - Part 4

Cleaning up and simplifying merges

After the previous steps, the git repository has an accurate history of what was done to the SVN repository. It is a direct translation though, and shows more the process and tools that were used, rather than developer intent. I proceeded to simplify how the merges were recorded to eliminate the convoluted mess that existed and make the history usable.

Two main classes of these problems existed. There were branches that were merged one commit at a time, as that was one way of preserving the history in SVN. The other case was trunk being merged into a branch, then immediately merging that back into trunk. Some other issues match up with those two merge styles, and the same cleanup applies to them.

Here is a section of the history of the 'DBIx-Class-resultset' branch being merged, one commit at a time. Obviously not ideal, but you can mostly tell what is happening.

resultset-ugly.png

The merge of the 'DBIx-Class-current' branch was somewhat less straightforward.

current-ugly-end.png

...

current-ugly-middle.png

...

current-ugly-start.png

This smaller example of the 'resultset_cleanup' branch helps show how these can be dealt with.

resultset_cleanup-before.png

If we search for merges, starting from the earliest point in the repository history, we will find the commit noted as 4. We don't want to remove the record of this branch being merged, so initially we will leave it alone. The next merge we find however, 1, makes the first redundant. There is no need to maintain the first merge now that we know that this one exists. This process continues forward, eventually resulting in a single merge commit for the branch.

The code for this is in 43.graft-merges-simplified.

# get a list of all of the merge commits and their parent commits, space separated
my @merges = `git log --all --merges --pretty=format:'%H %P'`;
# to record all of the commits we intend to alter
my %altered;
# to record all of the merges we've seen so far
my %merges;
# start at the earliest point
for my $merge ( reverse @merges ) {
    chomp $merge;
    my ($commit, @parents) = split / /, $merge;
    $merges{$commit} = \@parents;
    # checking our merge [1]
    # this repo only contains merges with two parents
    my ( $left_parent, $right_parent ) = @parents;
    # check if our first parent [3] is a merge
    if ( my $left_grandparents = $merges{ $left_parent } ) {
        # find the grandparent [4] on the opposite side of the merge [2]
        my $right_grandparent
            = `git show -s --pretty='format:%P' $right_parent | cut -d' ' -f1`;
        chomp $right_grandparent;
        # if it is the same as the grandparent ([4] again) on the left side
        if ($right_grandparent eq $left_grandparents->[1]) {
            # we know we want to simplify this merge
            $altered{$commit}++;
            # switch the left parent (was [2]) to the left grandparent [5]
            $parents[0] = $left_grandparents->[0];
            # our left parent shouldn't be part of the history anymore,
            #   so we don't want to match it
            delete $merges{ $left_parent };
            # nor do we need to change it
            delete $altered{ $left_parent };
        }
    }
}

# many of these merges exist only because they were calculated in previous steps
# we don't want duplicate grafts, so we simply comment out the old ones.
my $regex = '(?:' . (join '|', keys %altered) . ')';
system "perl -i -pe's/^($regex )/# \$1/' $GIT_DIR/info/grafts";

# record the grafts
open my $fh, '>>', "$GIT_DIR/info/grafts";
print { $fh } "# Simplified merges\n";
for my $commit ( keys %altered ) {
    print { $fh } join(q{ }, $commit, @{ $merges{$commit} }) . "\n";
}
close $fh;

# we're modifying these merge commits.  whatever their commit
# messages were initially won't be accurate anymore.
# later, when we rewrite the commit messages, we want to just
# record these as branch merges.
# this just keeps track of which commits we want to simplify the
# commit messages in this manner.

use Data::Dumper;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Sortkeys = 1;

@altered{ keys %$simplified_merges } = values %$simplified_merges;
open $fh, '>', "$BASE_DIR/cache/simplified-merges.pl";
print { $fh } Dumper(\%altered);
close $fh;

The end result is obviously much nicer.

resultset_cleanup-after.png

It turned out that while these calculations caught the majority of the cases, a couple complex, ugly cases were missed. The 'DBIx-Class-current' case was one of these. Rather than spend the extra effort to find an additional strategy to automatically detect such cases (if it was even possible), I manually figured out the best way to record the merges and put them in the 42.graft-merges-simplified-manual file.

Here we see a merge into a branch, followed immediately by a merge into trunk.

rsrc_in_storage-before.png

Another case that makes the history harder to follow. And while this example is relatively straightforward, cleaning up this type of merge helps in much uglier cases as well. The process for simplifying these merges may eliminate the commits our branches are referring to, but we don't have any need to maintain the branches that have been merged, so we delete them here (46.delete-merged-branches, the same script as 60.delete-merged-branches).

The 47.graft-merges-redundant script simplifies these. It follows a similar structure to the previous simplification script.

my @merges = `git log --all --merges --pretty=format:'%H %P'`;
my %altered;
my %merges;
for my $merge ( reverse @merges ) {
    chomp $merge;
    my ($commit, @parents) = split / /, $merge;
    my $f;
    # for each merge [1]
    $merges{$commit} = \@parents;
    # check each parent [2] in turn ([3] will be checked first, but fail
    #   a later test)
    PARENT: for my $p ( 0 .. 1 ) {
        my $parent = $parents[ $p ];
        # check against the other parent [3]
        my $check_ancest = $parents[ 1 - $p ];
        # we only care if it is merge
        my $ancest = $merges{ $check_ancest } || next;

        ANCEST: for my $c ( 0 .. 1 ) {
            # if the first parent [3] is also a parent of the second parent [2]
            if ($parent eq $ancest->[ $c ]) {
                $altered{$commit}++;
                # we don't need the current second parent [2], so switch
                # it to that commit's other parent [4]
                $parents[1 - $p] = $ancest->[1 - $c];
                # don't match or change the commit we are clipping out
                delete $merges{ $check_ancest };
                delete $altered{ $check_ancest };
                # and skip to the next commit
                last PARENT;
            }
        }
    }
}

The redundant merge is now gone.

rsrc_in_storage-after.png

The history simplification is now basically complete. Instead of the convoluted mess that resulted from a direct translation of the SVN repository, it now has a mostly understandable history showing what the developers intended, rather than the exact method they used to do so. All that is left is to clean up the commit messages and attribution, fix the tags, and a few other minor cleanups.

Next: Commit message and other final cleanups, and baking in grafts

Converting Complex SVN Repositories to Git - Part 3

Resolving Branches and Calculating Merges

The most important part of the repository conversion I did was resolving all of the branches and calculating the merge points. The majority of the rest of the process is easily automated with other tools.

The main part of this section was determining what had happened to all of the branches. One of the important differences between Git and SVN is that if a branch is deleted in Git, any commits that only existed in that branch are permanently lost. With SVN, the deleted branches still exist in the repository history. git-svn can't delete branches when importing them, because that would be losing information. So all of the branches that existed throughout the history of the repository will exist in a git-svn import and must be dealt with.

There are four possibilities for what happens to branches. The simplest are the branches that currently exist. These we obviously want to maintain as branches. Some branches are merged then deleted. Once the merge is recorded, we can delete these branches in Git. Others don't have any real commits in them, consisting just of commits creating the branches and then being updated to the current trunk. These can just be deleted. The last are branches that existed and had real changes committed to them, but were then thrown away for various reasons. These can't be deleted without losing information, so I just filed them into a sub-directory 'trash'. Without knowing the full history of the project I couldn't know how valuable these branches were.

At the end of this process, the only branches left should be the currently existing branches and anything marked as trash. So I created the unresolved-branches script, listing all of the current branches in it. It simply reports the branches that I hadn't yet found a resolution for.

Next, I used another part of nothingmuch's git-svn-abandon to delete all branches that had been merged into others (60.delete-merged-branches):

# remove merged branches
git for-each-ref --format='%(refname)' refs/heads | while read branch; do
    git rev-parse --quiet --verify "$branch" > /dev/null || continue # make sure it still exists
    git symbolic-ref HEAD "$branch"
    git branch -d $( git branch --merged | grep -v '^\*' | grep -v 'master' )
done

git checkout master

This checks out each branch in turn, finds all of the branches that have been merged into it, and deletes them.

This will only be effective after all of the proper merges have been recorded though. git-svn will record some of the merges during the import process. It uses the SVN and SVK merge information to do this, but sometimes this information isn't recorded, so I had to find it myself. The first method I used was matching commit messages. The format of the SVK commit messages was specific enough that I was able to extract information from them and match it to other commits (40.graft-merges-rev-matching). An example commit message:

 r13301@evoc8 (orig r2696):  dyfrgi | 2006-08-21 10:33:04 -0500
 Change _cond_for_update_delete to handle more complicated queries through recursing on internal hashes.
 Add a test which should succeed and fails without this change.
 r13302@evoc8 (orig r2697):  blblack | 2006-08-21 12:33:02 -0500
 bugfix to Oracle columns_info_for
 r13321@evoc8 (orig r2716):  dwc | 2006-08-22 00:05:58 -0500
 use ref instead of eval to check limit syntax (to avoid issues with Devel::StackTrace)

This was basically merging three commits from one branch into another. The piece of information I needed from a message like this was the latest revision number that had been merged in, in this case 2716. There were also cases where commit messages like this were copied into other SVK commit messages, so the relevant information would be indented. That could only be done if there weren't any unindented 'orig' notations. That resulted in the first section of the script:

my @merges = `git log --all --no-merges --format='%H' --grep='(orig r'`;
chomp @merges;

open my $fh, '>>', "$GIT_DIR/info/grafts";
print { $fh } "# Revision matching\n";
for my $commit (@merges) {
    my $commit_data = `git cat-file commit $commit`;
    my @matched = $commit_data =~ /^[ ]r\d+\@[^\n]+\(orig[ ]r(\d+)\)/msxg;
    my ($parent_rev) = sort { $b <=> $a } @matched;
    unless ($parent_rev) {
        @matched = $commit_data =~ /^[ ][ ]r\d+\@[^\n]+\(orig[ ]r(\d+)\)/msxg;
        ($parent_rev) = sort { $b <=> $a } @matched;
        unless ($parent_rev) {
            @matched = $commit_data =~ /^[ ][ ][ ]r\d+\@[^\n]+\(orig[ ]r(\d+)\)/msxg;
            ($parent_rev) = sort { $b <=> $a } @matched;
            unless ($parent_rev) {
                warn "odd commit $commit.  merge but wrong format\n";
                next;
            }
        }
    }

Obviously ugly copy-and-paste code, but no real work has been put into generalizing it. Once that revision was found, I needed to find the commit that corresponded to it. In the simple case, this is a single statement:

my $parent_commit = `git log --all --format='%H' -E --grep='git-svn-id: .*\@$parent_rev '`;
chomp $parent_commit;

I also had code to attempt to resolve this case for if the parent revision touched multiple branches, but this wasn't needed in the end. It only had an impact when my initial import was incomplete.

With the parent commit found, the merge commit could be added to the grafts file, recording both its current parent and adding the new one.

This left a number of branches to be figured out manually. The first valuable piece of information was how each branch was deleted from SVN. That information wasn't actually maintained by the import, so I wrote a script (find-branch-deletion) to find the revision each branch was deleted in, by doing a binary search between the last revision in the branch and the latest revision.
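The find-branch-deletion script itself isn't shown here, but the binary search it performs can be sketched roughly like this. This is a sketch under assumptions: the existence check is passed in as a code ref, which in practice might shell out to something like `svn ls -r $rev $branch_url`:

```perl
# hypothetical sketch of the search in find-branch-deletion;
# $exists_at->($rev) should return true if the branch path
# still exists at that SVN revision
sub find_deletion_rev {
    my ($exists_at, $last_rev, $head_rev) = @_;
    my ($lo, $hi) = ($last_rev, $head_rev);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($exists_at->($mid)) {
            $lo = $mid + 1;   # branch still exists; deletion is later
        }
        else {
            $hi = $mid;       # already gone; deletion is at or before $mid
        }
    }
    return $lo;   # first revision where the branch no longer exists
}
```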

For branches that I found had no valuable information, I simply deleted them (50.delete-empty-branches). For branches that weren't merged but were deleted, I renamed them with a 'trash/' prefix (55.archive-deleted-branches). For branches that were merged, I needed to find the merge point. This usually consisted of finding some changes unique to the branch, then using Git's pickaxe search to find where else they existed. Once I figured out how a branch had been merged, I recorded this in the 41.graft-merges-manual file. Since the git commit hashes could easily change depending on the import process, I couldn't use them directly, so I instead used various pieces of the commit messages that I knew were unique. For example:

git --no-pager log --format="%H %P $(git rev-parse doc_mods)" --grep='DBIx-Class/0.08/trunk@5014'

This records the commit hash and parent commit hashes corresponding to revision 5014 in trunk, adding the commit hash for the doc_mods branch as a second parent.
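The pickaxe search mentioned above, used to find where a branch's unique changes ended up, looks something like this (an illustrative command; the search string would be whatever change was unique to the branch):

```shell
# list every commit whose diff adds or removes the given string
git log --all --oneline -S'_cond_for_update_delete'
```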

With this work done, the resolution of every branch had been determined and all of the merges were recorded. But many of the merges carried extraneous commit information that made the history hard to work with, so I went about cleaning them up, giving a better representation of the intent of the merges instead of the particulars of the tools used.

Next: Cleaning up the merges