Refactoring When Tests Are Failing
Today I'm working on some code. That's not surprising. It's also not surprising that the person I'm working on the code with is on holiday until the new year starts. Before he left, he made some significant changes to optimize our code, but for one subsystem he didn't have the time to apply his changes. This means that we have 7 tests failing out of 397. In a quick check we found there were plenty of conflicts between his code and mine and my work was generating even more conflicts. This meant we had two choices for how to deal with this while he was gone:
- I could continue my work with passing tests and deal with even more conflicts when he returns
- I could merge his code, deal with some failing tests, get rid of the conflicts, and have a much faster code base
We opted for the latter, but when running the tests, how do I know if I broke something? This is one reason we argue that all tests in a suite must pass: do you scan through all of the broken tests to see if they've somehow broken differently? If you do, you'll start to ignore failures because failing tests have become normal. Eventually you'll get serious failures that showed up in testing but that you didn't notice.
So what follows isn't perfect, but if you have to temporarily deal with failing tests, it's far better than just trying to remember which tests are "allowed" to fail.
There's a feature in Test::Harness that many people don't know about: --state. With this, you can do the following:
prove -lr --state=save
And that will save the state of this test run in a file named .prove. In that, you'll see YAML output which has individual test file results like this:
"./t/Advertiser/events.t":
elapsed: 1.73660922050476
gen: 3
last_fail_time: 1356516251.85145
last_result: 2
last_run_time: 1356516251.85145
last_todo: 0
seq: 38
total_failures: 1
In this case, we see from last_result that we had two test failures in this test file. In fact, we have this number for all test files. I used this knowledge to whip up a quick provediff program:
#!/usr/bin/env perl

use Modern::Perl;
use App::Prove::State;

my ( $prove1, $prove2 ) = @ARGV;
unless ( -f $prove1 and -f $prove2 ) {
    die "Pass .prove files, please";
}

# XXX Must fix this in App::Prove. We want to be able to read and parse these
# .prove files even if we're in a different directory.
no warnings 'redefine';
local *App::Prove::State::_prune_and_stamp = sub { };

my $first       = App::Prove::State->new( { store => $prove1 } )->results;
my %first_tests = map { $_ => $first->test($_) } $first->test_names;

my $second       = App::Prove::State->new( { store => $prove2 } )->results;
my %second_tests = map { $_ => $second->test($_) } $second->test_names;

foreach my $test ( sort keys %first_tests ) {
    unless ( exists $second_tests{$test} ) {
        header( $prove1, $prove2 );
        say "Test '$test' found in '$prove1' but not '$prove2'";
    }
}

foreach my $test ( sort keys %second_tests ) {
    unless ( exists $first_tests{$test} ) {
        header( $prove1, $prove2 );
        say "Test '$test' found in '$prove2' but not '$prove1'";
        next;
    }
    my $first_result  = $first_tests{$test}->result;
    my $second_result = $second_tests{$test}->result;
    if ( $first_result != $second_result ) {
        header( $prove1, $prove2 );
        say "Test '$test' results are different: $first_result failure(s) versus $second_result failure(s)";
    }
}

sub header {
    my ( $prove1, $prove2 ) = @_;
    state $header_printed = 0;
    return if $header_printed;
    say "provediff results for '$prove1' versus '$prove2'";
    $header_printed = 1;
}
It's not particularly brilliant, but first I run prove -lr --state=save and then cp .prove prove_orig. Then subsequent test runs while refactoring have me running tests like this:
prove -lr --state=save; provediff prove_orig .prove
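If you get tired of typing that pair of commands, they're easy to wrap in a tiny driver script. Here's a minimal sketch; it assumes provediff is somewhere on your PATH and that the baseline was saved as prove_orig, as above:

#!/usr/bin/env perl

# A rough wrapper for the two commands above; nothing clever.
use Modern::Perl;

# prove exits non-zero because we expect some known failures,
# so its exit status is deliberately ignored here.
system( 'prove', '-lr', '--state=save' );

# provediff prints nothing when the failures match the baseline,
# and a report like the one below when they differ.
system( 'provediff', 'prove_orig', '.prove' ) == 0
    or die "provediff could not be run: $?\n";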
If we don't see a different number of failures, provediff gives no output. If we do see different failures, we get results like this:
provediff results for 'prove_orig' versus '.prove'
Test 't/Advertiser/conversions.t' results are different: 2 failure(s) versus 3 failure(s)
Test 't/Advertiser/conversions_functional.t' results are different: 4 failure(s) versus 3 failure(s)
This is a far cry from perfect, but it's also far better than visually inspecting the prove results every time and seeing if the failures are different. This is also only a stopgap measure until the other dev gets back and we can safely fix the last of the test failures.
Happy holidays! (I should write more about --state. It has a lot of neat features).
Another solution I commonly use is TODO blocks (see Test::More for more info about TODO blocks).
Before merging your own code, change the code in the common repository to add TODO blocks around the failing tests:
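Here's a minimal sketch of what such a TODO block looks like with Test::More; the failing assertion and the reason string are made up for illustration:

use strict;
use warnings;
use Test::More;

TODO: {
    # Mark the known failures as TODO until the optimization branch lands.
    # The reason string and the assertion below are purely illustrative.
    local $TODO = 'awaiting the subsystem optimization changes';

    is( 2 + 2, 5, 'known failure until the new code is merged' );
}

done_testing;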
After this change, check that the test suite passes. Check that almost no TODO tests pass unexpectedly (none is ideal). Commit all of those patches to the test suite in a single commit (that will make it easier to revert later).
Then merge your own code. The tests that fail are your own failures. Fix them.
Your colleague will have to fix the TODOs when (s)he comes back. You can even revert the TODO commit to push them to do their job.
Agreed that this is usually the best strategy. I avoided it here because I don't want those TODOs to be "forgotten" as they so often are, since those are the first things he's tackling when he gets back. Still, maybe having that as a simple commit and reverting might be a better solution if I could figure out a safe way of automating that.
Of course, TODO blocks issue an "unexpected pass" message when they are fixed, so you should be able to notice that the changes have been made and that the TODOs may be removed.
The drawback of the TODO strategy in comparison to the prove --state approach is that if any of the tests marked TODO should start failing differently than they failed when you marked them as TODO, you will not notice it.
"I should write more about --state. It has a lot of neat features"
I presented at a London Perl Mongers technical meeting about TAP::Harness and some of the cooler things you can do with prove late-ish last year. Slides are here: http://www.slideshare.net/kaokun/continuous-testing-and-deployment-in-perl-londonpm-technical-meeting
Would love to hear/read if I missed any good tricks!
Alex, it looks like you gave a great summary.
I find it interesting that you mention doing more unit tests instead of integration tests. Yes, that's probably faster, but I find the latter tends to expose more bugs for me.