Refactoring When Tests Are Failing
Today I'm working on some code. That's not surprising. It's also not surprising that the person I'm working on the code with is on holiday until the new year starts. Before he left, he made some significant changes to optimize our code, but for one subsystem he didn't have the time to apply his changes. This means that we have 7 tests failing out of 397. In a quick check we found there were plenty of conflicts between his code and mine and my work was generating even more conflicts. This meant we had two choices for how to deal with this while he was gone:
- I could continue my work with passing tests and deal with even more conflicts when he returns
- I could merge his code, deal with some failing tests, get rid of the conflicts, and have a much faster code base
We opted for the latter, but when running the tests, how do I know if I broke something? This is one reason we argue that all tests in a suite must pass: do you scan through all of the broken tests to see if they've somehow broken differently? If you do, you'll start to ignore failures because failing tests have become normal. Eventually you'll get serious failures that showed up in testing but that you didn't notice.
So what follows isn't perfect, but if you have to temporarily deal with failing tests, it's far better than just trying to remember which tests are "allowed" to fail.
There's a feature in Test::Harness that many people don't know about: --state. With this, you can do the following:
prove -lr --state=save
And that will save the state of this test run in a file named .prove. In that, you'll see YAML output which has individual test file results like this:
"./t/Advertiser/events.t":
elapsed: 1.73660922050476
gen: 3
last_fail_time: 1356516251.85145
last_result: 2
last_run_time: 1356516251.85145
last_todo: 0
seq: 38
total_failures: 1
In this case, we see from last_result that we had two test failures in this test file. In fact, we have this number for all test files. I used this knowledge to whip up a quick provediff program:
#!/usr/bin/env perl

use Modern::Perl;
use App::Prove::State;

my ( $prove1, $prove2 ) = @ARGV;
unless ( -f $prove1 and -f $prove2 ) {
    die "Pass .prove files, please";
}

# XXX Must fix this in App::Prove. We want to be able to read and parse these
# .prove files even if we're in a different directory.
no warnings 'redefine';
local *App::Prove::State::_prune_and_stamp = sub { };

my $first       = App::Prove::State->new( { store => $prove1 } )->results;
my %first_tests = map { $_ => $first->test($_) } $first->test_names;

my $second       = App::Prove::State->new( { store => $prove2 } )->results;
my %second_tests = map { $_ => $second->test($_) } $second->test_names;

foreach my $test ( sort keys %first_tests ) {
    unless ( exists $second_tests{$test} ) {
        header( $prove1, $prove2 );
        say "Test '$test' found in '$prove1' but not '$prove2'";
    }
}

foreach my $test ( sort keys %second_tests ) {
    unless ( exists $first_tests{$test} ) {
        header( $prove1, $prove2 );
        say "Test '$test' found in '$prove2' but not '$prove1'";
        next;
    }
    my $first_result  = $first_tests{$test}->result;
    my $second_result = $second_tests{$test}->result;
    if ( $first_result != $second_result ) {
        header( $prove1, $prove2 );
        say "Test '$test' results are different: $first_result failure(s) versus $second_result failure(s)";
    }
}

sub header {
    my ( $prove1, $prove2 ) = @_;
    state $header_printed = 0;
    return if $header_printed;
    say "provediff results for '$prove1' versus '$prove2'";
    $header_printed = 1;
}
It's not particularly brilliant, but first I run prove -lr --state=save and then cp .prove prove_orig. Then subsequent test runs while refactoring have me running tests like this:
prove -lr --state=save; provediff prove_orig .prove
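If you get tired of typing that pair of commands, they're easy to wrap in a tiny driver script. Here's a minimal sketch; it assumes provediff is somewhere on your PATH and that the baseline was saved as prove_orig, as above:

#!/usr/bin/env perl

# A rough wrapper for the two commands above; nothing clever.
use Modern::Perl;

# prove exits non-zero because we expect some known failures,
# so its exit status is deliberately ignored here.
system( 'prove', '-lr', '--state=save' );

# provediff prints nothing when the failures match the baseline,
# and a report like the one below when they differ.
system( 'provediff', 'prove_orig', '.prove' ) == 0
    or die "provediff could not be run: $?\n";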
If we don't see a different number of failures, provediff gives no output. If we do see different failures, we get results like this:
provediff results for 'prove_orig' versus '.prove'
Test 't/Advertiser/conversions.t' results are different: 2 failure(s) versus 3 failure(s)
Test 't/Advertiser/conversions_functional.t' results are different: 4 failure(s) versus 3 failure(s)
This is a far cry from perfect, but it's also far better than visually inspecting the prove results every time and seeing if the failures are different. This is also only a stopgap measure until the other dev gets back and we can safely fix the last of the test failures.
Happy holidays! (I should write more about --state. It has a lot of neat features).
Another solution I commonly use is TODO blocks (see Test::More for more info about TODO blocks).
Before merging your own code, change the code in the common repository to add TODO blocks around the failing tests:
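Here's a minimal sketch of what such a TODO block looks like with Test::More; the failing assertion and the reason string are made up for illustration:

use strict;
use warnings;
use Test::More;

TODO: {
    # Mark the known failures as TODO until the optimization branch lands.
    # The reason string and the assertion below are purely illustrative.
    local $TODO = 'awaiting the subsystem optimization changes';

    is( 2 + 2, 5, 'known failure until the new code is merged' );
}

done_testing;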
After this change, check that the test suite passes. Check that almost no TODO tests pass unexpectedly (none is ideal). Commit all of those patches to the test suite in a single commit (that will make it easier to revert later).
Then merge your own code. The tests that fail are your own failures. Fix them.
Your colleague will have to fix the TODOs when (s)he comes back. You can even revert the TODO commit to push them to do their job.
Agreed that this is usually the best strategy. I avoided it here because I don't want those TODOs to be "forgotten" as they so often are, since those are the first things he's tackling when he gets back. Still, maybe having that as a simple commit and reverting might be a better solution if I could figure out a safe way of automating that.
Of course, TODO blocks issue an "unexpected pass" message when they are fixed, so you should be able to notice that the changes have been made and that the TODOs may be removed.
The drawback of the TODO strategy in comparison to the prove --state approach is that if any of the tests marked TODO should start failing differently than they failed when you marked them as TODO, you will not notice it.
"I should write more about --state. It has a lot of neat features"
I presented at a London Perl Mongers technical meeting about TAP::Harness and some of the cooler things you can do with prove late-ish last year. Slides are here: http://www.slideshare.net/kaokun/continuous-testing-and-deployment-in-perl-londonpm-technical-meeting
Would love to hear/read if I missed any good tricks!
Alex, it looks like you gave a great summary.
I find it interesting that you mention doing more unit tests instead of integration tests. Yes, that's probably faster, but I find the latter tends to expose more bugs for me.