I want Perl Testing Best Practices
[I actually wrote this a long time ago and it's been stuck in the draft status. I don't have answers for these yet.]
I've been swamped with work lately, and despite perl5-porters giving me and everyone else plenty of time to update all of our modules for the next major release, I basically ignored Perl 5.11. Life sucks sometimes, and then they release anyway. This isn't really a big deal for me because all the CPAN Testers FAILs go to a folder that I look at all at once. It is a big deal for other people when they try to install a broken dependency and cpan(1) blows up.
However, my negligence in updating my CPAN modules reminded me of a possible best practice that has been on my mind for a long time, and which I've casually brought up at a couple of Perl QA workshops, since I've written several Test modules myself. Don't rush to say Test::Class just yet.
In a nutshell, Perl's testing framework is a stew of real tests and implied tests, and we can't tell the difference. Some of those tests use Test::Builder-based functions that generate test output:
ok( $some_value, 'Some value is true' );
like( $var, $regex, 'Hey, it matches!' );
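Those two calls boil down to plain TAP lines on standard output, something like:
ok 1 - Some value is true
ok 2 - Hey, it matches!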
Some things that we don't normally think of as tests actually are:
use Test::File;
By using my Test::File module, you are asserting that it passes all of its tests too. If you don't have it installed, cpan(1) will, by default, try to fetch that distribution and run its tests (cpanminus decidedly won't).
The problem with a Test module is that its tests passing or failing has nothing to do with the code that you are trying to test; what you actually see is a failure to load my module, which is probably completely my fault. But in the mess of output from a test failure, I usually don't get the blame.
So far, we don't have a strong practice for capturing problems in tests. In fact, despite the Perl community's otherwise good practices and coding standards, we don't pay attention to test script quality. In particular, we let our test scripts fail for all sorts of reasons that have nothing to do with the target code. Maybe I need to do something like this instead:
eval "use Test::File; 1"
    or plan skip_all => 'Test::File is not installed';
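For context, that guard would sit near the top of the .t file, roughly like this; the file_exists_ok call is just a stand-in for whatever the test really checks:
use strict;
use warnings;
use Test::More;
eval "use Test::File; 1"
    or plan skip_all => 'Test::File is not installed';
# only reached when Test::File loaded cleanly
file_exists_ok( 'MANIFEST' );
done_testing();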
Okay, that's one thing that bothers me about my tests. I also frequently micro-manage tests. Let's say that I want to test a method that needs to open a file. I'll have several checks in the setup because I want to ensure that I'm doing the right thing before I get to testing my method. That is, I want to test my test setup:
ok( -e $filename, "$filename is there" );
is( md5( $filename ), $expected_md5, "$filename looks like it has the right stuff" );
is( -s $filename, $expected_size, "$filename has the right size" );
Test::Class (and maybe some other frameworks) has setup and teardown methods that mitigate this, but that's not really my problem. If one of these setup methods fails, it's a failure of my test suite but not necessarily my module. I'd like to report that differently.
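For reference, a Test::Class setup method looks roughly like this; the package name and fixture file here are made up:
package TestsFor::My::Module;
use parent 'Test::Class';
use Test::More;
# runs before every test method; a natural home for the fixture checks
sub setup : Test(setup) {
    my( $self ) = @_;
    $self->{filename} = 't/data/sample.txt';   # hypothetical fixture
    die "fixture is missing" unless -e $self->{filename};
}
sub file_has_content : Test(1) {
    my( $self ) = @_;
    ok( -s $self->{filename}, 'fixture has content' );
}
__PACKAGE__->runtests;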
I've thought that TAP's binary ok / not ok was a bit limited. I'd actually like to have ok / not ok / unable to test / unknown. "Unable to test" is different from "skip". Consider architecture-dependent tests that won't run; those are skip tests. If Test::Pod is not installed, however, I'd like to see a report that explicitly says "unable to test". It's the undef of the testing world. I'm not actually proposing a change to TAP; maybe there's some other practice that can do the same thing. TAP isn't really the point, I don't think. As a community, we just haven't paid that much attention to what each call to a Test::Builder-y function is really testing and in which column we should put the result. I've been thinking that maybe I should only call a Test::Builder-y thing when I want to report a result that directly relates to the code that I am testing.
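For comparison, everything TAP can say today fits into a pass/fail bit plus a SKIP or TODO directive, roughly like this (the descriptions are made up):
ok 1 - loads the module
not ok 2 - handles the edge case
ok 3 # SKIP not relevant on this architecture
not ok 4 # TODO not implemented yet
There's no slot in there for "I couldn't even try".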
Finally, I don't have a habit of documenting my tests. Sure, I put in code comments and the like, but I'm talking about full-on embedded pod that ties together some notional spec with what the particular test file is going to do with it. I feel guilty for about five seconds before I move on to something else.
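Even something this small at the top of the .t file would count; the spec section named here is invented:

=head1 PURPOSE

Exercises the configuration-loading behavior described in the "Config
files" section of the design notes: missing files, empty files, and
unreadable files.

=cut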
A lot of people think about the underpinnings of our test system, but we've spent very little time at the Perl QA workshops thinking about what a programmer should type out as they write a test file. To solve this, I think the first step is probably just to collect a bunch of stories from people about the practices they use and what nags at them at this level.
The problem with distinguishing between bad module code and bad test code is that you can't REALLY do that reliably much of the time.
If the test setup fails, even though it's usually just doing typical stuff in the process, something fairly serious or whacky is probably going on with that environment, and the module is going to be just as untrustworthy on that platform as if the tests themselves had failed.
Or perhaps not, but it's very hard to distinguish which is which cleanly.
I'm not really trying to distinguish between bad test and bad module code. I do a lot of testing to minutely check the environment before I test the target code. As you say, there are challenges there, but an 80% solution is better than a 0% solution.
If the setup fails, I want a better way to say that the setup failed. A lot of the problems I have with people installing my modules come down to a prereq that didn't install correctly. That's why cpanm is shiny: it doesn't run tests, so it doesn't have to deal with all of those false failures.
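The closest I can get today is probably Test::More's diag and BAIL_OUT, which at least point the finger at the environment, even though bailing out stops the whole run. A rough sketch, with a made-up setup check:
my $config_file = 't/test-config.txt';   # hypothetical path
# setup check that runs before any real tests
unless( -e $config_file and -r _ ) {
    diag( "setup problem: can't read $config_file" );
    BAIL_OUT( 'test environment is unusable' );
}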
Great post; this is something I've run into myself. Too many of my distribution releases have gone toward resolving failures that weren't in the module itself but were caused by unanticipated environmental factors that made the test setup fail.
On the one hand it's a failure of the testing and shouldn't just be reported as a SKIP, which would imply nothing was wrong; on the other hand, it's misleading to state that the module itself failed.
As you say, it'd be very handy to have a third option, and functions that let you say "this assertion is checking that the test environment is configured properly, a failure here is a failure of the test-suite or the environment" as distinct from "this is testing the module functionality, a failure here really is a failure of the module".
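Nothing like that exists yet as far as I know, but a hypothetical helper along those lines might look something like this; env_ok and everything about it is made up:
use Test::More;
# hypothetical: report environment checks separately from module checks
sub env_ok {
    my( $condition, $description ) = @_;
    return pass( "env: $description" ) if $condition;
    BAIL_OUT( "environment problem: $description" );
}
env_ok( -w 't', 'the t/ directory is writable' );
Bailing out is still a blunt instrument, but at least the report says "environment" rather than blaming the module.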