The Problem With Perl Testing

If there is one thing about testing in Perl which bugs me, it's that most testing in Perl is to modern testing what cgi-lib.pl is to Plack. The following is mostly a rant, and I'm also guilty of many of these sins.

Yes, we have a beautiful testing infrastructure around our tests. For example, if I want to know which operating systems and versions of Perl my Code::CutNPaste module fails on, I just go take a look. Or I can quickly see almost 3,000 passing test reports for aliased (woo hoo!). The Perl community is proud of tools like this and we should be. Just reading about Perl's test-infected culture is fantastic, and other languages which claim to be test-infected often come in a poor second.

Er, sort of.

Where are the tests?

So, open up any random test suite on the CPAN for a suitably large distribution, find some code and then play the "find the tests for this code" game. Go ahead. I'll wait.

In fact, when working for many large companies, one of the biggest obstacles I find to integrating a new developer is teaching them where to find the damned tests! And no, I don't mean "they're in the t/ directory".

When I'm working on some code, I can usually reason about where the related code lives. I can't do that with tests. Part of the problem is that we often group tests conceptually, and I'm as guilty of this as the next person, so I'll cry mea culpa first. For example, I wrote Test::Harness and in that distribution you will find:

$ find lib -type f|wc -l
      49

All things considered, that's not really that big of a codebase, but it's gotten to the point where not everyone can keep the entire codebase in their head. I certainly can't and I can't make a commit without some VMS user pointing out that I broke VMS again. So I should have better organization of that test suite, but I don't. Case in point, I have a t/errors.t test file. What the hell is that? Are you honestly going to tell me that out of ~50 packages, I shove every possible error into one test file, regardless of the source of the error?

If, on the other hand, you can make your tests more or less resemble a one-to-one mapping with the classes/packages they're testing, you can reason about where your tests should go. Or you can use a stopgap measure such as Devel::CoverX::Covered to help you learn your test suite, but that assumes you can run code coverage on your test suite in the first place (those of you who have hour-long test suites know what I'm talking about).
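
For what it's worth, "reason about where your tests should go" can be as mechanical as you like. Here's a small sketch (the lowercased t/ layout is simply a convention I'm assuming; no tool enforces it):

use strict;
use warnings;

# Derive the expected test file from a module path, assuming a strict
# one-to-one lib/ -> t/ layout.
sub test_file_for {
    my $module = shift;                        # e.g. 'lib/Code/CutNPaste.pm'
    ( my $test = lc $module ) =~ s{^lib/}{t/};
    $test =~ s{\.pm\z}{.t};
    return $test;                              # 't/code/cutnpaste.t'
}

print test_file_for('lib/Code/CutNPaste.pm'), "\n";

The helper itself is beside the point; the point is that anyone on the team can predict where the tests live without grepping.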

Woo hoo! My dead code is 100% covered!

Yes, you generally want strong code coverage because you want to know that you can change your code with impunity, right? More than once I've found that a test suite's code coverage report is covering dead code. Why on earth would you do that?

Because we tend to not make a distinction between unit and integration tests in the Perl community and Devel::Cover doesn't know or care which is which. If you can make a strong distinction between unit and integration tests, you can have your code coverage run over integration tests (or skip unit tests). Not only will your code coverage be faster, it will be a more accurate representation of what your tests actually cover.
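
Assuming your integration tests live in their own directory (t/integration below is my invented layout, not a standard), the coverage run over just those tests is as simple as:

$ cover -delete
$ HARNESS_PERL_SWITCHES=-MDevel::Cover prove -lr t/integration
$ cover -report html

Point prove at the integration directory only and Devel::Cover never sees the unit tests, so the report reflects what your real entry points actually exercise.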

In fact, try to target your code coverage at end points in your code: that is to say, code run from cron, or from the command line, or a browser, or a GUI, and so on. If you find that you can't reach some code from any of its consumers, that's a strong candidate for deletion.

Yeah, whatever. I still have 100% code coverage

So what? I've written about this before. It's trivial to get 100% code coverage and still have buggy code. It's a given that you can't avoid this entirely, but that doesn't mean we can't improve on the situation. Basis path testing would help with that, but I know precious few Perl programmers who have even heard of that technique, much less use it. It would probably be more common if Perl were easier to parse (tools could then work out the paths for us), but here's an example borrowed from Java:

sub return_input {
    my ( $x, $condition1, $condition2, $condition3 ) = @_;

    if ( $condition1 ) {
        $x++;
    }
    if ( $condition2 ) {
        $x--;
    }
    if ( $condition3 ) {
        $x = $x;    # a no-op, but still a distinct branch to cover
    }
    return $x;
}

For basis path testing, you basically (er, not really) take the number of conditions in the code and add 1, and that tells you how many paths you need to test. Then you pick an arbitrary path and walk through the code once for each path needed, altering one of the conditions each time. For the above, you get (T = true, F = false):

  • TTT
  • FTT
  • TFT
  • TTF

Then you write test cases assuming the above.
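
Writing the test cases for those paths is then mechanical. A minimal sketch with Test::More, assuming return_input() from above is in scope (pasted into the test file or imported from wherever it lives):

use strict;
use warnings;
use Test::More tests => 4;

# Each case starts from $x = 0; the expected values follow directly from
# which of the three branches fire.
is return_input( 0, 1, 1, 1 ), 0,  'TTT: increment and decrement cancel out';
is return_input( 0, 0, 1, 1 ), -1, 'FTT: only the decrement fires';
is return_input( 0, 1, 0, 1 ), 1,  'TFT: only the increment fires';
is return_input( 0, 1, 1, 0 ), 0,  'TTF: the no-op branch is skipped';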

While you still won't have 100% path coverage, you'll have covered enough of the possible paths through your code to make it much easier to discover little-known bugs. Sadly, Perl devs are still not pulling in useful techniques like this from testers in other language communities.

Duplicated Tests

You know that I've tried to go to war against duplicated code in tests. Test::Most, for example, means that instead of this in all of your test programs:

use strict;
use warnings;
use Test::Exception;
use Test::Differences;
use Test::Deep;
use Test::Warn;
use Test::More tests => 42;

You write this:

use Test::Most tests => 42;

What? You don't like the test modules I chose? Use my Test::Kit module and build your own.

But that's not enough. Learning how not to duplicate fixtures or test behavior is also important. While there's often a t/lib directory for shared test behavior, a common anti-pattern is cutting and pasting test code. Hey, if we change behavior, at least we'll see tests fail, so it's not so bad, right? Yes, it's bad. Too much cutting and pasting in your test code is a sign that you're not taking care of your tests the way you should. When I see this in test suites, I look for, and discover, all manner of problems. Here's one I've just discovered in my own code.

I have a parser that reads a bunch of CSV data, validates it, and passes it to a data importer. Unfortunately, this CSV data was often generated by Microsoft Excel, and when you export from Excel to CSV you can easily get corrupted Unicode, credit card and phone numbers converted to scientific notation, problematic date formats, or just the hassle of stitching multiple worksheets into a single CSV file.

Thus, I wrote a quick parser subclass to read from Excel instead of CSV and all of the above problems miraculously went away.

And then I committed the cardinal sin of copying my CSV parser tests to the Excel parser tests. After a couple of updates to the parsers requiring cutting-n-drooling more test code between them, I started doing the right thing: I switched from *.t tests to Test::Class and took advantage of test inheritance. My tests for the Excel parser look like this:

package Test::Company::Utility::Importer::Parser::Excel;
use parent 'Test::Company::Utility::Importer::Parser';

sub _get_file_extension {'xls'}

1;

I've written no tests for the Excel parser because I inherit all of the tests from my regular parser. In my parent code I have something like this:

sub _get_file_extension {'csv'}

sub _get_fixture {
    my ( $self, $fixture ) = @_;

    my $extension = $self->_get_file_extension;
    my $filename  = "t/fixture/parser/$fixture.$extension";
    unless ( -f $filename ) {
        croak("Uknown file: $filename");
    }
    return $filename;
}

And each test gets its fixture with $test->_get_fixture('concatenate_group_columns') and everything magically works.
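
To make that concrete, a test method in the parent class might look something like this. The package header, the parser_class() helper, the file constructor argument, and the parse() call are all my own assumptions for the sake of illustration; only _get_fixture() comes from the real code:

package Test::Company::Utility::Importer::Parser;
use Test::Most;
use parent 'Test::Class';

sub concatenate_group_columns : Test(1) {
    my $test = shift;

    my $file   = $test->_get_fixture('concatenate_group_columns');
    my $parser = $test->parser_class->new( file => $file );  # parser_class() is hypothetical

    ok $parser->parse, "parsing $file should succeed";
}

Because the Excel test class only overrides _get_file_extension (and, in this sketch, parser_class), every method like this runs twice: once against the CSV fixture and once against the Excel fixture.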

And then I stumbled on a test which shows that my CSV parser can also read tab-separated data. How does inheritance work in that case? Well, it doesn't. It's clear that I should have a parser base class (or better still, a role), and CSV and Excel should be separate classes inheriting from that abstract base class or consuming the parser role. Soon I'll be adding an XML version of this parser, and it's now clear that having the CSV parser be the parent is very problematic. De-duplicating my code and "doing the right thing" exposed a design flaw that I should have seen earlier and that I'm going to have to deal with soon.
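
To sketch what that better design might look like (every package and method name below is hypothetical; this illustrates the role approach, not code from the actual project):

package Company::Importer::ParserRole;
use Moose::Role;

# Each concrete parser says how to read raw records from its own format.
requires qw(file_extension next_record);

# Shared behavior (validation, hand-off to the importer) lives in the role
# rather than in a CSV-specific parent class.
sub validate_record {
    my ( $self, $record ) = @_;
    ...;    # common checks for every format
}

package Company::Importer::Parser::CSV;
use Moose;

sub file_extension { 'csv' }
sub next_record    { ... }

# Required methods must already exist when the role is applied.
with 'Company::Importer::ParserRole';

1;

With that in place, the tab-separated reader and the future XML parser become just more consumers of the role instead of awkward subclasses of the CSV parser.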

Test Metadata

It's long been said that bugs in code tend to recur. While I can't say whether this is true, I do find that certain types of bugs tend to return in the various code bases I work on. I still remember one terrible tangle of nested evals in one project that usually, mysteriously, did the right thing, but would sometimes break; we'd cobble together a solution, but we could never quite figure out how to unravel that rat king. We only knew that this was a common failure in that mess of code because of folklore within the team. A strong QA department will often track the types of bugs found and be able to say "well, we tend to get UI failures much more than data failures", but they're a level of abstraction away from the code and won't always know how failures really group together. If we tracked this sort of data, we might be able to say "you know, our CouchDB tests fail three times more often than other tests, so maybe there's something to look at there." But in reality, we don't do this.

I made a stab at solving this once but abandoned it, not convinced that I had taken the right approach. However, I believe the tagging feature in Test::Class::Moose can get us closer to this. Of course, its reporting capabilities already mean we can get useful timing information out of test suites in a way we couldn't previously. Oh, and you could use it to solve the unit-versus-integration code coverage problem I mentioned earlier. Oh, and if used correctly, it can not only reduce duplicated code but also make your tests more discoverable. Hmm ...
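
For example, Test::Class::Moose lets you tag test methods and then include or exclude those tags when running the suite. The class name, tag names, and CouchDB example below are mine, and the runner interface has changed across versions, so treat this as a sketch rather than gospel:

package TestsFor::MyApp::Storage;
use Test::Class::Moose;

sub test_save : Tags(couchdb integration) {
    ok 1, 'exercise the CouchDB-backed save path here';
}

sub test_validation : Tags(unit) {
    ok 1, 'pure in-memory validation, no external services';
}

# In the driver script, run only what you want, e.g. just the unit tests
# when gathering "unit" coverage:
#
#   use Test::Class::Moose::Load 't/lib';
#   Test::Class::Moose::Runner->new( include_tags => ['unit'] )->runtests;

1;

Track which tags fail most often over time and you have exactly the kind of metadata described above.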

Conclusion

Most projects that I work on have Perl tests in a hodge-podge of *.t tests. If I'm lucky enough to work on a Test::Class based test suite, I've found that every single one I've ever worked on is broken in some fundamental way. People are still duplicating code all over the place, we're not paying attention to the difference between different types of tests and we're tossing the tests in a big pile 'o code in a way that starts to become an obstacle after a while. And then we obsess over code coverage in ways that are less than helpful.

For small test suites, big deal. When you're working on mission critical code on a large code base, having a well-designed, easy-to-use test suite can be the difference between delivering next week and delivering next month. I would love to see the Perl community start doing more interesting things with tests, including developing strong tools for capturing metadata about tests that we tend to currently ignore.

Even if other languages are as bad as what I've mentioned above (and many are), that still is no excuse for dropping the ball here. We can do better. We should do better.

10 Comments

Bravo, sir! Bravo.

We both appear to be thinking about the same thing - quality control vs quality assurance. The Perl community does quality control pretty well, but most of us have barely heard of quality assurance, and fewer of us have any idea how to do it.

Very accurate and complete coverage of test cases... The part on test metadata reminds me of countless sleepless nights :)

The comment about tracking the sources of bugs reminds me of a story from one of Steve McConnell's books. There is a Pareto principle rule in play for bugs, where fairly small parts of your code base are responsible for most of your bugs.

At IBM they did an experiment. They identified what sections of their code were most problematic, and rewrote those pieces from scratch. They rewrote only a little bit of their code, but continuing bug counts went down significantly! (The figures were something like 5% of code rewritten, and a 40% lower bug rate going forward. But I'm sure those are not the right numbers.)

So yes, the hypothetical exercise that you suggest is indeed very valuable.

I've tried Test::Class (and various similar modules) and, in all honesty, they haven't worked for me (also, Perl is the king of boilerplate). I haven't noticed any improvement. I'm sure I could design the tests a bit better to improve their usage, but they simply weren't solving the problem with code duplication that I actually needed solved. Even worse, they created problems, like having what amounts to a single unparallelizable test.

The closest I've gotten to reducing boilerplate is actually using Bread::Board and wiring up the classes I need to test. I forget exactly how much code, but I think around 1/3 of the test suite was shaved off of Business::CyberSource when I moved it to Bread::Board.

As far as organization goes, when I start building tests for larger things, the structure I've started using (though, now that I've mentioned it, I should become consistent with it) is to strip the namespace and use a lowercase version of the same class names (e.g. Business::CyberSource::Request::Authorization becomes t/request/authorization.t, or, where I have to test many things, t/request/authorization/{use-case}.t; with a Catalyst app your hierarchy would be t/model, t/controller, and so on).

I've yet to decide whether my test injector should live in lib or t/lib.

Fundamentally, inheritance is not the best way to solve problems, so I think that Test::Class and the like will continue to not solve the problem. My test does not have, IMO, an isa relationship. Composition with roles might make sense in some cases, but not in the way I've seen it done (where my roles are the custom thing being composed into a test harness class). It'd need to be the other way around, IMO: each test becomes the class that composes commonly needed roles. That would perhaps help, but the most common thing is really a problem of dependencies, which leads me to DI and Bread::Board. Maybe this week I'll have time to talk about the provider pattern for deep injection, and later I could give some examples of how I started to parallelize DB tests at a former $job.

I'm as guilty as anyone else, but there are lots of other things we don't do inside test files that I think we (and I) should do.

  • User documentation: why this test exists, and so on, and how to run the file on its own (prove, etc.)
  • Clean separation of test code: Test::Class does this sort of thing if you are into that, but subtest is my favorite way.
  • Tests segregated into files by small features; in some projects I have a test file per public method.
  • Tests grouped by level of testing. We dump all tests into t/ even if they test different levels, from unit testing all the way up to acceptance testing. I had the idea with Test::Manifest to mark tests with "levels", as in TNG: "run a level 1 diagnostic". I never really made it so.

I tend to do this for production code that I deliver to a client because I won't be around to explain it all, while on CPAN and GitHub I figure most people grok it or don't care.

All of this is one of the reasons I loathe TDD in an environment with no hard specification. When you spend your time working on the tests, you avoid changing the tests because you think you're "losing" all that work.

Very thoughtful. And guilty as charged.

The thing that puzzles me is how to reorganize. Tests are code. As code, they can have bugs. More than once I have done a small enhancement and had writing and debugging the test take as much time as writing and debugging the code to be tested.

I know my code works because I test it and to the extent that I test it. But how do I test my tests to be sure they are doing what they are supposed to do? The only way I know is to hand-verify, and for a large suite this is a huge amount of work, and fraught with the possibility of human error. It is this problem, not concern for "losing" work, that makes me leery of doing too much to tests once they are written.

For the last 6 months or so, I've almost always included POD in my test files with a brief explanation of what's being tested, though it's usually very brief.

Some of my better test suites are the ones where I've structured the tests along the same lines as the module documentation. So for each section of the documentation, there's a test file with a similar name.

The best way to improve tests in Perl projects is to provide better tests as examples. Best practice comes from practice and many programmers prefer to read and adopt code instead of reading manuals and instructions. Which Perl projects have good tests to learn from?

BTW, the quality of tests could also be improved by showing tests more prominently on MetaCPAN. Given the start page of a release, such as Moops, one needs to manually browse to the t/ directory. There should be a more readable test documentation page that automatically lists all tests with their purpose. Modules have a short name and description in the NAME section of their POD. One could extract the short description of a test from the PURPOSE section of each test, right?

I really like this blog post because it gave me a good excuse to finally try out Test::Routine (actually Test::Roo, but no real difference in my mind). I also reorganized my DBIx::Class::Helpers test suite to be basically t/$foo from lib/$foo. After doing this, I have come to the conclusion that a project needs a lot of moving parts to make Test::Routine worth the effort. In my test suites I tend to make a t/lib dir that has code to reuse in tests. They tend to be functions that will prepopulate databases, create mock data, etc. I'm going to try again with DBIx::Class::DeploymentHandler and see if it helps.

Also, for what it's worth, given that many projects are at least two levels deep in the hierarchy, I'm much happier with my current structure (https://github.com/frioux/DBIx-Class-Helpers/tree/master/t) than with extra bogus hierarchy layers in the tests (https://github.com/frioux/DBIx-Class-Helpers/tree/routine/t/).

When folks in QA started griping that my t/ directory was littered with integration tests, I mv t i and started over, rebuilding a t/ directory with only unit tests.

I am now almost regularly adding an i/ directory for integration tests, as I find I am not satisfied simply knowing that my methods do what they are intended to do. I also want to know that the code integrates well with its environment.

Doing this meant that prove t/*.t would run in a reasonable time and that I was not taking a coffee break on every commit while my test suite navigated network connections and database interactions which might not exist in the QA environment anyway.

I've been encouraging this as a new convention, and perhaps a/*.cuke for the automated acceptance tests. Does not seem to have caught on yet, though.

-- Hugh Esco

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/