Test Hierarchy Produces Poor Unit Tests
The first part of this series described Test Hierarchy, a hierarchy of test classes that mirrors the classes under test, and explained why it’s an antipattern. For how common it is, this practice doesn’t even produce good unit tests.
A unit test, by definition, tests a unit of software, no more, no less. On the one hand, we have unit tests, which test a single module or class. On the other hand, we have integration tests, which test how multiple modules or classes work together. We want each unit test to poke and prod only the class that it tests. We want each subsystem integration test to test a natural subsystem, e.g., the data-export subsystem. We want our system tests to test the whole system. And we don’t want any test to be affected by any other units, subsystems, or systems.
When our software depends on other software that may change over time, our tests may suddenly start failing because the behavior of the other software has changed. This problem, which is called Context Sensitivity, is a form of Fragile Test…Whatever application, component, class, or method we are testing, we should strive to isolate it as much as possible from all other parts of the software that we choose not to test. This isolation of elements allows us to Test Concerns Separately and allows us to Keep Tests Independent of one another. It also helps us create a Robust Test by reducing the likelihood of Context Sensitivity caused by too much coupling between our SUT [system under test] and the software that surrounds it. (xUnit Test Patterns: Refactoring Test Code. Gerard Meszaros. Addison-Wesley Professional, 2007.)
If terms like Context Sensitivity and Fragile Test feel familiar, it’s not just a coincidence.
Test Hierarchy produces tests that purport to be unit tests but that don’t actually test isolated units.
Let’s say we have a Bat class that is a subclass of Mammal. If the tests use Test Hierarchy, then BatTest not only tests the Bat code, but also the superclass Mammal code and its superclass Animal.
This seems to make some intuitive sense, because after all, Bat can do all the things that Mammal and Animal can do, all the methods that it inherits from those classes. But this intuition misses an important distinction. The unit is whatever code is in the Bat.pm module, not whatever the Bat class can do.[1]
When a Bat unit test fails, it should indicate that we made a mistake in Bat.pm, not any other module.
Our bad BatTest doesn’t just test the code in Bat.pm, but also the code in Mammal.pm and Animal.pm. This makes it an integration test (not a unit test), because it doesn’t just test its own module but other modules as well.
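To make the distinction concrete, here is a minimal sketch of a Bat unit test that touches only what Bat itself defines. The three packages are inlined stand-ins for Animal.pm, Mammal.pm, and Bat.pm, and every method name (breathe, nurse, echolocate, is_nocturnal) is invented for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-ins for Animal.pm, Mammal.pm, and Bat.pm, inlined so
# the sketch runs as a single file (package blocks need Perl 5.14+).
package Animal {
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub breathe { return 'inhale, exhale' }
}

package Mammal {
    use parent -norequire, 'Animal';
    sub nurse { return 'milk' }
}

package Bat {
    use parent -norequire, 'Mammal';
    sub echolocate   { return 'ping' }
    sub is_nocturnal { return 1 }
}

# The unit under test is the code written in Bat, so these assertions
# cover only the methods that Bat defines.
my $bat = Bat->new;
is $bat->echolocate, 'ping', 'echolocate() returns a ping';
ok $bat->is_nocturnal, 'is_nocturnal() is true';

# Deliberately absent: assertions on breathe() or nurse(). Those methods
# are written in Animal and Mammal, so a failure there should show up in
# the Animal and Mammal tests, not here.

done_testing;
```

With this split, a red Bat test points at Bat and nowhere else.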
And it’s an integration test we don’t need. Generally, we write integration tests that exercise some system feature, like “export Foo data in CSV format.” This might involve setting up the Foo data fixture, invoking the appropriate export feature, then validating the CSV file that it generates. But we don’t need module tests that invoke low-level methods on other modules.
In fact, who said that there’s only one test per class?
Here at The Perl Shop, we generally create a separate test module per method or feature. So we’d create move.t, eat.t, and breathe.t, each of which tests a different Animal method. This way, we can group together tests by class method, and easily self-document which tests correspond with which feature.[2]
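As a rough illustration, a per-method test script such as move.t might look like the sketch below. The Animal package is an inlined, hypothetical stand-in (a real script would load Animal.pm instead), and the move() behavior and the position attribute are assumptions made up for the example.

```perl
# move.t (sketch): everything in this file is about Animal's move() method.
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the real Animal.pm (package blocks need Perl 5.14+).
package Animal {
    sub new {
        my ($class, %args) = @_;
        return bless { position => 0, %args }, $class;
    }

    # Shift the position by the given number of steps and return the new position.
    sub move {
        my ($self, $steps) = @_;
        $self->{position} += $steps;
        return $self->{position};
    }
}

subtest 'move() from the default position' => sub {
    my $animal = Animal->new;
    is $animal->move(3), 3, 'first move starts from zero';
    is $animal->move(2), 5, 'second move accumulates';
};

subtest 'move() from a custom starting position' => sub {
    my $animal = Animal->new(position => 10);
    is $animal->move(-4), 6, 'negative steps move backward';
};

done_testing;
```

The file name alone says which feature broke when this script fails, and eat.t and breathe.t would follow the same shape.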
We also inline our test classes in our .t scripts, which keeps the test code close to the test script and cuts in half the number of files we need to maintain. It also makes it impossible to subclass them.
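Here is one way an inlined test class could look. This sketch does not use any test-class framework; it is a hand-rolled runner on top of core Test::More, and EatTest, eat(), and the meals attribute are all invented for the example.

```perl
# eat.t (sketch): the test class lives in the script that runs it.
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the real Animal.pm (package blocks need Perl 5.14+).
package Animal {
    sub new { my ($class) = @_; return bless { meals => [] }, $class }

    # Record a meal and return how many meals have been eaten so far.
    sub eat {
        my ($self, $food) = @_;
        push @{ $self->{meals} }, $food;
        return scalar @{ $self->{meals} };
    }
}

# The inlined test class. There is no EatTest.pm on disk, so no other
# file can load it, let alone build a hierarchy on top of it.
package EatTest {
    use Test::More;

    # Each test object carries its own fresh Animal fixture.
    sub new { my ($class) = @_; return bless { animal => Animal->new }, $class }

    sub test_first_meal {
        my ($self) = @_;
        is $self->{animal}->eat('moth'), 1, 'first meal is counted';
    }

    sub test_second_meal {
        my ($self) = @_;
        $self->{animal}->eat('moth');
        is $self->{animal}->eat('fig'), 2, 'second meal is counted';
    }
}

# Minimal runner: one subtest per test method, each with a new fixture.
for my $method (qw(test_first_meal test_second_meal)) {
    subtest $method => sub { EatTest->new->$method };
}

done_testing;
```

Because the class and the script are one file, there is only eat.t to maintain rather than eat.t plus a separate test module.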
We’ve had great success with these practices, and they’re fundamentally incompatible with Test Hierarchy.
You might have also noticed this in Testing Strategies for Modern Perl. In chapter 2, we create a TicTacToe::BusinessLogic::Game class, which is tested by new.t, board.t, and move.t.
In the concluding post, I’ll reflect on some of the reasons developers use Test Hierarchy, and why these reasons don’t stack up.
Peace, love, and may all your TAP output turn green…
This post originally appeared on The Perl Shop blog as “Test Hierarchy Produces Poor Unit Tests.”
[1] Formally, Bat.pm doesn’t directly define the Bat class. Rather, it defines an implicit “subclass mixin”: all the methods and attributes in Bat.pm that are added to (“composed with”) its superclass Mammal. The object system, then, composes the Bat mixin with Mammal to create the full Bat class. Similarly, Mammal.pm conceptually defines a Mammal mixin that is composed with its superclass Animal in order to create the Mammal class.
[2] Another alternative is to have a separate test class per fixture, which is useful if different test methods have different fixture requirements.