Test Hierarchy Produces Poor Unit Tests

The first part of this series described Test Hierarchy, a hierarchy of test classes that mirrors the classes under test, and explained why it’s an antipattern. For how common it is, this practice doesn’t even produce good unit tests.


A unit test, by definition, tests a unit of software, no more, no less. On the one hand, we have unit tests, which test a single module or class. On the other hand, we have integration tests, which test how multiple modules or classes work together. We want each unit test to poke and prod only the class that it tests. We want each subsystem integration test to test a natural subsystem, e.g., the data-export subsystem. We want our system tests to test the whole system. And we don’t want any test to be affected by any other units, subsystems, or systems.

When our software depends on other software that may change over time, our tests may suddenly start failing because the behavior of the other software has changed. This problem, which is called Context Sensitivity, is a form of Fragile Test.

Whatever application, component, class, or method we are testing, we should strive to isolate it as much as possible from all other parts of the software that we choose not to test. This isolation of elements allows us to Test Concerns Separately and allows us to Keep Tests Independent of one another. It also helps us create a Robust Test by reducing the likelihood of Context Sensitivity caused by too much coupling between our SUT [system under test] and the software that surrounds it. (xUnit Test Patterns: Refactoring Test Code. Gerard Meszaros. Addison-Wesley Professional, 2007.)


If terms like Context Sensitivity and Fragile Test feel familiar, it’s not just a coincidence.

[Figure: sample class hierarchy]

[Figure: bad test class hierarchy]

Test Hierarchy produces tests that purport to be unit tests but that don’t actually test isolated units.

Let’s say we have a Bat class that is a subclass of Mammal, which in turn is a subclass of Animal. If the tests use Test Hierarchy, then BatTest tests not only the Bat code, but also the code of its superclass Mammal and of Mammal’s superclass Animal.

This seems to make some intuitive sense. After all, Bat can do all the things that Mammal and Animal can do, through the methods that it inherits from those classes. But this intuition misses an important distinction: the unit is whatever code is in the Bat.pm module, not whatever the Bat class can do.[1]

When a Bat unit test fails, it should indicate that we made a mistake in Bat.pm, not any other module.

Our bad BatTest doesn’t just test the code in Bat.pm, but also the code in Mammal.pm and Animal.pm. This makes it an integration test (not a unit test), because it doesn’t just test its own module but other modules as well.

And it’s an integration test we don’t need. Generally, we write integration tests that exercise some system feature, like “export Foo data in CSV format.” This might involve setting up the Foo data fixture, invoking the appropriate export feature, then validating the CSV file that it generates. But we don’t need module tests that invoke low-level methods on other modules.
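To make the problem concrete, here is a minimal sketch of the bad BatTest described above, written in plain Perl with only core modules. Everything in it is an assumption for illustration: the method names (breathe, nurse, fly), the `subject` accessor, and the small `test_methods` helper, which is a hand-rolled stand-in for the test-method discovery that a test-class framework would normally do for you.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use mro;          # core; provides mro::get_linear_isa
use Test::More;

# Classes under test, inlined so the sketch runs on its own.
# Normally these would live in Animal.pm, Mammal.pm, and Bat.pm.
package Animal { sub new { return bless {}, shift } sub breathe { return 'inhale' } }
package Mammal { our @ISA = ('Animal'); sub nurse { return 'milk' } }
package Bat    { our @ISA = ('Mammal'); sub fly   { return 'flap' } }

# Test Hierarchy: the test classes mirror the classes under test.
package AnimalTest {
    sub new     { return bless {}, shift }
    sub subject { return 'Animal' }
    sub test_breathe {
        my $self = shift;
        Test::More::is( $self->subject->new->breathe, 'inhale', 'breathe' );
    }
}
package MammalTest {
    our @ISA = ('AnimalTest');
    sub subject { return 'Mammal' }
    sub test_nurse {
        my $self = shift;
        Test::More::is( $self->subject->new->nurse, 'milk', 'nurse' );
    }
}
package BatTest {
    our @ISA = ('MammalTest');
    sub subject { return 'Bat' }
    sub test_fly {
        my $self = shift;
        Test::More::is( $self->subject->new->fly, 'flap', 'fly' );
    }
}

package main;

# Hypothetical helper: map every test_* method that a test class would
# run to the package that actually defines it.
sub test_methods {
    my ($test_class) = @_;
    my %defined_in;
    no strict 'refs';    # symbol-table lookups below
    for my $pkg ( @{ mro::get_linear_isa($test_class) } ) {
        for my $name ( grep { /^test_/ } keys %{"${pkg}::"} ) {
            $defined_in{$name} //= $pkg if defined &{"${pkg}::${name}"};
        }
    }
    return \%defined_in;
}

# Run BatTest the way a test-class runner would: own methods plus inherited.
my $methods  = test_methods('BatTest');
my $bat_test = BatTest->new;
for my $name ( sort keys %$methods ) {
    note "BatTest runs $name, defined in $methods->{$name}";
    $bat_test->$name();
}
done_testing;
```

BatTest ends up running three tests, but only test_fly exercises code that lives in Bat.pm. The other two arrive by inheritance, so a bug in Mammal.pm or Animal.pm would turn BatTest red even though Bat.pm is fine.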

[Figure: our test class hierarchy]

In fact, who said that there’s only one test per class?

Here at The Perl Shop, we generally create a separate test module per method or feature. So we’d create move.t, eat.t, and breathe.t, each of which tests a different Animal method. This way, we can group together tests by class method, and easily self-document which tests correspond with which feature.[2]
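As a rough sketch of what one of these per-method scripts might look like, here is a move.t that touches nothing but Animal’s move method. The file path, the `position` attribute, and the behavior of `move` are all assumptions made up for this example, and the inlined Animal package is only a stand-in so that the script runs by itself.

```perl
#!/usr/bin/env perl
# t/Animal/move.t (hypothetical path): tests only Animal's move method.
use strict;
use warnings;
use Test::More;

# Stand-in for Animal.pm; a real move.t would `use Animal;` instead.
# The position attribute and move semantics are assumptions.
package Animal {
    sub new {
        my ( $class, %args ) = @_;
        return bless { position => 0, %args }, $class;
    }
    sub position { return $_[0]{position} }
    sub move {
        my ( $self, $steps ) = @_;
        $self->{position} += $steps;
        return $self;    # return the invocant so calls can be chained
    }
}

subtest 'move advances the position' => sub {
    my $animal = Animal->new;
    $animal->move(3);
    is( $animal->position, 3, 'moved 3 steps from the origin' );
};

subtest 'move returns the invocant for chaining' => sub {
    my $animal = Animal->new( position => 10 );
    is( $animal->move(1)->move(1)->position, 12, 'two chained moves' );
};

done_testing;
```

With this layout, a failure reported by move.t points at one method of one class, and the file name alone says which feature the tests cover.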

We also inline our test classes in our .t scripts, which keeps the test code close to the test script and cuts in half the number of files we need to maintain. It also makes it impossible to subclass them.
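Here is one way an inlined test class could look, sketched with core Test::More only. This is not necessarily how The Perl Shop structures its scripts: the `Animal::Test::Eat` package name, the hand-rolled `run` method, and the `eat` and `meals` methods are hypothetical, and a real script would more likely lean on a test-class framework than on a hand-written runner.

```perl
#!/usr/bin/env perl
# eat.t (hypothetical): the test class lives in the script that runs it.
use strict;
use warnings;
use Test::More;

# Stand-in for Animal.pm; the eat and meals methods are assumptions.
package Animal {
    sub new { return bless { meals => [] }, shift }
    sub eat {
        my ( $self, $food ) = @_;
        push @{ $self->{meals} }, $food;
        return scalar @{ $self->{meals} };    # number of meals so far
    }
    sub meals { return @{ $_[0]{meals} } }
}

# The inlined test class. It exists only inside this script, so no other
# test file can load it, let alone inherit from it.
package Animal::Test::Eat {
    use Test::More;

    sub new   { return bless {}, shift }
    sub setup { my $self = shift; $self->{animal} = Animal->new; return }

    sub test_eat_records_the_meal {
        my $self = shift;
        $self->{animal}->eat('moth');
        is_deeply( [ $self->{animal}->meals ], ['moth'], 'meal is recorded' );
    }

    sub test_eat_returns_meal_count {
        my $self = shift;
        $self->{animal}->eat('moth');
        is( $self->{animal}->eat('fig'), 2, 'second meal returns a count of 2' );
    }

    # Hand-rolled runner: a fresh object and fixture for every test method.
    sub run {
        my $class = shift;
        for my $test (qw( test_eat_records_the_meal test_eat_returns_meal_count )) {
            my $self = $class->new;
            $self->setup;
            subtest $test => sub { $self->$test() };
        }
        return;
    }
}

Animal::Test::Eat->run;
done_testing;
```

Because the class and the script are one file, there is no separate test module to keep in sync, and nothing for a would-be BatTest to extend.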

We’ve had great success with these practices, and they’re fundamentally incompatible with Test Hierarchy.

You might have also noticed this in Testing Strategies for Modern Perl. In chapter 2, we create a TicTacToe::BusinessLogic::Game class, which is tested by new.t, board.t, and move.t.

In the concluding post, I’ll reflect on some of the reasons developers use Test Hierarchy, and why these reasons don’t stack up.

Peace, love, and may all your TAP output turn green…


This post originally appeared on The Perl Shop blog as “Test Hierarchy Produces Poor Unit Tests.”


[1] Formally, Bat.pm doesn’t directly define the Bat class. Rather, it defines an implicit “subclass mixin”: all the methods and attributes in Bat.pm that are added to (“composed with”) its superclass Mammal. The object system then composes the Bat mixin with Mammal to create the full Bat class. Similarly, Mammal.pm conceptually defines a Mammal mixin that is composed with its superclass Animal to create the Mammal class.

[2] Another alternative is to have a separate test class per fixture, which is useful if different test methods have different fixture requirements.

About Tim King

I've been working almost exclusively with Perl since 2006, and am one of the founding staff at The Perl Shop. I believe in designing systems that are easy to use, easy to understand, and easy to extend. I love software that does what you want, when you want it, without fighting you every step of the way.