Reflections on Test2
In a future post I will recount the details of my delightful experience at the 2016 Perl QA Hackathon (N.B. now published here). Since this is my first post since that time I do want to tip my hat to the great sponsors of the event and to my own employer ServerCentral without whom I would not have been able to attend. I will thank them in more detail in that post.
Before I get to that however, I want to post a reflection on one discussion that is and has been weighing on my mind since then. That topic is the upcoming release of Test2, which I consider to be a very important step forward for Perl’s testing architecture.
While I didn’t spend the majority of my time at the QA Hackathon thinking about Test2, I think it is probably the most important part of the event to discuss. Test2 is a major project to rewrite the guts of the venerable Perl testing library Test::Builder. Most users are blissfully unaware of Test::Builder; they use Test::More instead. But Test::More uses Test::Builder, as do almost all (let’s say all) test modules.
Test::Builder draws its interface from the Test Anything Protocol (TAP). The protocol is seemingly very simple: print the number of tests you expect to run, then print ‘ok 1’ to say the first test passed, and so on. I can report three passing tests by printing
1..3
ok 1
ok 2
ok 3
But most test authors don’t print their test results; they use a function like Test::More::ok to print each result, and something like plan or done_testing to manage the number of tests they expect.
This is because it is annoying to have to keep track of the current test number, especially once you start sharing testing code.
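For comparison, here is a minimal sketch of the Test::More way of producing that same three-test stream (the assertions themselves are arbitrary examples). The script never mentions a test number, and done_testing() works out the plan from the count Test::Builder kept.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;

# Each assertion bumps Test::Builder's internal counter,
# so the script never writes a test number itself.
ok( 1 + 1 == 2, 'addition works' );
ok( 'abc' =~ /b/, 'regex matches' );
is( lc('PERL'), 'perl', 'lc lowercases' );

# done_testing() prints the plan line (1..3) from the count it kept.
done_testing();
```

Running it prints ok 1 - addition works, ok 2 - regex matches and ok 3 - lc lowercases, followed by 1..3.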
Test::Builder, the Test::More guts, keeps an internal record of the number of tests that have been run, so that you don’t have to. While this seems useful, it sows the seeds of problems that we are finally starting to understand. Test::Builder’s internal state is fundamentally important to the running of the test. This very quickly leads to the realization that all test modules have to play nicely with Test::Builder; otherwise the count skews, your tests won’t report correctly, and your test runner will see a failure.
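A deliberately tiny sketch of why that shared state matters. ToyBuilder is hypothetical, made up for illustration and nothing like the real Test::Builder internals: one helper goes through the shared counter, another writes its own result line, and the numbering and the plan drift apart.

```perl
use strict;
use warnings;

# Hypothetical toy, NOT the real Test::Builder: a single global
# counter that every test helper is supposed to go through.
package ToyBuilder;
our $count = 0;
our @tap;

sub ok {
    my ( $pass, $name ) = @_;
    $count++;
    push @tap, ( $pass ? 'ok' : 'not ok' ) . " $count - $name";
    return $pass;
}

package main;

ToyBuilder::ok( 1, 'well-behaved helper' );

# A misbehaving module writes its own result line and skips the counter.
push @ToyBuilder::tap, 'ok 2 - rogue helper';

# The next well-behaved call is numbered 2 again.
ToyBuilder::ok( 1, 'well-behaved helper again' );

# The plan comes from the counter, which only saw two tests.
unshift @ToyBuilder::tap, "1..$ToyBuilder::count";
print "$_\n" for @ToyBuilder::tap;
```

The output claims 1..2 but carries three results, two of them numbered 2, which a harness will report as a failure even though every individual check passed.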
Our problem now is that Test::Builder is showing its age. People want to write tests in different and more expressive ways. They want to use subtests (groups of tests that count together), they want to output formats other than TAP, they want to test their test modules, and they want their harness (runner) output to be pretty. Test::Builder can’t really support these things well, but that doesn’t mean that people haven’t gotten those features by hook or by crook.
My Involvement in Test2
Most readers will be aware that I am a core developer of Mojolicious and author of many related CPAN modules. Test::Mojo::Role::Phantom (I’ll refer to it as ::Phantom for brevity) is a module which combines Test::Mojo and the headless JavaScript interpreter Phantom.js to greatly simplify testing of a web application’s dynamic content. Note that via Test::Mojo::Role::PSGI these test modules can be used to test Catalyst, Dancer, or any other PSGI-compliant application.
When it starts, it spins up a Mojolicious service on a random local port (possibly shimmed to host a PSGI app). It then forks a phantom JavaScript interpreter whose STDOUT is a pipe pointed back at the parent. Via this pipe the JavaScript can send a JSON structure representing a command to be executed in the parent. These commands are intended to run various test functions in the parent as they come in. You might say
perl('Test::More::is', some_selected_text, 'expected result', 'description')
Which back in the Perl process is executed as
Test::More::is(@remaining_arguments);
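The dispatch step can be sketched in Perl as follows. The wire format (one JSON array per line, function name first) and the dispatch_line helper are assumptions for illustration, not ::Phantom’s actual protocol, and the fork, pipe and IO loop are replaced by a hard-coded list of lines.

```perl
use strict;
use warnings;
use JSON::PP ();
use Test::More;

# Assumed wire format (hypothetical, not ::Phantom's real protocol):
# one JSON array per line, [ "Package::function", @arguments ].
sub dispatch_line {
    my ($line) = @_;
    my ( $func, @args ) = @{ JSON::PP::decode_json($line) };

    # Split "Test::More::is" into package and sub name, then look it up.
    my ( $pkg, $name ) = $func =~ /\A(.+)::(\w+)\z/
        or die "not a fully qualified function: $func";
    my $code = $pkg->can($name)
        or die "no such function: $func";
    return $code->(@args);
}

# Stand-in for lines read from the child's STDOUT pipe.
my @lines = (
    '["Test::More::is", "Hello", "Hello", "heading text"]',
    '["Test::More::ok", 1, "button is visible"]',
);

# As with phantom_ok, treat the whole exchange as one subtest.
subtest 'commands from the browser' => sub {
    dispatch_line($_) for @lines;
};

done_testing();
```

Every assertion here is reported from inside dispatch_line. A real implementation would also want to restrict which functions the browser side is allowed to call.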
This mechanism works very well. While not a replacement for more dedicated front-end testing tools, it helps authors who wouldn’t otherwise write any front-end tests to get started. But ::Phantom has some problems that I cannot directly solve.
The test script itself might only make one call to $t->phantom_ok($url, $js), but the phantom interpreter might then execute several tests. In this scenario the call to phantom_ok is logically equivalent to a subtest, so ::Phantom wraps the fork and the subsequent listen/interpret cycle in a subtest. However, if a test fails, the failure is reported as emanating from inside phantom_ok.
This problem alone wouldn’t be tremendously difficult to fix: you could simply tell Test::Builder to report test failures from a higher stack frame with the usual trick of
local $Test::Builder::Level = $Test::Builder::Level + 1;
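In an ordinary, synchronous helper the trick works well. A minimal sketch, with a hypothetical is_even_ok helper built on Test::More::ok:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper. Bumping $Level by one makes a failure point
# at the line that called is_even_ok(), not at the ok() inside it.
sub is_even_ok {
    my ( $n, $name ) = @_;
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return ok( $n % 2 == 0, $name );
}

is_even_ok( 4, 'four is even' );
done_testing();
```

Because the helper adds exactly one frame, bumping $Level by one is enough; the trouble starts when the number of frames is not known in advance.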
For ::Phantom, however, this will not help. Since the parent has to serve the page to the client and then listen for calls on the child’s handle, it has to perform the tests inside an IO loop. Those of you familiar with IO loops can now see the problem: when a test is executed, the original caller’s stack frame isn’t one level above the current one. In fact the original caller isn’t (really) in the call stack at all (and certainly isn’t a predictable number of frames above it).
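A toy model of the problem, using a hand-rolled callback queue as a hypothetical stand-in for a real IO loop such as Mojo::IOLoop (all sub names here are made up): the function that schedules the check has already returned by the time the check runs, so it is simply not on the stack.

```perl
use strict;
use warnings;

# Toy event loop (hypothetical, far simpler than a real IO loop):
# callbacks are queued now and run later by run_loop().
my @queue;
my @seen;

sub run_loop {
    while ( my $cb = shift @queue ) { $cb->() }
}

# Names of every sub currently on the call stack, innermost first.
sub stack_names {
    my @names;
    my $i = 1;    # start at 1 to skip stack_names itself
    while ( my @frame = caller( $i++ ) ) { push @names, $frame[3] }
    return @names;
}

# Plays the role of phantom_ok(): it only schedules the check.
sub schedule_check {
    push @queue, sub { @seen = stack_names() };
}

# Plays the role of the test script's call site.
sub test_script_call { schedule_check() }

test_script_call();
run_loop();
print "$_\n" for @seen;
```

It prints main::__ANON__ and main::run_loop; neither test_script_call nor schedule_check appears, so no value of $Test::Builder::Level could point back at the test script’s line.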
This means that Test::Builder’s interface for this task is too limiting. Until Chad Granum started working on Test2, the next step would have been to hack Test::Builder in one of several nasty ways. In fact many test modules on CPAN have already done this because they had no other choice. These hacks range from mild to horrendous, and can even lead to test modules which conflict with each other in strange ways or break unexpectedly when Test::More is updated in a seemingly innocuous way.
Actually this stretches the truth; Test::More really isn’t ever updated, because in practice too much breaks when that is attempted. Test::More is effectively stuck, frozen by the hacks that people have perpetrated around it.
I am lucky, though, because Chad has been working on a replacement: Test2. Chad (and many others before him) noticed this problem of people hacking Test::More and wanted a solution. Test2 has been specially (and painstakingly) written to provide public API mechanisms that replace as many of the hacks seen on CPAN as possible. Those hacks only existed because there was no approved way to do what test module authors needed.
In my case Test2 provides a mechanism to capture a context object from which the test successes and failures are emitted. When you think about it this is the much more natural way to tell the testing singleton what is desired. Rather than
sub my_cool_test_ok {
    ... # arbitrary depth of calls ...
    # count the number of intervening stack frames $n
    local $Test::Builder::Level = $Test::Builder::Level + $n;
    ok(...);
    ...
}
You can simply do
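use Test2::API 'context';
sub my_cool_test_ok {
    my $ctx = context();    # remembers where the tool was called from
    ... # arbitrary depth of calls ...
    $ctx->ok(...);
    ...
    $ctx->release;          # hand the context back when done
}
and the tests will be reported correctly, as long as the context is released once the tool is finished with it.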
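Filled out into something runnable, the context-based version might look like the sketch below. The two private helpers are hypothetical stand-ins for the “arbitrary depth of calls”, and Test::More is loaded only to call done_testing().

```perl
use strict;
use warnings;
use Test::More;    # only used here for done_testing()
use Test2::API qw(context);

# Hypothetical tool; the private helpers stand in for
# "arbitrary depth of calls".
sub my_cool_test_ok {
    my ( $got, $name ) = @_;
    my $ctx  = context();          # records the caller's file and line
    my $pass = _deep_check($got);
    $ctx->ok( $pass, $name );      # reported at the recorded location
    $ctx->release;                 # each context() needs a matching release
    return $pass;
}

sub _deep_check { return _deeper_check(@_) }

sub _deeper_check {
    my ($got) = @_;
    my $ok = defined($got) && length($got);
    return $ok ? 1 : 0;
}

my_cool_test_ok( 'some text', 'non-empty string' );
done_testing();
```

However deep _deeper_check sits, a failure is reported at the line that called my_cool_test_ok, because context() captured that location before any further calls were made.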
Test2 to the Rescue
My situation isn’t uncommon. I want to use Test::Builder in a way that wasn’t originally envisioned.
Test2 contains many such fixes for many abuse patterns, both common and uncommon. In fact the reason it has taken so long is that over the YEARS it has been in development, Chad has kept finding modules that did stranger and stranger things to Test::Builder, and then found ways to support them! The Test::Builder interface doesn’t change when Test2 replaces its guts. This includes things that were never intended to be public interface but have become de facto public API over the course of all this hacking.
There were a few modules on CPAN that could not be supported, but most of those have now been updated to use supportable APIs in Test::Builder. This leaves only a very small handful of the most unsupportable ones. Once Test::Builder-on-Test2 hits CPAN, nearly everyone will upgrade transparently thanks to this herculean effort. Not all of the praise goes to Chad, as many others have submitted reports, offered suggestions and, yes, even held him back at times, but a huge amount does. The final product is a stunning level of backwards compatibility that is worthy of the Perl language’s commitment to that goal.
Test2 is More Than Just A Way Forward
Here you have seen my argument for Test2 as the successor to Test::Builder from the perspective of having a more useful public API. It is important to note that this wasn’t the original motivation.
Test2 brings one other huge benefit, an event system!
When you run ok using Test2 you don’t just print ‘ok $n’ anymore; you emit a passing test event. For most users this will simply be consumed by a TAP emitter, which prints ‘ok $n’, but this isn’t all you can do.
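One way to see this, assuming a Test::More that already runs on Test2: capture what a plain ok() emits with Test2::API’s intercept and look at it as data.

```perl
use strict;
use warnings;
use Test::More;
use Test2::API qw(intercept);

# Capture the events that a plain Test::More::ok() emits, instead of
# letting them reach the TAP formatter.
my $events = intercept {
    ok( 1, 'a passing test' );
};

# Keep only assertion events (intercept may also record others).
my @oks    = grep { $_->isa('Test2::Event::Ok') } @$events;
my $facets = $oks[0]->facet_data;    # the event as a plain data structure

is( scalar @oks, 1, 'ok() emitted exactly one Ok event' );
ok( $facets->{assert}{pass}, 'the assert facet records the pass' );
is( $facets->{assert}{details}, 'a passing test', 'and the test name' );

done_testing();
```

The TAP line is just one rendering of that event; anything else listening to the stream sees the same data.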
Test:: modules written with Test2 no longer have to parse the TAP stream to check that they pass their own tests. Test2’s robust event system can be watched, and the resulting events compared as streams of data structures. I can write tests for ::Phantom without ever having to run a regexp over TAP output!
We can finally test our testers!
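A minimal sketch of testing a test tool this way. is_positive_ok is a hypothetical tool written against Test2::API; intercept runs it in isolation and its results are compared as plain Perl data, with no TAP parsing.

```perl
use strict;
use warnings;
use Test::More;
use Test2::API qw(context intercept);

# Hypothetical test tool we want to test.
sub is_positive_ok {
    my ( $n, $name ) = @_;
    my $ctx  = context();
    my $pass = $n > 0 ? 1 : 0;
    $ctx->ok( $pass, $name );
    $ctx->release;
    return $pass;
}

# Run the tool in isolation; its events are captured, not printed as TAP.
my $events = intercept {
    is_positive_ok( 5,  'five' );
    is_positive_ok( -1, 'minus one' );
};

# Reduce each assertion event to [ name, pass ].
my @results = map { [ $_->name, $_->pass ? 1 : 0 ] }
              grep { $_->isa('Test2::Event::Ok') } @$events;

is_deeply( \@results, [ [ 'five', 1 ], [ 'minus one', 0 ] ],
    'tool reports one pass and one fail' );

done_testing();
```

The deliberate failure inside intercept does not fail the outer run; only the is_deeply comparison counts.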
Not only that, but those event streams don’t need to emit TAP at all! Users who want xUnit, TeamCity or other formats no longer need to parse the TAP output and re-emit what they need. They can simply attach an emitter for the preferred output and let Test2 do the rest!
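As a rough illustration, the sketch below attaches a listener to the root hub and records each assertion as a line of JSON next to the normal TAP. The record layout is an arbitrary choice for this example, not an existing format; a real replacement for TAP would more likely be a Test2::Formatter subclass.

```perl
use strict;
use warnings;
use JSON::PP ();
use Test::More;
use Test2::API qw(test2_stack);

# A minimal, hypothetical "emitter": every assertion that reaches the
# root hub is also recorded as one JSON object, alongside the normal TAP.
my $json = JSON::PP->new->canonical;    # sorted keys, stable output
my @records;

test2_stack()->top->listen( sub {
    my ( $hub, $event ) = @_;
    my $assert = $event->facet_data->{assert} or return;    # skip non-assertions
    push @records, $json->encode( {
        name => $assert->{details},
        pass => $assert->{pass} ? JSON::PP::true : JSON::PP::false,
    } );
} );

ok( 1, 'first' );
is( 2 * 2, 4, 'second' );
done_testing();

print STDERR "$_\n" for @records;
```

After the run it prints {"name":"first","pass":true} and {"name":"second","pass":true} to STDERR.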
Why Not Opt-In?
The first response of anyone new to the situation is to ask “well can’t the test script author or at least the module author choose to opt-in rather than replace the guts transparently?”
Remember, from the top of this discussion, that there is a central point of knowledge about the state of the test run? Any Test:: module can practically target only one authority, whether it be Test::Builder or Test2. Sure, some module authors could provide one version of a module for each, but that places a burden on them and a cognitive load on consumers, who would have to pick all their test modules from one pool or the other. In practice this will not happen; the inertia behind Test::Builder is too great. Test::Builder is the existing authority and it must remain at least a view into the state of test progress for all existing tests.
That leaves the question: can one authority delegate to the other? Test::Builder is too limited in its scope to keep improving even marginally, let alone to be the substructure of its own replacement; it simply doesn’t have the capacity. Test2-on-Test::Builder isn’t possible.
No, from those axioms we can see that for Test2 to gain any adoption, it must replace Test::Builder in holding the authority and provide a new Test::Builder as the interface to older tests. Which leads us back to opt-in. If users have to opt in, then tests written with and for Test2 cannot communicate with older tests, leading inevitably back to the first case: a forked ecosystem.
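What “Test2 holds the authority, Test::Builder is the interface” means in practice can be sketched with two hypothetical tools, one written against Test::Builder and one against Test2::API. This assumes a Test::Builder that already runs on Test2, i.e. the Test::Builder-on-Test2 release discussed here. Both tools report into the same counter and the same plan.

```perl
use strict;
use warnings;
use Test::Builder;
use Test2::API qw(context);

# "Old" style (hypothetical tool): written directly against Test::Builder.
sub legacy_ok {
    my ( $pass, $name ) = @_;
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return Test::Builder->new->ok( $pass, $name );
}

# "New" style (hypothetical tool): written against Test2::API.
sub modern_ok {
    my ( $pass, $name ) = @_;
    my $ctx = context();
    $ctx->ok( $pass, $name );
    $ctx->release;
    return $pass;
}

legacy_ok( 1, 'Test::Builder-based tool' );
modern_ok( 1, 'Test2-based tool' );

# One authority: both results land in the same count and the same plan.
Test::Builder->new->done_testing;
```

The output is ok 1, ok 2 and 1..2: a single stream, with no opt-in required from either tool.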
Now, there is no guarantee of perfection, but if you had been following this process you wouldn’t doubt the lengths it has gone to for the sake of compatibility. I know I would have quit long, long ago had it been me! At this point most of the detractors have been converted, and the few holdouts mostly seem to fear unknown effects on the DarkPAN. Fair enough; this is a judgement call: perfect backwards compatibility through atrophy, versus progress with some small risk, almost entirely mitigated at incredible effort.
I know where I stand on that call.
The Results of Discussions at QAH
The discussions around Test::Builder-on-Test2 centered on end-user notification. It seems that most if not all of the technical discussion has wrapped up and the code has been well reviewed.
To help the users of the few affected modules, it was decided to list ALL of the known broken modules (even those that are now fixed by upgrading from CPAN) in a document. This document (part of the Test2 distribution) will tell those few affected users what they should do if they experience trouble. Further, when installing Test::Builder-on-Test2 the user will be notified if they have any existing modules which need upgrade or even replacement. Remember though, this list is very very small.
The benefits of Test2 are huge. Let’s see it make this final step and become the new backbone of Perl’s great testing infrastructure.