More stupid testing tricks

For the guy who wrote the test harness that currently ships with Perl and who has commit rights to an awful lot of the Perl testing toolchain, I sure do seem to do a lot of stupid things while testing. That being said, sometimes I need to do those stupid testing tricks. That's because there seem to be roughly two types of developers:

  • Those who work in a perfect world
  • Those who work in the real world

I say the latter with a bit of bitterness because invariably I keep hearing YOU MUST DO X AND NOTHING ELSE where "X" is a practice that I often agree with, but it's the "and nothing else" bit that really frosts my Pop Tart (tm).

I'm in the rather unfortunate position of having an NDA, so I can't exactly explain what's driving a particular use case. I have a fantastic job which nonetheless has some serious constraints which I'm not in a position to deviate from. So not only am I not in a position to follow best practices in what I'm about to describe, I'm not even in a position to tell you why. Suffice it to say that I'm faced with an enormous system, and many things which I would take for granted in other environments are not the case here, so I'm forced to improvise. (Note that I didn't say it's a bad system. It's a different system and there is at least one fundamental assumption about software development which doesn't apply here, but I can't say more.)

So let's say that you have a rather large dataset you're testing and you have some constraints you must face:

  1. You have no control over the actual data
  2. You cannot mock up an interface to that data
  3. The data is volatile

How do you test that? Let's say a function returns an array of array refs. At first, I tried writing something like the Levenshtein edit distance for data structures, but our data is so volatile that this only bought me a little time: instead of having the tests fail the day after they're written (the data I test against is more-or-less stable for one day), I could have them last several days before failure hits.
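To make the edit distance idea concrete, here is a rough sketch of what such a comparison might look like for an array of array refs. It is not the code I actually wrote: the names row_key and row_edit_distance are invented for illustration, and it assumes each row is a flat list of scalars. It is plain Levenshtein distance where the "characters" are whole rows.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: flatten one row (an array ref of scalars)
# into a single string so that rows can be compared with "eq".
sub row_key {
    my ($row) = @_;
    return join "\x{1F}", map { defined $_ ? $_ : '' } @$row;
}

# Levenshtein distance over rows rather than characters: the number
# of row insertions, deletions and substitutions needed to turn
# $have into $want.
sub row_edit_distance {
    my ( $have, $want ) = @_;
    my @h = map { row_key($_) } @$have;
    my @w = map { row_key($_) } @$want;

    # Distance from an empty list to each prefix of @w.
    my @prev = ( 0 .. scalar(@w) );

    for my $i ( 1 .. scalar(@h) ) {
        my @cur = ($i);
        for my $j ( 1 .. scalar(@w) ) {
            my $cost = $h[ $i - 1 ] eq $w[ $j - 1 ] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,              # delete a row
                $cur[ $j - 1 ] + 1,         # insert a row
                $prev[ $j - 1 ] + $cost,    # substitute a row
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my $want = [ [ 1, 'alpha' ], [ 2, 'beta' ],  [ 3, 'gamma' ] ];
my $have = [ [ 1, 'alpha' ], [ 3, 'gamma' ], [ 4, 'delta' ] ];

# One row vanished and one row appeared, so the distance here is 2.
my $distance = row_edit_distance( $have, $want );
print "distance: $distance\n";
```

A test would then pass as long as the distance stays under some threshold you pick, which is exactly why it only delays the failure: volatile data drifts past any fixed threshold sooner or later.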

Still, coming back a week later and still having the tests fail is not good. Further, by the time the data bubbles up to me, the criteria by which it's assembled and sorted are no longer present, so I have no way of duplicating them in my test (and they're complex enough that I wouldn't want to duplicate them).

Thus, I'm stuck with the awful problem of tests which are going to break quickly. I thought about the excellent Test::Deep, but that only lets me validate the structure of the data, not the meaning. Test::AskAnExpert could let me know the meaning by punting to the human (me, in this case), but this doesn't do anything about the data being so volatile.
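For anyone who hasn't used it, this is roughly what "structure, not meaning" looks like with Test::Deep. The get_rows function and the [ id, name, date ] row layout are made up for the example; only cmp_deep, array_each and re come from Test::Deep itself.

```perl
use strict;
use warnings;
use Test::More;
use Test::Deep;

# Hypothetical stand-in for the real function under test.
sub get_rows {
    return [
        [ 17, 'widget', '2011-02-03' ],
        [ 42, 'gadget', '2011-02-27' ],
    ];
}

# Every row must be a three-element array ref whose elements match
# these patterns. Nothing here checks that they are the *right* rows.
cmp_deep(
    get_rows(),
    array_each(
        [ re(qr/^\d+$/), re(qr/\S/), re(qr/^\d{4}-\d\d-\d\d$/) ]
    ),
    'every row looks like [ id, name, date ]',
);

done_testing;
```

A test like this survives any amount of data churn, which is both its strength and its weakness: it would pass just as happily if the function returned the wrong rows in the right shape.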

So I've written the abysmally stupid Test::SynchHaveWant. The idea is that the results you want are in the __DATA__ section of your .t file and if the test(s) fail, you can look at the failures and if they're not really failures, you can then "synch" your "wanted" results to the new results and watch them pass again. We do this by writing the synched results to the __DATA__ section.
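The mechanics are simple enough to sketch. What follows is not the module's actual code or API, just a minimal illustration of the idea under some assumptions: the wanted data is stored after __DATA__ as a Data::Dumper dump, read_wanted and synch_wanted are names I've invented here, and SYNCH_WANTED is a hypothetical environment switch. To keep the example from rewriting itself, it works on a temporary file standing in for the .t file.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Read the whole file into a string.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    local $/;
    my $source = <$fh>;
    close $fh;
    return $source;
}

# Return the "wanted" structure stored after the __DATA__ marker.
sub read_wanted {
    my ($file) = @_;
    my ( undef, $data ) = split /^__DATA__\n/m, slurp($file), 2;
    die "No __DATA__ section in $file" unless defined $data;
    my $wanted = eval $data;    # string eval: only for files you trust
    die "Bad __DATA__ section in $file: $@" if $@;
    return $wanted;
}

# Replace everything after __DATA__ with a dump of what we now have.
sub synch_wanted {
    my ( $file, $have ) = @_;
    my ($code) = split /^__DATA__\n/m, slurp($file), 2;

    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Terse    = 1;    # no "$VAR1 =" prefix

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $code, "__DATA__\n", Dumper($have);
    close $out or die "Cannot close $file: $!";
    return;
}

# A temporary file standing in for the real .t file.
my ( $tmp, $path ) = tempfile( SUFFIX => '.t', UNLINK => 1 );
print {$tmp} "# test code would live here\n__DATA__\n[ [ 1, 'old' ] ]\n";
close $tmp;

my $have = [ [ 1, 'new' ], [ 2, 'newer' ] ];    # pretend current results
my $want = read_wanted($path);                  # what we wanted last time

# Only overwrite the wanted data when explicitly asked to.
synch_wanted( $path, $have ) if $ENV{SYNCH_WANTED};
```

Making the synch an explicit opt-in matters: the whole point is that a human looks at the failures first and decides they aren't real before the wanted data gets overwritten.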

For example: let's say that commit X on Feb 3rd is a known good commit, but your tests are now failing on Feb 27th. Roll your code back to X and rerun the tests. If they fail in the same way, you can assume that it's merely data changes. Simply "synch" your test data, rerun the test to verify, then check out "head" again and make sure the tests pass.

This is an incredibly bad idea for several reasons:

  • Simply asserting that the results you want are the results you got is begging for laziness and false positives.
  • Rewriting your source code on disk is very stupid.
  • The data you want is now in the __DATA__ section, pulling it away from the code which should have it, masking the intent.
  • It's still a lot of manual work when there are failures.

All things considered, this is probably one of the dumbest testing ideas I've had, but it's working. I've a few more ideas to make it easier to use, but I'm still trying to figure out a cleaner way of making this work.

8 Comments

FYI, the accepted abbreviation of synchronize is sync, not synch.

Living as well in the real world, and even more, in the production world, I know the kind of problems you describe. "Upgrade! Upgrade!" isn't an answer when working on a non-trivial production platform where 5.8 is the main version of Perl.

WRT your tests, are you testing the code or the data? If the code, why test against live data? Why not keep some well-known data to provide stable samples to work with?

I think I might be doing things similar to what you described here. I have to convert lots of data in such a way that actually writing the $want data isn't feasible, since it's so much AND changes all the time. My rescue was Test::Regression.

It allows me to do things like this:

# on the shell:
set TEST_REGRESSION_GEN=1

# in the perl script:
use Test::Regression;
use Data::Dumper;
my $data = get_complex_stuff();
ok_regression( sub{ Dumper $data }, "t/data/complex.dump", 'complex data matches' );

The ENV assignment there tells Test::Regression to not actually do a comparison, but to just generate the data from the sub and dump it into the file name given. Then I look at git diff to see if my data changed and if the change is something I wanted.

Additionally, on a user system, TEST_REGRESSION_GEN won't be set, so it does a proper comparison there.

You work at booking.com, right?

Because of the analogy. :-)

About Ovid

Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.