Aggressive Database Optimisation

While my Enterprise Perl cartoon may seem like a joke, it's not. It's a sad fact that for larger codebases, tests can take a long, long time to run. The test suite I used on the BBC PIPs project took an hour and twenty minutes to run when I left that team. The one I use on the BBC Dynamite project takes just over an hour to run. Adam Kennedy, on the Enterprise Perl post, reported his tests can take a couple of hours to run.

On my Veure project, I've been very, very aggressive about tests. I just finished another optimisation and the tests now take just over 30 seconds to run. Out of curiosity, I disabled most optimisations and the tests took almost 6 minutes. In other words, my tests are now about an order of magnitude faster than if I had written them the way most developers write tests for modules. Modules really do foster a different testing strategy than applications.

I get most of my speed gains by:

  • Rebuilding changed tables rather than rebuilding the entire database
  • Using a small but valid subset of data (rebuilding tables with thousands of rows is slow)
  • Running almost all tests in a single process
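
The first item in that list can be sketched in a few lines. This is only an illustration, assuming every write to the test data goes through one place that can flag the table as changed; the post does not say how changed tables are actually detected. The in-memory `%pristine` and `%db` hashes and the `insert_row` and `rebuild_changed_tables` helpers are hypothetical stand-ins for a real database and its loader.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical pristine data: a small but valid subset for each table.
my %pristine = (
    customers => [ { id => 1, name => 'Alice' }, { id => 2, name => 'Bob' } ],
    orders    => [ { id => 1, customer_id => 1 } ],
    products  => [ { id => 1, title => 'Widget' } ],
);

# The "live" test database starts as a deep copy of the pristine data.
my %db = %{ dclone( \%pristine ) };
my %dirty;    # names of tables written to since the last rebuild

# Every write goes through here so the table gets flagged as changed.
sub insert_row {
    my ( $table, $row ) = @_;
    push @{ $db{$table} }, $row;
    $dirty{$table} = 1;
    return;
}

# Restore only the flagged tables and report which ones were rebuilt.
sub rebuild_changed_tables {
    my @rebuilt = sort keys %dirty;
    $db{$_} = dclone( $pristine{$_} ) for @rebuilt;
    %dirty = ();
    return @rebuilt;
}

insert_row( customers => { id => 3, name => 'Carol' } );
my @rebuilt = rebuild_changed_tables();
print "Rebuilt: @rebuilt\n";
```

Here a test that only touches `customers` costs one table restore; `orders` and `products` are left alone.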

The danger is that a valid subset of data is tough to achieve. What's valid? Part of it is simply knowing your system very well, but it could mean that there are use cases you'll miss. It's a tradeoff, but given that we already know that testing can't find all bugs, it's OK to say "I know I'll miss some things with a subset of data". To handle this, when you find something you missed, simply add it back to your subset. Alternatively, create a "fixture" that specific tests can load, either in the Test::Class "setup" phase (in other words, before every test method for a subset of test classes) or just at the start of the test methods which need it.

I would love to see "tags" available on Test::Class tests so you could do this:

sub setup : Tests(setup) {
    my $test = shift;
    $test->next::method;   # call parent setup

    # I'd actually automate this in my base class
    $test->load_fixture('customer') if $test->current_method->has_tag('customer');
}

sub customers : Tests(7) : Tags("customer") {
    my $test = shift;
    # at this point, we're guaranteed to have the customer fixture loaded
}

sub some_other_test : Tests(17) {
    my $test = shift;
    # at this point, we're guaranteed to NOT have the customer fixture loaded
}

Adrian Howard has long wanted to add tags or something similar to Test::Class. Perhaps I can use this use case to bug him :)
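
Until something like tags exists, a plain lookup table keyed on the test method name could approximate the idea. This is a minimal sketch, not anything Test::Class provides: the `%FIXTURES_FOR` map, the `fixtures_for` helper and the `invoices` method are hypothetical, and the method name is assumed to be available as a string inside the setup method.

```perl
use strict;
use warnings;

# Hypothetical map from test method name to the fixtures it needs.
my %FIXTURES_FOR = (
    customers => ['customer'],
    invoices  => [ 'customer', 'invoice' ],
);

# Return the fixtures for a method; methods not listed get none.
sub fixtures_for {
    my ($method) = @_;
    return @{ $FIXTURES_FOR{$method} || [] };
}

for my $method (qw(customers invoices some_other_test)) {
    my @fixtures = fixtures_for($method);
    printf "%s: %s\n", $method, @fixtures ? "@fixtures" : '(none)';
}
```

A base class setup method could then loop over `fixtures_for(...)` and call `load_fixture` for each name. The drawback compared with tags is that the mapping lives away from the test method it describes.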

If you're lucky enough to be starting a new, large-scale project for work, pay attention to optimising your test suite up front. At my current rate, my tests might take 15 minutes to run if they start approaching the size of the PIPs test suite. What that means, in a nutshell, is that I would save about an hour per test suite run. If a team of six developers each runs the full test suite once a day, that can be an extra developer's worth of time per day. It's probably worth more. That's because while it's six hours of savings directly, indirectly it's much more, as the developers are forced to multi-task and possibly bounce back and forth between different tasks they're not familiar with.
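
The arithmetic behind that estimate, spelled out with the figures from this post (the 15-minute figure is a projection, and one full run per developer per day is the assumption stated above):

```perl
use strict;
use warnings;

# Figures taken from the post; the fast suite time is a projection.
my $slow_suite_minutes = 80;    # PIPs: an hour and twenty minutes
my $fast_suite_minutes = 15;    # projected for a PIPs-sized suite
my $developers         = 6;
my $runs_per_day       = 1;     # full runs per developer per day

my $saved_per_run = $slow_suite_minutes - $fast_suite_minutes;
my $saved_per_day = $saved_per_run * $developers * $runs_per_day;

printf "Saved per run: %d minutes\n", $saved_per_run;
printf "Saved per day: %.1f hours across the team\n", $saved_per_day / 60;
```

That works out to 65 minutes per run and 6.5 hours per day for the team, before counting the cost of context switching.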

Now let's take it one step further: you switch to using Johan Lindström's Devel::CoverX::Covered. Make a change to some code? Rather than running the full test suite, you could use that module to run only the tests which exercise the code you've just changed. It's a bit more work to set up, but once you have proper tool support, you can run a useful subset of tests, have them run quicker, feel more confident that they actually exercise the code in question, and only run the full test suite before merging (or going to lunch, going home, etc.).
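
The selection step itself is simple once coverage data exists. The sketch below does not use Devel::CoverX::Covered's own interface; it assumes the coverage information has already been exported into a hypothetical `%covered_by` hash mapping each source file to the test files that exercise it. The file names and the `tests_for_changes` helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical coverage data: source file => test files that exercise it.
my %covered_by = (
    'lib/MyApp/Ship.pm'    => [ 't/ship.t', 't/combat.t' ],
    'lib/MyApp/Station.pm' => ['t/station.t'],
    'lib/MyApp/Combat.pm'  => ['t/combat.t'],
);

# Given the changed source files, return each relevant test file once.
sub tests_for_changes {
    my @changed = @_;
    my %wanted;
    for my $file (@changed) {
        $wanted{$_} = 1 for @{ $covered_by{$file} || [] };
    }
    my @tests = sort keys %wanted;
    return @tests;
}

my @to_run = tests_for_changes( 'lib/MyApp/Ship.pm', 'lib/MyApp/Combat.pm' );
print "$_\n" for @to_run;
```

The resulting list could be passed to `prove` as file arguments. A changed file with no coverage data selects no tests at all, which is one reason to keep running the full suite before merging.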

This is important because you want your developers writing code, not fencing on the office chairs, waiting for the test suite to finish. And let's face it, many developers fight like mad to avoid any non-development tasks, so while those tests are running, you're paying them to do nothing. Not good for you. Not good for their morale.

So get back to work and fix your darned test suite.

6 Comments

Is there a Perl equivalent to Ruby's autotest? I.e., you start it up, it watches your files (code and test files), and when one changes it runs only the test file that changed or which corresponds to the source file; if that test passes, then it runs all tests.

Hmm. Combine that with 'prove --state=failed,save' and 'prove --state=hot,all,save' and I could have something. Using -j to run tests in parallel could be good as well.

@Shalon: Gugod's Test::Continuous does exactly that.

Hmm... You mention that your test suites (the current one and past ones) run in X minutes/seconds. Could you post the number of tests each of those suites has?

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/