My database-involved testing setup
Hi!
Today I'd like to show you my testing setup, which involves database testing. Hopefully it can help someone out, or maybe someone can suggest better ways of doing things.
First of all, having all the tests in the root of the 't/' directory got too messy, so for ease of navigation I mostly mirror the structure of my 'lib/' directory in my 't/' directory. Let's say this is the 'lib/' tree:
- lib/
  - MyApp.pm
  - MyApp/
    - MyModule.pm
then my 't/' directory would have the following layout:
- t/
  - MyApp/
    - 0001-MyApp_first_test.t
    - 0002-MyApp_second_test.t
    - MyModule/
      - 0001-MyModule_first_test.t
      - 0002-MyModule_second_test.t
Because of the nested structure, it would be messy to add a 'use lib' statement to the test files themselves to pick up my 'lib/' directory, so I pass it as a parameter to prove instead. I run all my tests from the 't/' directory, so for ease of use I created a 't/prove' script:
#!/bin/bash
export MYAPP_TEST="prove"
time prove -r -Ilib -I../lib -j9 "$@"
./perl delete_test_db.pl
As I mentioned, I run every test from the 't/' directory with the './prove' command. In my 't/prove' script I prepend the actual 'prove' command with 'time', so after running the tests I know approximately how long they take. (When they take too long, I try to find ways to speed them up.) The '-r' parameter tells 'prove' to search the 't/' directory recursively (which my directory structure needs). Then I tell prove to include 't/lib' and 'lib/'. I also turn on parallel execution of tests with the '-j' option. Of course I ran into problems when I first tried running database-involved tests on the same database in parallel, so I had to come up with a solution.
Getting the DBIx::Class::Schema object in the test files is delegated to a module located at 't/lib/MyApp/Test/GetSchema.pm'. The way I solved parallel test execution is pretty simple: every time GetSchema is called, it creates a completely new database (if it does not exist yet) based on the PID. Then the schema is deployed (which creates the tables, relations, etc.). This way the tests do not interfere with each other. The last line of the 't/prove' file runs a script which then deletes these test databases. The names of these databases have the form myapp_testXXXX, where XXXX is the PID. The delete_test_db.pl script looks for databases whose names match this pattern and simply drops them.
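The per-process naming scheme can be sketched in isolation. Note that test_db_name is a hypothetical helper for this illustration, not part of GetSchema:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sketch of the naming scheme: under prove, each test process
# gets its own database named after its PID ($$); otherwise a fixed name is used.
sub test_db_name {
    my ($mode) = @_;
    return $mode eq 'prove' ? 'myapp_test' . $$ : 'myapp_test';
}

print test_db_name('prove'), "\n"; # e.g. myapp_test12345
print test_db_name('perl'),  "\n";
```

Since every parallel prove worker is a separate process, the PID suffix alone is enough to keep the databases apart.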
For running only one test file, I also have a 't/perl' script, which is similar to 't/prove', except that it does not remove the database, so after running the test I can examine the state of the database manually. For this I use a database named myapp_test (without any number at the end), which is never deleted by the 'delete_test_db.pl' script. Here is 't/perl':
#!/bin/bash
export MYAPP_TEST="perl"
perl -Ilib -I../lib "$@"
The MYAPP_TEST environment variable is used in GetSchema to determine whether to use the 'myapp_test' database or to create a new one with the PID in its name.
Before I finally show you the GetSchema.pm module, I'd like to share my results. First of all, every test of mine starts with a fresh database, so the tests are isolated from each other. Running MySQL from the HDD was a bit slow, so I used a few dirty hacks to get the databases into RAM (I have no important or big databases on my laptop). The speedup was quite significant, but I don't have any measurements for that (I can make them if someone wants them). At that time, running the tests sequentially with MySQL on the ramdisk took around ~50sec. Making the tests run in parallel got this number down to ~16sec. The difference is quite huge, especially if you run your tests often (which I do). Now the rest of the scripts:
t/delete_test_db.pl
use strict;
use warnings;
use DBI;

#There is no DB name specified in the connection details
my $dbh = DBI->connect("dbi:mysql:", "root", "root");
my $sql = qq[ SHOW DATABASES ];
my $rows = $dbh->selectall_arrayref($sql);

#get the database names into an array
my @dbs = map { $_->[0] } @$rows;
for my $db (@dbs) {
    #drop every DB whose name is myapp_test followed by a number
    if ($db =~ /^myapp_test\d+$/) {
        $dbh->do("drop database `$db`");
    }
}
$dbh->disconnect;
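As a quick illustration of the filter above (the database list is made up for the example), the pattern drops only the PID-suffixed databases and leaves the plain myapp_test alone:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical database list, as SHOW DATABASES might return it
my @dbs = qw(information_schema myapp_test myapp_test1234 myapp_test9876 otherdb);

# Same pattern as in delete_test_db.pl: myapp_test followed by digits
my @to_drop = grep { /^myapp_test\d+$/ } @dbs;

print "$_\n" for @to_drop; # myapp_test1234 and myapp_test9876; myapp_test survives
```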
t/lib/MyApp/Test/GetSchema.pm
package MyApp::Test::GetSchema;

use v5.16;
use strict;
use warnings;
use utf8;
use DBI;
use MyApp::DB;
use Import::Into;
use DBIx::Class::Sims;
use Test::DBIx::Class (); #important to use this way for Import::Into
use MyApp::Test::SimTypes; #this is where I create and register my
                           #own sim types

#DBIx::Class::Sims is used to generate random, but meaningful
#data for tests. It is turned on only in the tests
BEGIN {
    package MyApp::DB;
    __PACKAGE__->load_components('Sims');
}

sub import {
    my $caller = caller;
    my $dbname = getdbname();
    createdb($dbname);

    #I use Test::DBIx::Class too
    #I use Import::Into to import the schema etc. into my tests
    #notice the $dbname in connect_info
    Test::DBIx::Class->import::into($caller, {
        schema_class => 'MyApp::DB',
        connect_info => ["dbi:mysql:$dbname", 'root', 'root'],
        deploy_opts => {
            add_drop_table    => 1,
            quote_identifiers => 1,
        },
        connect_opts => {
            on_connect_do     => "set names utf8;",
            quote_char        => '`',
            name_sep          => '.',
            mysql_enable_utf8 => 1,
            RaiseError        => 1,
        },
        deploy_db => 1, #explicitly tell Test::DBIx::Class to deploy the database (on by default)
    });

    {
        #I don't like the fact that Test::DBIx::Class uses
        #capitalized names, so I create lowercase aliases for them
        no strict 'refs';
        *{$caller."::schema"}    = *{$caller."::Schema"};
        *{$caller."::resultset"} = *{$caller."::ResultSet"};
        *{$caller."::rset"}      = *{$caller."::ResultSet"};
    }
}

#get the database name based on the MYAPP_TEST environment variable,
#which is set by 't/prove' and 't/perl'
sub getdbname {
    my $dbname;
    if (($ENV{MYAPP_TEST} // '') eq 'prove') {
        $dbname = "myapp_test" . $$;
    } else {
        $dbname = "myapp_test";
    }
    return $dbname;
}

sub createdb {
    my ($dbname) = @_;
    my $conn = DBI->connect("dbi:mysql:", "root", "root");
    $conn->do("create database IF NOT EXISTS `$dbname` CHARACTER SET utf8 COLLATE utf8_general_ci");
    $conn->disconnect;
}

1;
In my test files I simply use MyApp::Test::GetSchema, after which the schema, resultset and rset keywords are available for use.
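The glob-aliasing trick that makes those lowercase names work can be demonstrated on its own. The Demo package and its Schema sub are made up for this illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package Demo;
sub Schema { return 'the schema object' }

package main;
{
    # same technique as in GetSchema's import(): alias a lowercase
    # name to the capitalized one by assigning typeglobs
    no strict 'refs';
    *{"Demo::schema"} = *{"Demo::Schema"};
}

print Demo::schema(), "\n"; # prints "the schema object"
```

Because the whole typeglob is aliased, calling Demo::schema() dispatches to the very same sub as Demo::Schema().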
You can use the '-l' flag, which is basically the same as '-Ilib'.
And if you're using a proper DB that supports nested transactions (e.g. Postgres), you could start a transaction at the beginning of each test file and roll it back at the end, which I think is faster than setting up a new, fresh DB for each test file.
But in general this seems like a very sane test setup!
Thanks for this.
There is also a '--timer' option to prove, if you want that.
@craig.treptow thanks for that, it is very useful
@domm well, for this project I am kinda stuck with MySQL. You may be right about the transactions, I'll have to test it. How would it go, basically? I cannot think of anything other than putting my entire test into a transaction, including the setup and the test itself (so it would be able to run in parallel).
Also, I've made some measurements. I ran the same test with the MySQL DB on the HDD, on the SSD and finally in RAM, both sequentially and in parallel. Here are the results:
- HDD sequential: 4min 4sec
- HDD parallel: 2min 29sec
- SSD sequential: 1min 7sec
- SSD parallel: 27sec
- RAM sequential: 56sec
- RAM parallel: 23sec
There is not that big of a difference between the SSD and the RAM, but I would rather not run this stuff from the SSD, because it looks like a heavy write load (especially while I have the RAM free anyway).
Actually, I've tried transactions: after getting the schema, I called schema->txn_begin, and before done_testing() I put schema->txn_rollback. When running the tests sequentially, it took about 33+ sec, which is quite nice. But I could not get it to work in parallel, because deadlocks occurred. Maybe I did something wrong, maybe MySQL is not suited for this. Anyway, for now I'm going to go with the separate databases and parallel test execution.
I've been experimenting with transactions further. It IS fast (around 14sec), but I keep getting mostly one error, sometimes more, which I think is due to race conditions or something (long story short, it does not like multiple processes running on/altering the same DB). If I manage to get it working properly, I'll make a blog post about it :)
Wow, I think I managed to fix it and cut the test execution time down to 15sec. The problem was in a test where I did some DB operations in a separate scope. Making everything happen in the same scope as txn_begin and txn_rollback seems to fix the problem.
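For reference, the begin/rollback pattern can be sketched with plain DBI. This uses an in-memory SQLite database (DBD::SQLite) purely for illustration; the real setup uses DBIx::Class's txn_begin/txn_rollback against MySQL:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# In-memory SQLite stands in for the real test database here
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE widgets (id INTEGER)');

$dbh->begin_work;                          # like $schema->txn_begin
$dbh->do('INSERT INTO widgets (id) VALUES (1)');
my ($in_txn) = $dbh->selectrow_array('SELECT COUNT(*) FROM widgets');

$dbh->rollback;                            # like $schema->txn_rollback
my ($after) = $dbh->selectrow_array('SELECT COUNT(*) FROM widgets');

print "inside txn: $in_txn, after rollback: $after\n"; # inside txn: 1, after rollback: 0
```

The key point from the fix above carries over: the begin, the DB work, and the rollback all live in one scope, so nothing escapes the transaction.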