Benchmarking DBIx::Class v.s. plain DBI on Hailo
This is another posting about using Hailo as a guinea pig to test some module. Previously I wrote up on how we tested MooseX::Method::Signatures (which we ended up not using) and Mouse (which we now use by default).
Hailo used to use DBIx::Perlish for its database interaction but this was removed in favor of plain DBI in version 0.02. Hailo now uses an ad-hoc DBI-based setup where each database query is kept in its own Data::Section where each section content is a Template::Toolkit template.
All our database queries are generated from these templates when the storage backend is initialized, they're then prepared and kept around in a hash. There's no runtime penalty that you wouldn't get by just using plain DBI.
This setup is fast and it works, but it's also pretty nasty. Nothing else in Hailo reinvents the wheel so it was always on the TODO to try some more mature DBIx::* module like DBIx::Class.
Take everything after this point with a grain (or a bag) of salt. This little project forms the extent of my working knowledge with DBIx::Class. I'm probably doing something wrong with it and part of the reason for this posting is to solicit input on what that may be.
With that out of the way here are the results of benchmarking Hailo with DBI and DBIx::Class. For reference here's a comparison of master...dbix-class and Hailo's DBD.pm under the master and dbix-class branches.
Here's the time it takes to run the test suite with and without DBIx::Class:
So our tests run take 6 times longer to run with the dbix-class branch. This is mainly reflects the time it takes to train with Hailo. We make a lot of database queries when training. Training from a 100 line file results in just under 4300 INSERT/UPDATE/SELECT queries.
Training with the 350 line
megahal.trn file takes just over a second on the
master branch but around 28 seconds on the
Replying from an already constructed brain isn't that different under
dbix-class, the reason being that replies are cheap compared to training. Generating a random reply from a brain trained on 200k lines of input averages between 10 and 100 queries, there we're mainly IO-bound.
The #1 hit being a call to Class::Accessor that generates an accessor method suggests to me that something is going wrong, but I can't see what that something is. I read the DBIx::Class FAQ and Cookbook but couldn't find any pertinent optimization advice, except replacing some of the sugar with DBIx::Class::Cursor. I didn't try that yet.
One thing I did was check out whether my database queries were being cached as prepared handles using this advice and it turned out that they were.
I was also very careful when writing the
dbix-class branch to check that DBIx::Class was generating the same SQL as Hailo, it does so in all cases but one:
When we insert a new primary key on the
master branch we use
SELECT last_insert_rowid(); under SQLite or PostgreSQL's
INSERT ... RETURNING.
Under DBIx::Class I just call
->id which seemingly does this by magic without an extra
SELECT query being printed under
I'd like to know how. It just uses DBI's lastinsertid.
Even if it turns out that I'm doing everything right and there's no way to make DBIx::Class faster than this for Hailo I'd still like to look into using it. By converting to it I got rid of a lot of manual DBI tedium required to support multiple backends.
I'm not bootstrapping the schema itself with DBIx::Class yet but if I do that I'll have replaced all the SQL soup in our code. Update: the schema is now generated
->deploy via introspection.
If I could use DBIx::Class to do all that and then get prepared DBI query handles I could execute manually I'd be happy, but I haven't found a way to do that reading the relevant documentation.
Aside from this speed issue my first impressions of DBIx::Class have been very positive. I'll probably use it for any future Perl code that accesses a database. Provided the application isn't an oddball like Hailo which isn't purely IO bound like most database-based programs.