Benchmarking DBIx::Class v.s. plain DBI on Hailo

This is another posting about using Hailo as a guinea pig to test some module. Previously I wrote up on how we tested MooseX::Method::Signatures (which we ended up not using) and Mouse (which we now use by default).

Hailo used to use DBIx::Perlish for its database interaction but this was removed in favor of plain DBI in version 0.02. Hailo now uses an ad-hoc DBI-based setup where each database query is kept in its own Data::Section where each section content is a Template::Toolkit template.

All our database queries are generated from these templates when the storage backend is initialized, they're then prepared and kept around in a hash. There's no runtime penalty that you wouldn't get by just using plain DBI.

This setup is fast and it works, but it's also pretty nasty. Nothing else in Hailo reinvents the wheel so it was always on the TODO to try some more mature DBIx::* module like DBIx::Class.

Take everything after this point with a grain (or a bag) of salt. This little project forms the extent of my working knowledge with DBIx::Class. I'm probably doing something wrong with it and part of the reason for this posting is to solicit input on what that may be.

With that out of the way here are the results of benchmarking Hailo with DBI and DBIx::Class. For reference here's a comparison of master...dbix-class and Hailo's DBD.pm under the master and dbix-class branches.

Here's the time it takes to run the test suite with and without DBIx::Class:

So our tests run take 6 times longer to run with the dbix-class branch. This is mainly reflects the time it takes to train with Hailo. We make a lot of database queries when training. Training from a 100 line file results in just under 4300 INSERT/UPDATE/SELECT queries.

Training with the 350 line megahal.trn file takes just over a second on the master branch but around 28 seconds on the dbix-class branch.

Replying from an already constructed brain isn't that different under dbix-class, the reason being that replies are cheap compared to training. Generating a random reply from a brain trained on 200k lines of input averages between 10 and 100 queries, there we're mainly IO-bound.

I profiled a small file in the test suite on both master and dbix-class. Here's the top 15 subroutine calls from each one:

master:

dbix-class:

The #1 hit being a call to Class::Accessor that generates an accessor method suggests to me that something is going wrong, but I can't see what that something is. I read the DBIx::Class FAQ and Cookbook but couldn't find any pertinent optimization advice, except replacing some of the sugar with DBIx::Class::Cursor. I didn't try that yet.

One thing I did was check out whether my database queries were being cached as prepared handles using this advice and it turned out that they were.

I was also very careful when writing the dbix-class branch to check that DBIx::Class was generating the same SQL as Hailo, it does so in all cases but one:

When we insert a new primary key on the master branch we use SELECT last_insert_rowid(); under SQLite or PostgreSQL's INSERT ... RETURNING.

Under DBIx::Class I just call ->id which seemingly does this by magic without an extra SELECT query being printed under DBIC_TRACE=1. I'd like to know how. It just uses DBI's lastinsertid.

Even if it turns out that I'm doing everything right and there's no way to make DBIx::Class faster than this for Hailo I'd still like to look into using it. By converting to it I got rid of a lot of manual DBI tedium required to support multiple backends. I'm not bootstrapping the schema itself with DBIx::Class yet but if I do that I'll have replaced all the SQL soup in our code. Update: the schema is now generated ->deploy via introspection.

If I could use DBIx::Class to do all that and then get prepared DBI query handles I could execute manually I'd be happy, but I haven't found a way to do that reading the relevant documentation.

Aside from this speed issue my first impressions of DBIx::Class have been very positive. I'll probably use it for any future Perl code that accesses a database. Provided the application isn't an oddball like Hailo which isn't purely IO bound like most database-based programs.

10 Comments

Have you looked at Rose::DB? From the benchmarks I saw a while back it was considerably faster than DBIx::Class.

One of the biggest performance hits you get with DBIC is object inflation of resultsets. That is when you create a resultset like "my $rs = $schema->resultset('User');" and then actually retrieve the SQL like "my @rows = $rs->all;", that "@rows" is an array of DBIC objects, and of course creating objects can be time consuming, particularly if the length of @rows is large.

If you need more speed you can skip this object inflation, returning instead an array of hashrefs. This is much faster. I'd love to see you previous benchmarking using this technique. All you do is set a resultset to the hash reinflator class: "my @rows = $rs->search({},{result_class => 'DBIx::Class::ResultClass::HashRefInflator'})->all;" I bet you'll find that is MUCH faster.

Thanks!

John Napiorkowski

If speed is an issue, you may find that DBIx::DataModel gives you better results. Row objects in DBIx::DataModel are just hashrefs, where you access columns directly (no Class::Accessor cost); this is quite similar to the HashRefInflator of DBIC, except that the hashrefs are blessed objects, and all methods for navigating to related rows are still available.

Besides, there is an option for "fast statements" that retrieve each data row into the same memory location, using DBI's bind_columns() method. This avoids the cost of allocating/deallocating memory for each row ... but of course then the client-side must work in iterative mode, handling only one row at a time. See http://search.cpan.org/dist/DBIx-DataModel/lib/DBIx/DataModel/Doc/Design.pod

Hi,

You've mentioned you started to use DBIx::Perlish in the first versions of Hailo, but then dropped it.

As the author of the module, it would be very interesting to me to know why. Did it not do what you wanted? Was too slow? Buggy? Missing features?

Any comments on this would be much appreciated - maybe I can improve it to make it useful for you?

Cheers, \Anton.

Yeah, we ran the program under Devel::NYTProf and found that most of the time was being spent in DBIx::Perlish::Parse if I recall. Performance improved drastically by switching to plain DBI.

Oh, and just to clarify, performance was really the only reason we dropped it. It's otherwise a pretty nice module. Writing Perlish code instead of SQL is pretty nice, and I didn't run into any bugs.

Leave a comment

About Ævar Arnfjörð Bjarmason

user-pic Blogging about anything Perl-related I get up to.