Benchmarking DBIx::Class vs. plain DBI on Hailo
This is another post about using Hailo as a guinea pig to test some module. Previously I wrote about how we tested MooseX::Method::Signatures (which we ended up not using) and Mouse (which we now use by default).
Hailo used to use DBIx::Perlish for its database interaction, but this was removed in favor of plain DBI in version 0.02. Hailo now uses an ad-hoc DBI-based setup where each database query is kept in its own Data::Section, the content of each section being a Template::Toolkit template.
All our database queries are generated from these templates when the storage backend is initialized; they're then prepared and kept around in a hash. There's no runtime penalty that you wouldn't get by just using plain DBI.
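In outline the setup looks something like this. This is a minimal sketch, not Hailo's actual code; the section names, table names, and template variables are all illustrative:

```perl
package MyStorage;
use strict;
use warnings;
use DBI;
use Data::Section -setup;
use Template;

sub new {
    my ($class, %args) = @_;
    my $self = bless { dbh => DBI->connect($args{dsn}) }, $class;

    # Each __[ name ]__ section below is a Template::Toolkit template
    # holding one SQL query. Render them all once at startup ...
    my $tt = Template->new;
    for my $name ($class->section_data_names) {
        my $template = ${ $class->section_data($name) };
        $tt->process(\$template, { table => 'expr' }, \my $sql);
        # ... then prepare each statement and keep the handle in a hash.
        $self->{sth}{$name} = $self->{dbh}->prepare($sql);
    }
    return $self;
}

1;

__DATA__
__[ expr_by_id ]__
SELECT * FROM [% table %] WHERE id = ?;
__[ add_expr ]__
INSERT INTO [% table %] (token_id) VALUES (?);
```

After initialization, running a query is just $self->{sth}{expr_by_id}->execute($id), with no SQL generation on the hot path.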
This setup is fast and it works, but it's also pretty nasty. Nothing else in Hailo reinvents the wheel, so it was always on the TODO list to try a more mature DBIx::* module like DBIx::Class.
Take everything after this point with a grain (or a bag) of salt. This little project is the extent of my working knowledge of DBIx::Class. I'm probably doing something wrong with it, and part of the reason for this post is to solicit input on what that may be.
With that out of the way, here are the results of benchmarking Hailo with DBI and DBIx::Class. For reference, here's a comparison of master...dbix-class and Hailo's DBD.pm under the master and dbix-class branches.
Here's the time it takes to run the test suite with and without DBIx::Class:
So our tests take 6 times longer to run on the dbix-class branch. This mainly reflects the time it takes to train Hailo; we make a lot of database queries when training. Training from a 100 line file results in just under 4300 INSERT/UPDATE/SELECT queries.
Training with the 350 line megahal.trn file takes just over a second on the master branch, but around 28 seconds on the dbix-class branch.
Replying from an already constructed brain isn't that different under dbix-class, the reason being that replies are cheap compared to training. Generating a random reply from a brain trained on 200k lines of input averages between 10 and 100 queries, so there we're mainly IO-bound.
I profiled a small file in the test suite on both master and dbix-class. Here are the top 15 subroutine calls from each:
master:
dbix-class:
The #1 hit being a call to Class::Accessor that generates an accessor method suggests to me that something is going wrong, but I can't see what that something is. I read the DBIx::Class FAQ and Cookbook but couldn't find any pertinent optimization advice, except for replacing some of the sugar with DBIx::Class::Cursor, which I haven't tried yet.
One thing I did check, using this advice, was whether my database queries were being cached as prepared handles; it turned out that they were.
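For anyone wanting to check the same thing: DBI keeps the prepare_cached statements in the handle's CachedKids attribute, so a quick sketch like this (poking at DBIC's storage handle, given a connected $schema) will show what's in the cache:

```perl
# Sketch: inspect DBI's prepare_cached cache through DBIx::Class's storage.
my $dbh   = $schema->storage->dbh;
my $cache = $dbh->{CachedKids} || {};
printf "%d cached statement handles\n", scalar keys %$cache;
print "  $_\n" for sort keys %$cache;    # keys are the prepared SQL
```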
I was also very careful when writing the dbix-class branch to check that DBIx::Class was generating the same SQL as Hailo. It does so in all cases but one: when we insert a new primary key on the master branch, we use SELECT last_insert_rowid(); under SQLite, or PostgreSQL's INSERT ... RETURNING.
Under DBIx::Class I just call ->id, which seemingly does this by magic without an extra SELECT query being printed under DBIC_TRACE=1. I'd like to know how. It turns out it just uses DBI's last_insert_id.
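The difference, roughly (a sketch; the table and column names are illustrative):

```perl
# master: an explicit query after each insert, one per database engine.
my ($id) = $dbh->selectrow_array('SELECT last_insert_rowid()');  # SQLite
# ... or INSERT INTO expr (...) VALUES (...) RETURNING id        # PostgreSQL

# DBIx::Class: ->id falls through to DBI's portable interface, which
# asks the driver directly instead of issuing another SELECT.
my $last_id = $dbh->last_insert_id(undef, undef, 'expr', 'id');
```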
Even if it turns out that I'm doing everything right and there's no way to make DBIx::Class faster than this for Hailo, I'd still like to look into using it. By converting to it I got rid of a lot of manual DBI tedium required to support multiple backends. I'm not bootstrapping the schema itself with DBIx::Class yet, but if I do that I'll have replaced all the SQL soup in our code. Update: the schema is now generated with ->deploy via introspection.
If I could use DBIx::Class to do all that and then get back prepared DBI query handles that I could execute manually, I'd be happy, but I haven't found a way to do that in the relevant documentation.
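What I'm after is roughly the following. This is only a sketch built on the documented as_query method; the resultset name is illustrative, and I haven't verified the bind-value plumbing, which varies between DBIx::Class versions:

```perl
# Let DBIC generate the SQL, then drop down to a raw DBI handle.
my $rs = $schema->resultset('Expr')->search({ token_id => 42 });
my ($sql, @bind) = @{ ${ $rs->as_query } };

my $sth = $schema->storage->dbh->prepare($sql);
# Depending on the DBIC version, each bind may be an [ $attrs => $value ]
# pair rather than a plain value, hence the unwrapping here.
$sth->execute(map { ref $_ eq 'ARRAY' ? $_->[-1] : $_ } @bind);
```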
Aside from this speed issue, my first impressions of DBIx::Class have been very positive. I'll probably use it for any future Perl code that accesses a database, provided the application isn't an oddball like Hailo that, unlike most database-backed programs, isn't purely IO-bound.
Have you looked at Rose::DB? From the benchmarks I saw a while back, it was considerably faster than DBIx::Class.
Michael Peters: No. Rose::DB wouldn't do what I want; it just manages DBI handles and doesn't touch generating SQL queries portably across databases, or spawning a database given some simple abstract schema.
SQL::Abstract would be something closer to what I want, but it doesn't look like it handles some of our more complex queries, like knowing that PostgreSQL uses RANDOM() where MySQL uses RAND() instead. It also can't generate a schema, which would rid me of my hand-crafted CREATE TABLE SQL.
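The kind of per-driver special-casing I mean is trivial on its own, but it has to live somewhere. A sketch, with illustrative table and column names:

```perl
# Pick the right random function for the connected driver.
my %random = (
    SQLite => 'RANDOM()',
    Pg     => 'RANDOM()',
    mysql  => 'RAND()',
);
my $driver = $dbh->{Driver}{Name};
my $sql    = "SELECT token_id FROM expr ORDER BY $random{$driver} LIMIT 1;";
```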
I'd like to use DBIx::Class, just not have it be as slow as it currently is.
One of the biggest performance hits you get with DBIC is object inflation of resultsets. That is, when you create a resultset like "my $rs = $schema->resultset('User');" and then actually run the query with "my @rows = $rs->all;", that "@rows" is an array of DBIC objects, and of course creating objects can be time consuming, particularly if @rows is large.
If you need more speed you can skip this object inflation and return an array of hashrefs instead. This is much faster; I'd love to see your previous benchmarks using this technique. All you do is set the resultset's result class to the hashref inflator: "my @rows = $rs->search({}, { result_class => 'DBIx::Class::ResultClass::HashRefInflator' })->all;" I bet you'll find that is MUCH faster.
Thanks!
John Napiorkowski
Thanks John, #dbix-class pointed me in this direction as well. I hacked it up and converted anything that used ->search() or ->find() (that wasn't just called once during initialization) to use HashRefInflator. Benchmarking 730d54 and b6b79e against each other doesn't reveal much of a difference; using b6b79e might actually be slightly slower, but I don't have enough data (repeated tests take a long time) to make that claim.
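For reference, the conversion looks roughly like this (a sketch; the source and column names are illustrative, not Hailo's actual schema):

```perl
my $rs = $schema->resultset('Expr');

# Before: ->all inflates every row into a blessed DBIC row object.
my @rows = $rs->search({ token_id => $id })->all;
my $text = $rows[0]->text;

# After: the same query returning plain hashrefs, skipping inflation.
my @hashes = $rs->search(
    { token_id => $id },
    { result_class => 'DBIx::Class::ResultClass::HashRefInflator' },
)->all;
$text = $hashes[0]{text};
```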
The NYTProf output shows that the number of string evals increased slightly.
Anyway, it's obvious that the functions I'm calling are doing a lot of work behind the scenes that I don't really want to do. I'm just going to have to figure out how to get closer to the guts.
If speed is an issue, you may find that DBIx::DataModel gives you better results. Row objects in DBIx::DataModel are just hashrefs, where you access columns directly (no Class::Accessor cost); this is quite similar to the HashRefInflator of DBIC, except that the hashrefs are blessed objects, and all methods for navigating to related rows are still available.
Besides, there is an option for "fast statements" that retrieves each data row into the same memory location, using DBI's bind_columns() method. This avoids the cost of allocating/deallocating memory for each row, but of course the client side must then work in iterative mode, handling only one row at a time. See http://search.cpan.org/dist/DBIx-DataModel/lib/DBIx/DataModel/Doc/Design.pod
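In plain DBI terms the technique looks like this sketch (illustrative table and column names; a connected $dbh is assumed):

```perl
# bind_columns: every fetch writes into the same Perl variables
# instead of allocating a fresh row structure each time.
my $sth = $dbh->prepare('SELECT id, text FROM expr');
$sth->execute;
$sth->bind_columns(\my ($id, $text));
while ($sth->fetch) {
    # $id and $text are overwritten in place on each iteration.
    print "$id: $text\n";
}
```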
Thanks for the suggestion Laurent, DBIx::DataModel looks really interesting.
But as an update: I went with just writing some ad-hoc code to generate SQL, and called it Hailo::Storage::Schema. The main reason was not wanting to spend more time checking out new DBIx::* stuff. Startup time was also reduced from 250ms to 150ms, which makes a difference for interactive use.
It's clear that for programs like Hailo, using DBI directly (or something that gives you raw DBI statement handles) is a big win. Any layer of indirection on top of that will slow you down.
If I do anything more in this area it'll probably involve rewriting parts of Hailo in C.
Hi,
You mentioned that you used DBIx::Perlish in the first versions of Hailo, but then dropped it.
As the author of the module, I'd be very interested to know why. Did it not do what you wanted? Was it too slow? Buggy? Missing features?
Any comments on this would be much appreciated; maybe I can improve it to make it useful for you?
Cheers, \Anton.
Hinrik removed it, so perhaps he can provide some more info. This is the commit that removed it, so you can see how we were using it.
We found that using plain DBI queries was much faster; then we got an even bigger speedup later on by using prepared statements.
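That is, the training loop went from regenerating queries to something like this sketch (illustrative table and column names):

```perl
# Prepare once, execute many times inside the training loop.
my $sth = $dbh->prepare('INSERT INTO expr (token_id) VALUES (?)');
$sth->execute($_) for @token_ids;
```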
Yeah, we ran the program under Devel::NYTProf and found that most of the time was being spent in DBIx::Perlish::Parse, if I recall correctly. Performance improved drastically by switching to plain DBI.
Oh, and just to clarify: performance was really the only reason we dropped it. It's otherwise a pretty nice module. Writing Perlish code instead of SQL is quite pleasant, and I didn't run into any bugs.