March 2010 Archives

What happens when you assign to $0?

According to git log I've submitted 33 patches to perl core. I just submitted another one, which on Linux changes this:

Into this:

If this patch gets applied, doing killall onion on 5.14 will get you a dead process instead of onion: no process found. The same goes for ps(1) and top(1), which read the legacy process name by default. Before this patch the behavior of $0 assignment on Linux hadn't changed since perl 4.000 came out.
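As a minimal sketch of what is being discussed, the script below assigns to $0 and then, where available, prints the legacy process name that killall, ps(1) and top(1) consult. The /proc/$$/comm path is an assumption about the system (it only exists on reasonably recent Linux kernels), and the name onion is simply the example used above. Whether the two names agree depends on whether the perl running the script includes the patch.

```perl
use strict;
use warnings;

print "before: $0\n";

# Assigning to $0 changes what perl reports as the program name.
$0 = 'onion';
print "after:  $0\n";

# Assumption: Linux exposes the legacy process name in /proc/<pid>/comm.
# On other systems this file does not exist and the check is skipped.
my $comm_file = "/proc/$$/comm";
if (-r $comm_file) {
    open my $fh, '<', $comm_file or die "Cannot read $comm_file: $!";
    chomp(my $comm = <$fh>);
    close $fh;
    print "legacy process name: $comm\n";
}
```

If the legacy name still shows perl rather than onion, killall onion will not find the process.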

Removing Moose and the case for Mouse

I thought I'd write my own practical version of Moose::Manual::Unsweetened by converting Hailo away from Moose. The resulting hack passes all of Hailo's tests but, not surprisingly, it wasn't worth it.

I wanted to see if I could get the startup time of Hailo down, since Moose doesn't incur a runtime penalty once all your classes have been constructed. Here's how much time it takes the three versions of Hailo to start up and reply to input, and how much (RSS) memory they use:

  • Hacky Perl OO: 100ms / 5.6MB
  • Mouse: 150ms / 7.4MB
  • Moose: 350ms / 12MB
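Startup figures like these can be approximated by timing a fresh perl process. The sketch below is not how the numbers above were produced; it is one hedged way to measure wall-clock startup, using a hypothetical helper called time_startup. It measures time only, not RSS memory.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a new perl with the given arguments and
# return the elapsed wall-clock time in seconds.
sub time_startup {
    my @perl_args = @_;
    my $t0 = [gettimeofday];
    system($^X, @perl_args) == 0
        or die "perl @perl_args failed: $?";
    return tv_interval($t0);
}

printf "bare perl: %.0f ms\n", 1000 * time_startup('-e', '1');

# With the modules installed, the same helper can compare OO systems:
#   time_startup('-MMouse', '-e', '1');
#   time_startup('-MMoose', '-e', '1');
```

A single run is noisy, so averaging several runs gives a fairer comparison.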

Mouse seems to get a lot of flak within Moose circles for not being a 100% complete Moose implementation. I wrote about how you can maintain dual support for Moose and Mouse in a previous posting.

In future programs I'll be using Mouse as the default with a fallback to Moose. It does everything I need from Moose and doesn't suffer from the high startup time / memory use that's frequently cited as an objection to Moose-based applications.

If that doesn't convince you, here's a mouse riding a frog:

Ribbit! Ribbit!
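The "Mouse by default, Moose as a fallback" choice mentioned above comes down to trying to load one module and moving on to the next if that fails. Here is a minimal sketch of the selection step, using a hypothetical helper named first_loadable. It is not taken from Hailo.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first module in the list that can be
# loaded, or undef if none of them can.
sub first_loadable {
    my @candidates = @_;
    for my $module (@candidates) {
        # Turn Some::Module into Some/Module.pm so require() accepts it.
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;
}

my $oo = first_loadable(qw(Mouse Moose));
print defined $oo
    ? "Using $oo\n"
    : "Neither Mouse nor Moose is installed\n";
```

In a real class the selection would happen inside a BEGIN block, followed by calling import on the chosen module so that has, extends and friends are exported before the rest of the package is compiled.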

Benchmarking DBIx::Class vs. plain DBI on Hailo

This is another posting about using Hailo as a guinea pig to test some module. Previously I wrote up how we tested MooseX::Method::Signatures (which we ended up not using) and Mouse (which we now use by default).

Hailo used to use DBIx::Perlish for its database interaction but this was removed in favor of plain DBI in version 0.02. Hailo now uses an ad-hoc DBI-based setup where each database query is kept in its own Data::Section, and each section's content is a Template::Toolkit template.

All our database queries are generated from these templates when the storage backend is initialized; they're then prepared and kept around in a hash. There's no runtime penalty that you wouldn't get by just using plain DBI.
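To make the shape of that setup concrete, here is a minimal sketch of rendering per-backend SQL from templates once and keeping the results in a hash. Everything in it is an assumption for illustration: the table and column names are made up, the templates live in a plain hash rather than in Data::Section, and a one-line substitution stands in for Template::Toolkit.

```perl
use strict;
use warnings;

# Hypothetical query templates; [% name %] marks a backend-specific part.
my %template = (
    add_token => 'INSERT INTO token (text) VALUES (?)[% returning %];',
    token_id  => 'SELECT id FROM token WHERE text = ?;',
);

# Hypothetical per-backend variables.
my %backend = (
    sqlite => { returning => '' },
    pg     => { returning => ' RETURNING id' },
);

# Render every template for one backend and return a hash of SQL strings.
sub render_queries {
    my ($name) = @_;
    my $vars = $backend{$name} or die "Unknown backend: $name";
    my %sql;
    for my $query (keys %template) {
        (my $text = $template{$query})
            =~ s{\[%\s*(\w+)\s*%\]}{ $vars->{$1} // '' }ge;
        $sql{$query} = $text;
    }
    return \%sql;
}

my $sql = render_queries('pg');
print "$_: $sql->{$_}\n" for sort keys %$sql;
```

With a live connection each rendered string would then be passed once to DBI's prepare and the resulting statement handle stored under the same key, so later calls only need to execute it.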

This setup is fast and it works, but it's also pretty nasty. Nothing else in Hailo reinvents the wheel so it was always on the TODO to try some more mature DBIx::* module like DBIx::Class.

Take everything after this point with a grain (or a bag) of salt. This little project forms the extent of my working knowledge of DBIx::Class. I'm probably doing something wrong with it and part of the reason for this posting is to solicit input on what that may be.

With that out of the way here are the results of benchmarking Hailo with DBI and DBIx::Class. For reference here's a comparison of master...dbix-class and Hailo's DBD.pm under the master and dbix-class branches.

Here's the time it takes to run the test suite with and without DBIx::Class:

So our tests take 6 times longer to run on the dbix-class branch. This mainly reflects the time it takes to train Hailo. We make a lot of database queries when training. Training from a 100 line file results in just under 4300 INSERT/UPDATE/SELECT queries.

Training with the 350 line megahal.trn file takes just over a second on the master branch but around 28 seconds on the dbix-class branch.

Replying from an already constructed brain isn't that different under dbix-class, the reason being that replies are cheap compared to training. Generating a random reply from a brain trained on 200k lines of input takes between 10 and 100 queries on average; there we're mainly IO-bound.

I profiled a small file in the test suite on both master and dbix-class. Here's the top 15 subroutine calls from each one:

master:

dbix-class:

The #1 hit being a call to Class::Accessor that generates an accessor method suggests to me that something is going wrong, but I can't see what that something is. I read the DBIx::Class FAQ and Cookbook but couldn't find any pertinent optimization advice, except replacing some of the sugar with DBIx::Class::Cursor. I haven't tried that yet.

One thing I did was check out whether my database queries were being cached as prepared handles using this advice and it turned out that they were.

I was also very careful when writing the dbix-class branch to check that DBIx::Class was generating the same SQL as Hailo. It does so in all cases but one:

When we insert a new primary key on the master branch we use SELECT last_insert_rowid(); under SQLite or PostgreSQL's INSERT ... RETURNING.

Under DBIx::Class I just call ->id which seemingly does this by magic without an extra SELECT query being printed under DBIC_TRACE=1. I'd like to know how. Update: it just uses DBI's last_insert_id.

Even if it turns out that I'm doing everything right and there's no way to make DBIx::Class faster than this for Hailo, I'd still like to look into using it. By converting to it I got rid of a lot of manual DBI tedium required to support multiple backends. I'm not bootstrapping the schema itself with DBIx::Class yet, but if I do that I'll have replaced all the SQL soup in our code. Update: the schema is now generated with ->deploy via introspection.

If I could use DBIx::Class to do all that and then get prepared DBI query handles I could execute manually I'd be happy, but I haven't found a way to do that from reading the relevant documentation.

Aside from this speed issue my first impressions of DBIx::Class have been very positive. I'll probably use it for any future Perl code that accesses a database, provided the application isn't an oddball like Hailo, which isn't purely IO-bound the way most database-backed programs are.

Odd issue with in-memory filehandles being cut off with Pod::Simple

Fayland Lam apparently got tired of me sending him patches, so I released his Dist::Zilla::Plugin::ReadmeFromPod on CPAN today with some fixes. Yay for GitHub-powered collaboration.

Fayland previously released version 0.05 with a patch of mine which replaced IO::Scalar with an in-memory open(). This fixed an issue with PODs with non-ASCII E<> characters but introduced another one.

For some reason using in-memory open() like this yields a POD that's cut off. Here's a test for it via Gist. I couldn't find out why in-memory open() behaved like this, so I just used temporary files instead and released that as 0.06.

Theoretically I could have avoided all this by using Pod::Simple's output_string method to save the processed POD to a string instead of a filehandle, but that brought me right back to square one since that too has encoding issues.
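For reference, the three output routes mentioned here (an in-memory open(), a temporary file, and Pod::Simple's output_string) look roughly like this. This is a sketch on a tiny made-up POD document using Pod::Simple::Text as the formatter, which is an assumption rather than the plugin's actual code, and it does not attempt to reproduce the truncation or the encoding problem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Pod::Simple::Text;

# A tiny, made-up POD document used only for this illustration.
my $pod = "=pod\n\n=head1 NAME\n\nExample - a tiny document\n\n=cut\n";

# 1. In-memory filehandle: open() on a reference to a scalar.
sub render_in_memory {
    my ($source) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "in-memory open failed: $!";
    my $parser = Pod::Simple::Text->new;
    $parser->output_fh($fh);
    $parser->parse_string_document($source);
    close $fh or die "close failed: $!";
    return $buffer;
}

# 2. Temporary file: write to disk, then read the result back.
sub render_via_tempfile {
    my ($source) = @_;
    my ($fh, $filename) = tempfile(UNLINK => 1);
    my $parser = Pod::Simple::Text->new;
    $parser->output_fh($fh);
    $parser->parse_string_document($source);
    close $fh or die "close failed: $!";
    open my $in, '<', $filename or die "Cannot read $filename: $!";
    local $/;    # slurp the whole file
    my $text = <$in>;
    close $in;
    return $text;
}

# 3. output_string: let Pod::Simple append to a scalar directly.
sub render_to_string {
    my ($source) = @_;
    my $text = '';
    my $parser = Pod::Simple::Text->new;
    $parser->output_string(\$text);
    $parser->parse_string_document($source);
    return $text;
}

printf "in-memory: %d characters\n", length render_in_memory($pod);
printf "tempfile:  %d characters\n", length render_via_tempfile($pod);
printf "string:    %d characters\n", length render_to_string($pod);
```

Each route uses a fresh parser object and closes the output handle before reading the result back. Comparing the three lengths on a real document is a quick way to spot output that has been cut off.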

Every once in a while I run into some odd encoding issues which I just have to give up and work around. In this case Pod::Simple is actually reading a pure ASCII file with =encoding utf8 but for some reason it's deciding against generating UTF-8 output unless PerlIO apparently does it for it.

Encoding issues are hard, especially when they're happening in some huge & opaque module that literally includes a file called Blackbox.pm. I think I understand perl's encoding model pretty well though, but "if you think you understand [perl's encoding handling], then you don't understand [perl's encoding handling]".

About Ævar Arnfjörð Bjarmason

Blogging about anything Perl-related I get up to.