DBIx::Class::Report - generate ad-hoc dbic classes from SQL

Object-Relational Mappers (ORMs) are awesome if you think about your database a collection of objects. They're awful if you think about your database as a database. My primary client right now is ZipRecruiter and my work involves rewriting one of their internal systems to be more flexible. However, it involves tons of reporting. For example, I have this SQL:

  SELECT var.name, ce.event_type, count(*) as total
    FROM tracking_conversion_event ce          JOIN tracking_visitor visitor      ON visitor.tracking_visitor_id      = ce.tracking_visitor_id
    JOIN tracking_version_variant curr ON curr.tracking_version_variant_id = visitor.tracking_version_variant_id
    JOIN tracking_version ver          ON ver.tracking_version_id          = curr.tracking_version_id
    JOIN tracking_variant var          ON var.tracking_variant_id          = curr.tracking_variant_id
   WHERE ver.tracking_id = ?
     AND ver.version     = ?
GROUP BY 1, 2

What you can't see in that SQL is that there are many subtleties in it:

  • That count(*) as total sometimes needs to be replaced by sum(ce.num_events) as total
  • Sometimes I need different sets of fields returned (and that impacts the GROUP BY, too)
  • One of those JOIN statements sometimes needs to be a LEFT JOIN.

As you can imagine, that's frustrating. These are the sorts of things that don't map too well to an ORM. As a result, I've hard-coded some SQL and used $dbh->selectall_arrayref, and repeatedly struggled with the fact that sometimes the columns can be in different positions of that array ref.

Or I could return hashrefs and wonder if $result->{total} is really 0, or did I write $result->{totla} again?

Or in another case I found, we had a resultset return many rows of data with about 20 columns per row, but we only needed 4 or 5 columns. Restricting our data to only what we need not only makes the queries faster, it can reduce I/O when you're fetching results from another server. Class::DBI tried to allow this, but that only works at a class level, not query-by-query.

In reality, everyone here is used to working with DBIC resultsets. Returning complex data structures and having people try to remember which index maps to which column, or if the hash keys have changed is frustrating. Wouldn't it be nice if we could just take a chunk of arbitrary SQL and get a resultset from it? Well, now you can.

The code is called DBIx::Class::Report and it's very much a proof of concept! It builds a DBIx::Class view on the fly. It works like this:

use DBIx::Class::Report;

my $report = DBIx::Class::Report->new(
    schema  => $dbic_schema_object,
    sql     => $complicated_sql,
    columns => \@accessor_names,
);
my $resultset = $report->fetch(@bind_params_for_complicated_sql);
while ( my $result = $resultset->next ) {
    # use like a normal dbic result, but it's read only
}

That's the shell of it, but let's use the query I showed above, just to be clear:

my $sql = <<'SQL';
  SELECT var.name, ce.event_type, count(*)
    FROM tracking_conversion_event ce
    JOIN tracking_visitor visitor      ON visitor.tracking_visitor_id      = ce.tracking_visitor_id
    JOIN tracking_version_variant curr ON curr.tracking_version_variant_id = visitor.tracking_version_variant_id
    JOIN tracking_version ver          ON ver.tracking_version_id          = curr.tracking_version_id
    JOIN tracking_variant var          ON var.tracking_variant_id          = curr.tracking_variant_id
   WHERE ver.tracking_id = ?
     AND ver.version     = ?
GROUP BY 1, 2
SQL

my $events_per_name = DBIx::Class::Report->new(
   schema  => $schema,
   sql     => $sql,
   columns => [qw/name event_type total/],
);

my $resultset = $events_per_name->fetch( $tracking_id, $version );
say $resultset->count; # yeah, it behaves just like a normal dbic resultset

while ( my $result = $resultset->next ) {
   say $result->name;
   say $result->event_type;
   say $result->total;
}

The core idea is simple enough that you should be able to throw just about anything at this and it should just work (but it's tricky with SQLite). This also helps when you don't want to litter your codebase with a bunch of one-off DBIC views that are only called in a single method. Also, note that the data you get back is read-only. You cannot update anything.

If I get it stable, I'll push it to the CPAN at some point.

Many thanks to ribasushi++ and castaway++ for helping me debug this.

7 Comments

This is awesome! I've wanted this so many times! Next time this comes up I'll give it a whirl and let you know the result.

This is a great idea! I think the worst thing about DBIC is its dependence on the horrendous SQL::Abstract for specifying queries, and I've wanted something like this for years.

Thanks a lot!

"These are the sorts of things that don't map too well to an ORM."

Well, it depends on which ORM you choose :-)

In DBIx::DataModel :

a) choosing joins

# joins as given by default in the schema declaration
my $src = $schema->join(qw/table1 other1 other2 ../) ;

# forcing inner (<=>) or left (=>) joins
$src = $schema->join(qw/table1 <=> other1 => other2/);

b) changing columns

my @grouped_cols = qw/col1 col2/;
my @other_cols   = qw/count(*)|total/; # '|' is aliasing
my $result = $src->select(
  -columns => [@grouped_cols, @other_cols],
  -where => ...,
  -group_by => \@grouped_cols,
);

@Mark : please elaborate, what is it that you find 'horrendous' in SQL::Abstract ? Do you think that SQL::Abstract::More does any improvement ?

As it happens, I had an occasion to use it this already this weekend! It worked like a charm. Ovid++

Is this pretty similar to DBIx::Raw? I am not complaining, just trying to know when to use which tool.

https://metacpan.org/pod/DBIx::Raw

About Ovid

user-pic Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/