The ORM extension we need

Update: when I wrote on Twitter "Dear lazyweb: please write this code and drop me an email when you're done.", I was kidding. I'm not so silly as to think this is a trivial task (though I'd be delighted to be proven wrong).


I keep thinking about taking a stab at this, but I never quite seem to have the tuits. Essentially, I want a "reporting ORM", but then it wouldn't be an ORM, but it could leverage the metadata of a good ORM to pull off creation of an awesome reporting system.

So imagine your boss comes in an says "I want a report of total sales per salesperson in London". How would you write that? Think about an ORM solution versus SQL.

Assuming each salesperson has a city they're assigned to and each order would have a "city_id" in which its placed and each order has several items (and ignoring VAT or anything else), for DBIx::Class, you'd start like this:

my $rows = $schema->resultset('OrderItem')->search( 
  { 'cities.name' => 'London' },
  { 
    join => {
      orders => {
          cities => 'salespeople',
      },
    },
);

Or something like that (I always forget that syntax). The problem is that you have a bunch of 'OrderItem' objects in an resultset and you have to iterate over them, summing the total and calling their related objects to get the salespeople. That can result in many more database hits. In other words, the application of a procedural process to this could result in clumsy, poorly optimised code.

You could simply use SQL directly:

my $results = $dbh->selectall_arrayref(<<'END_SQL');
SELECT sum(oi.value) AS total, s.name AS name, 'London'
FROM   order_items oi, orders o, cities c, salespeople s
WHERE  o.item_id = oi.id
  AND  o.city_id = c.id
  AND  c.salesperson_id = s.id
GROUP BY name
END_SQL

for my $result (@$results) { ... }

And then you iterate over the results and shove them into the variables you want.

To me, this also feels a bit strange for several reasons. First, if your database schema changes, you have to alter all of the SQL which is affected. Often with a well-designed ORM, this can be abstracted away and you wouldn't notice it. However, the problem I really object to is the lack of integration here. You have a procedural wrapper around declarative code. Wouldn't it be nice if that wrapper went away?

Believe it or not, it can be done. One of my former employers had an interface which looked very much like the following:

# The actual API was different
my $dataset = dataset({
    select => [qw/sales city.name salesperson.name/],
    where  => { city => 'London' },
    group  => ['salesperson.name'],
});

Notice anything? There is no "from" clause and much of the "where" clause is missing. Because of the existence of FK constraints, the code would know which tables to select from and it could infer how to make the joins. In fact, if you pulled from a column whose name is unique, you wouldn't even have to qualify it with the table name. The "sales" would be a special case of extra code added to say "sales == sum(order_item.value)" and internally, the code just knew how to make that happen.

The dataset code was ungainly because we didn't actually have natural introspection and had to manually manage it, but this is doable and rather than spanning across multiple objects just to get the data for each row, you could write a query which more naturally represents the problem you're trying to solve. It would be fair chunk of work, but a reporting ORM tied into a regular ORM would be a huge boon for anyone needing heavy reporting capabilities.

8 Comments

You're forgetting that you can create new database objects and have your ORM use them. So for something like this, I would install a database view:

CREATE VIEW sales_per_salesperson
  SELECT sum(oi.value) AS total, s.name AS name, c.name AS city
  FROM   order_items oi, orders o, cities c, salespeople s
  WHERE  o.item_id = oi.id
    AND  o.city_id = c.id
    AND  c.salesperson_id = s.id
  GROUP BY name

Then I would create a new ORM class that uses this view as its source. Then you can query it with something like:

my $rows = $schema->resultset(
    'Report:SalesPerSalesPerson'
)->search({
    city => 'London',
});

So, create views for your reports, then create reporting ORM classes that use them.

—Theory

Have a look at DBIx::DataModel; your query would look more or less like this :

$schema->join(qw/Order_items|oi orders cities salespeople|s/)->select(
  -columns => [qw/sum(oi.value)|total s.name/],
  -where   => {city => 'London'},
  -groupBy => [qw/s.name/],
)

so you have about as much flexibility as SQL, but still work with fully-fledged objects, and the ORM takes care of the joins. Each row returned by that query is an instance of a dynamically created class, that multiply inherits from Order_items, Orders, Cities, etc.

If it’s ad-hoc queries you want, I know a great DSL for that. It’s called SQL.

—Theory

Reports tend to not be ad-hoc, since they are run more than once. If it isn't going to be run more than once, use SQL to create the data, otherwise use a View or Stored Procedure that creates the data.

Leave a comment

About Ovid

user-pic Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.