The ORM extension we need
Update: when I wrote on Twitter "Dear lazyweb: please write this code and drop me an email when you're done.", I was kidding. I'm not so silly as to think this is a trivial task (though I'd be delighted to be proven wrong).
I keep thinking about taking a stab at this, but I never quite seem to have the tuits. Essentially, I want a "reporting ORM", but then it wouldn't be an ORM, but it could leverage the metadata of a good ORM to pull off creation of an awesome reporting system.
So imagine your boss comes in an says "I want a report of total sales per salesperson in London". How would you write that? Think about an ORM solution versus SQL.
Assuming each salesperson has a city they're assigned to and each order would have a "city_id" in which its placed and each order has several items (and ignoring VAT or anything else), for DBIx::Class, you'd start like this:
my $rows = $schema->resultset('OrderItem')->search(
{ 'cities.name' => 'London' },
{
join => {
orders => {
cities => 'salespeople',
},
},
);
Or something like that (I always forget that syntax). The problem is that you have a bunch of 'OrderItem' objects in an resultset and you have to iterate over them, summing the total and calling their related objects to get the salespeople. That can result in many more database hits. In other words, the application of a procedural process to this could result in clumsy, poorly optimised code.
You could simply use SQL directly:
my $results = $dbh->selectall_arrayref(<<'END_SQL');
SELECT sum(oi.value) AS total, s.name AS name, 'London'
FROM order_items oi, orders o, cities c, salespeople s
WHERE o.item_id = oi.id
AND o.city_id = c.id
AND c.salesperson_id = s.id
GROUP BY name
END_SQL
for my $result (@$results) { ... }
And then you iterate over the results and shove them into the variables you want.
To me, this also feels a bit strange for several reasons. First, if your database schema changes, you have to alter all of the SQL which is affected. Often with a well-designed ORM, this can be abstracted away and you wouldn't notice it. However, the problem I really object to is the lack of integration here. You have a procedural wrapper around declarative code. Wouldn't it be nice if that wrapper went away?
Believe it or not, it can be done. One of my former employers had an interface which looked very much like the following:
# The actual API was different
my $dataset = dataset({
select => [qw/sales city.name salesperson.name/],
where => { city => 'London' },
group => ['salesperson.name'],
});
Notice anything? There is no "from" clause and much of the "where" clause is missing. Because of the existence of FK constraints, the code would know which tables to select from and it could infer how to make the joins. In fact, if you pulled from a column whose name is unique, you wouldn't even have to qualify it with the table name. The "sales" would be a special case of extra code added to say "sales == sum(order_item.value)" and internally, the code just knew how to make that happen.
The dataset code was ungainly because we didn't actually have natural introspection and had to manually manage it, but this is doable and rather than spanning across multiple objects just to get the data for each row, you could write a query which more naturally represents the problem you're trying to solve. It would be fair chunk of work, but a reporting ORM tied into a regular ORM would be a huge boon for anyone needing heavy reporting capabilities.
You're forgetting that you can create new database objects and have your ORM use them. So for something like this, I would install a database view:
Then I would create a new ORM class that uses this view as its source. Then you can query it with something like:
So, create views for your reports, then create reporting ORM classes that use them.
—Theory
Have a look at DBIx::DataModel; your query would look more or less like this :
so you have about as much flexibility as SQL, but still work with fully-fledged objects, and the ORM takes care of the joins. Each row returned by that query is an instance of a dynamically created class, that multiply inherits from Order_items, Orders, Cities, etc.
David: that doesn't get you the flexibility of ad-hoc queries from the code, something we had in our datasets.
dami: thanks for that. I'll take a look.
If it’s ad-hoc queries you want, I know a great DSL for that. It’s called SQL.
—Theory
Reports tend to not be ad-hoc, since they are run more than once. If it isn't going to be run more than once, use SQL to create the data, otherwise use a View or Stored Procedure that creates the data.
@David: thanks, but that misses the point. I'm talking about the difference between "structural" and "functional" code (from my blog post about synthetic classes).
Consider going to your new accountant and asking for a report of total sales per customer. They know the books, somewhat, but they still ask how you calculate total sales so you patiently explain that it's the total value of items per order per customer (ignoring tax and other niceties).
Later, you ask them for total sales per store and they ask you how to calculate this. Later you ask them for total sales per salesperson for the month of May and they ask you how to calculate this.
You're going to fire that damned accountant. If they're too stupid to understand the connections between different types of data and you have to explain it every time, they're pretty damned useless, but that's exactly what SQL is. Prolog, as a counter-example, does a good job understanding those relationships though you sometimes have to give it hints.
What I'm saying is that I want to write queries which allowed me to ask for the data I want, not painfully explain every step of how to acquire that data. I've worked with a similar Perl system before and it was great (though hard to maintain for due to lack of introspection) and as mentioned, you can do it in Prolog. The DBIx::DataModel looks like a start, but I'd want those joins inferred, not explicit.
James, no offence, but I'm not sure where you're coming from. Sure, some common reports are not ad-hoc, but many reports are ad-hoc and the reason that there's not more ad-hoc reporting is because the people needing the reports really don't want to go to IT and again beg for them to produce a report which satisfies their needs, nor do those people needing the reports want to learn SQL. They just want their damned data.
In my experience, as organizations get larger the people who generate the data and the people who consume data tend to get further apart. Thus, the consumers have to go to the generators (or intermediaries) and ask for reporting tools and the tools don't make that easy.
That being said, see my reply to David Wheeler. That conveys it better.