Matt Trout's Data::Query coming soon?

By Ovid on October 12, 2012 11:38 AM

Sitting here in the Italian Perl Workshop and have been enjoying the talks and the excellent food. I was thinking about Matt Trout's Data::Query talk from yesterday. In a nutshell, Data::Query lets you write things like this:

SELECT { $_->cd->name }
  FROM { $_->cds, AS 'cd' }
  JOIN { $_->artists, AS 'artist' }
    ON { $_->cd->artistid eq $_->artist->id }
 WHERE { $_->artist->age > 25 }

This is very exciting, though it might not be immediately evident why.

The code shown should generate a data structure similar to this:

{
    'from' => {
        'from' => {
            'left' => {
                'from' => {
                    'elements' => [ 'cds' ],
                    'type'     => 'Identifier'
                },
                'to'   => 'cd',
                'type' => 'Alias'
            },
            'on' => {
                'args' => [
                    {   'elements' => [
                            'cd',
                            'artistid'
                        ],
                        'type' => 'Identifier'
                    },
                    {   'elements' => [
                            'artist',
                            'id'
                        ],
                        'type' => 'Identifier'
                    }
                ],
                'operator' => { 'Perl' => 'eq' },
                'type'     => 'Operator'
            },
            # snip

I've omitted most of the output for brevity.

You can then use a renderer to render that as SQL. What does this gain us over SQL::Abstract? Plenty! SQL::Abstract renders Perl code directly to SQL with no abstract syntax tree (AST) presentation (though it has a module for playing around with trees). What Data::Query offers is the phenomenal ability to play around with that tree before you convert it to SQL (or some other query language!). Why would you want to do this? Take a look at this bit from the above:

{   'elements' => [
        'artist',
        'id'
    ],
    'type' => 'Identifier'
}

Note that we have both a name and a primitive type (expanding type declarations would be good here). Those give us some interesting possibilities we didn't have before.

But first, let me daydream about some things I would like to see in SQL. Consider the following:

SELECT name FROM customer WHERE age > id;

This is completely non-sensical SQL that neither databases nor ORMs forbid. It's a type error. Just because the customer age and customer ID are both integers does not mean they should have an operator to compare them. In fact, we might argue that the only allowed SQL operators for an ID should be = or IN, but only compared against their own type (for example, a foreign key in another table).

Here's another interesting example:

SELECT c.name
  FROM customer c
  JOIN orders o ON o.id = c.id
 WHERE c.id = 7

Here if we declared that = can only compare against its own type, that should fail on the o.id = c.id. In other words, logic errors could (in theory) throw exceptions at compile time, except that SQL doesn't enforce this. However, what about the c.id = 7 case? That might take some work. If we claimed that 'ID' is a subtype of Integer, perhaps we could directly allow comparisons against constants (or bind params) whose type matches the type or subtype. Or perhaps we could have a casting syntax (something that Postgres allows)?

By defining types and the operators for them (conceptually similar to declaring a class and its methods), we could catch all sorts of errors in our SQL, but you poor MySQL sufferers will never get it.

... unless it's pushed up into a middleware layer between something like Data::Query and a renderer. The AST could be walked by a validator to ensure that, for example, a Fahrenheit subtype of Float can never be compared directly against a Celsius subtype of Float. You could cast them both as their parent type of Float and generate a logic error that way, but the programmer would have to try to subvert the type system but the type system itself is strict by default.

All sorts of delightful type safety are now in our reach when Data::Query is released. Matt Trout already has a branch of DBIx::Class running on top of Data::Query and this is tremendously exciting to me. You could potentially derive much of your type data from DBIx::Class metadata and if you've been careful, have "type safety" (cough) be a side effect of using an ORM!

Also note that Data::Query is not tied to SQL. Thus, we could create composable ORM queries which would allow you to fetch a bunch of data from the database but still filter on data that doesn't exist in the db.

This code is not yet on the CPAN. To play around with it, you'll want to run this:

git clone git://git.shadowcat.co.uk/dbsrgits/Data-Query.git

Running the tests requires this (the current directory is required in @INC):

prove -l -I. t

You'll have to go through and install various modules you might be missing.

4 comments

Tagged as:

data::query, databases, DBIx::Class, perl

4 Comments

Toby Inkster | October 12, 2012 3:14 PM

Your "things I would like to see in SQL" are already possible in SQL if your database provides a UUID datatype. The query where id is compared to age will fail if id is a UUID and age is an integer. The query where two unrelated id columns are joined will return no results because UUIDs are unique.

Ovid replied to comment from Toby Inkster | October 12, 2012 3:36 PM

Toby: first, that doesn't help the Fahrenheit = Celsius example (which I put in there to make it clear that this was more than just ID comparison). Second, IDs should generally be as short as possible because they're indexed and size matters. See this link for more information. For a single UUID and a smallish database it might not matter so much. For a larger database, particularly when you have a lookup table indexed across several fields, you could have a dramatic performance impact. Obviously, your mileage will vary.

http://tendentious.org/author/mdorman/ | October 12, 2012 4:26 PM

It seems to me what you're asking for a more sophisticated type system for SQL that would help catch domain errors.

The sort of CREATE TYPE facility that PostgreSQL has would let you do what you want---at the cost of complexity---and would even allow you to define conversion functions so that you could compare a value of type "Fahrenheit" to a value of type "Celsius" and get a correct answer.

I thought that the CREATE DOMAIN capability was a good fit for simple cases where you just wanted to get warned if you had a type mismatch, but no coercion and so forth---but I was disappointed to find that while it lets you create a type synonym with additional constraints, it will still do comparisons and such based on the underlying type.

It's disappointing that this is the case, because it would likely be a very lightweight way to get the sort of thing you're talking about.

channarith | November 11, 2013 3:43 PM

How renders Perl code directly to SQL ? . Thanks in advance for responding .

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/

More info »

Ovid