Removing database abstraction

While working on the database abstraction layer for one of my toy projects, I came to a sudden realization. Why bother? I know SQL. I think in SQL. I thoroughly enjoy the relational model and DBI does pretty much everything I need. I can tie myself to one database system, since it's just a toy project. Sure, there are some repetitive bits that I don't want to repeat all throughout the codebase, but I can abstract those away once and for all. It sounds like an enormous reduction of friction. Am I missing something?

10 Comments

Do what's right for you and right for your project. Sometimes I use DBIx::Class, sometimes I just hand-code the SQL directly with DBI. It all just depends. There's no one right way. Every project has different requirements and different needs.

And isn't SQL itself an abstraction? :)

I also mostly use SQL, but sometimes I do get tired of its verbosity and lack of flexibility.

The question pops up regularly. A 2006 discussion on the Catalyst list collected a number of interesting points; see http://lists.scsys.co.uk/pipermail/catalyst/2006-June/008059.html

This is a discussion I have with colleagues from time to time. SQL has been shown to be a productive language in its own domain (per the Software Productivity Research group), so why go to so much effort to OO-ise the data model?

Each to their own I guess.

One thing to consider is that a good ORM lets you not have to worry about embedding another language in your own. Instead, you're embedding your language into your language and your compiler can do the syntax checking for you, often at compile-time for some constructs. Otherwise, you have huge messes of nasty SQL in heredocs which might or might not be correct, but how would you know?
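To make that concrete, here is a minimal sketch of the heredoc style being described, using DBI with an in-memory SQLite database (the table, columns, and data are made up for illustration). A typo inside the SQL string is only caught when the database sees it at runtime, not when Perl compiles the file:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical schema, purely for demonstration.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)');
$dbh->do('INSERT INTO users (name, active) VALUES (?, ?)', undef, 'alice', 1);

# SQL embedded as an opaque string: the Perl compiler can't check it.
my $sql = <<'SQL';
SELECT id, name
FROM   users
WHERE  active = ?
ORDER  BY name
SQL

my $rows = $dbh->selectall_arrayref($sql, { Slice => {} }, 1);
print "$_->{id}: $_->{name}\n" for @$rows;
```

An ORM call expressing the same query would be checked as ordinary Perl syntax at compile time; whether that trade is worth it is exactly what this thread is about.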

Another argument I've heard is that the ORM lets you abstract away the database differences, but only once in my career have I seen a project ported from one DB to another, and yes, the ORM helped quite a bit.

If you *must* dump the ORM, take a look at Fey. It's nicely written and will let you write SQL in Perl (so to speak). It also has great documentation explaining the rationale behind using Fey instead of creating SQL strings.

Failing all of that, just make sure your SQL is well-encapsulated in one well-tested area of your code where it's easy to replace, if need be.

Finally! Someone else who understands!


IMHO, there are two groups of programmers on this topic: those who hate and/or fear databases, and those who understand them. The former are clearly in the majority today. They work to hide database access inside systems like ORMs so they can pretend the database doesn't really exist, or that it exists as some sort of magic that only crazy people would want to deal with. Usually this attitude results in the database being neglected, and eventually it bites the hand that is inadequately feeding it, which then perpetuates the opinion that databases should be hidden away and that only crazy people want to work with them.


Anyway, getting back to the matter at hand. Don't let my above mini-rant fool you: ORMs and other types of database encapsulation can be a great thing if they don't get in the way and are used appropriately. I acknowledge the need for them. I do not, however, believe ORMs are appropriate in all circumstances. Database-centric applications often become significantly harder to write when the database access routines are too abstracted. If you're working on the type of application where most of the data manipulation can easily be done in SQL, trying to separate your code from the database with an ORM layer will only lead to needless complications. If you are a database programmer, you can think in SQL and know which bits of your project are easily done in SQL and which bits are more easily done in Perl (or whatever other language). Decide from that whether your project needs an ORM or some other database abstraction.

Don't get trapped in the mindset of "I have to use X because if I don't, I'm doing it wrong". Quite often, if you don't use X, it's entirely too easy to do it wrong if you don't know what you're doing. You probably don't want to re-implement CGI parameter parsing, for example. But that's not the same as saying you should always use CGI because it's a solved problem, so never do anything else. Nothing is a solved problem. mod_perl is a valid contender to CGI, and now Plack is a valid contender to mod_perl. FastCGI was and is a valid contender to mod_perl, and servers like nginx do awesome things. Yet, tirelessly, the fans of one explain that the competing ideas are somehow not valid.

Sorry, I'm trying to do proof by analogy here. It isn't valid except to the degree that it is. I'll get to databases in a minute.

Quick recap: there are lots of valid approaches; using an alternative is not the same as re-inventing the wheel.

Furthermore, the heaviest technology is seldom the long-term winner. Witness the return to lighter HTTP pipelines. For ages, Apache boasted of being a bit faster than IIS, in response to which I could only wonder why Apache was so slow.

Okay, back to databases. DBIx::Class on top of a relational database is a valid option. It's also very heavy. It also doesn't really let you scale your web app out unless the database in question is DB2, Oracle, or one of the few others that run on a cluster with a lot of processors rather than just one computer; otherwise you've just added a new bottleneck. DBIx::Class makes it harder to do real relational work -- subqueries, HAVING, or anything hairy. At the very least, you have to create a class file with a lot of boilerplate, reference that file from other files that you made or generated, and stuff the same SQL into there. Abstracting the querying away in simple cases makes it easier to query the database without thinking about it. This leads you to query the database without thinking about it. That's a double-edged sword. In some cases, that's fantastic.

Lego blocks make it easy to build things, but you seldom buy home appliances built out of Legos. Even more so for Duplo blocks. Sometimes easy tools are in order; sometimes low-level engineering with an RPN HP calculator is absolutely in order.

Okay, I'll get back to databases in a minute here, but I want to talk about something outrageous for a moment -- not using a relational database at all.

I wrote and use Acme::State for simple command-line and daemonized Web apps. It works well with Continuity, and should work with the various Coro-based Plack servers, because the application stays entirely in one process. All it does is restore the state of variables on startup and save them on exit, or when explicitly requested. It goes somewhat overboard on restoring state, and does a good enough job that it breaks lots of modules if not confined to your own namespace, hence the Acme:: designation.

Similarly, people have used Data::Dumper or Storable directly (Acme::State uses Storable under the hood) to serialize data structures on startup and exit. In AnyEvent frameworks, it's easy to set a timer that, on expiration, saves a snapshot. Marc Lehmann, who created the excellent Coro module, has patches to Storable to make it reentrant and incremental, so that the process (which might also be servicing network requests for some protocol) doesn't get starved for CPU while a large dump is made. His Perl/Coro-based multiplayer RPG is based on this idea. With hundreds of users issuing commands a few times a second, this is the only realistic option. If you tried to achieve this level of performance with a database, you'd find that you had to have the entire working set in RAM not once but several times over, in copies across parallel database slaves. That's silly.
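A minimal sketch of that snapshot-on-disk idea, using only core modules (the state layout and file name are invented for illustration; in an AnyEvent app the store() call would live in a timer callback):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical in-memory application state: a plain hash of hashes.
my %state = (
    users   => { alice => { score => 10 } },
    started => time(),
);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/state.stor";

# Snapshot: serialize the whole structure to disk in one call.
store( \%state, $file );

# Restore on the next startup: get back an equivalent structure.
my $restored = retrieve($file);
print "alice's score: $restored->{users}{alice}{score}\n";
```

Plain Storable blocks the process for the duration of the dump, which is exactly the problem the reentrant/incremental patches mentioned above address.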

You can be very high tech and not use a database. If you're not actually using the relational capabilities (normalized tables, joins, filtering and aggregating, etc.), then a relational database is a slow and verbose replacement, even with DBIx::Class, for dbmopen() (perldoc -f dbmopen, though use tie with a DBM module instead). You're not gaining performance, elegance, or scalability in most cases. People use databases automatically and mindlessly nowadays, to the point where they feel they have to, and by virtue of having to use a database, they have to ease the pain with an ORM.
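For reference, here is the tie-based interface that replaced dbmopen(), sketched with the core SDBM_File module (the file name and key are made up). A tied hash gives you persistent key/value storage with no SQL, no server, and no ORM:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );

# Tie a hash to an on-disk DBM file; reads and writes on %db
# transparently hit the disk.
tie my %db, 'SDBM_File', "$dir/kv", O_RDWR | O_CREAT, 0666
    or die "tie failed: $!";

$db{counter} = ( $db{counter} // 0 ) + 1;
print "counter is $db{counter}\n";

untie %db;
```

SDBM has small value-size limits; DB_File or GDBM_File are drop-in alternatives when those bite.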

Anytime someone says "always", be skeptical. You're probably talking to someone who doesn't understand or doesn't care about the tradeoffs involved.

Okay, back to databases. Right now, it's trendy to create domain specific languages. The Ruby guys are doing it and the Perl guys have been doing it for ages. Forth was created around the idea -- the Forth parser is written in Forth and is extensible. Perl 5.12 lets you hijack syntax parsing with Perl in a very Forth-ish style. Devel::Declare was used to create proper argument lists for functions inside of Perl. There's a MooseX for it and a standalone one, Method::Signatures. That idea got moved into core. XS::APItest::KeywordRPN is a demo. Besides that, regular expressions and POD are two other entire syntaxes that exist in .pl and .pm files.

It's hypocritical to say that it's somehow "bad" to mix languages. It's true that you don't want your front end graphic designer editing your .pl files if he/she doesn't know Perl. If your designer does know Perl, and the code is small and doesn't need to be factored apart yet, what's the harm? It's possible to write extremely expressive code mixing SQL and Perl. Lots of people have written a lot of little wrappers. Here's one I sometimes use:

http://slowass.net/~scott/tmp/query.pm.txt

Finally, part of Ruby's appeal -- or any new language's appeal -- is lightness. It's easy to keep adding cruft, indirection, and abstraction and not realize that you're slowly boiling yourself to death in it until you go to a new language and get away from it for a while. The Ruby guys, like the Python guys before them, have done a good job of building up simple but versatile APIs that combine well with each other and keep the language in charge rather than any monolithic "framework". Well, except for Rails, but maybe that was a counter example that motivated better behavior. Look at Scrapi (and Web::Scraper in Perl) for an example.

Too much abstraction not only makes your code slow but makes it hard to change development direction in the future when something cooler, faster, lighter and more flexible comes out. Just as the whole Perl universe spent ten years mired down in and entrenched in mod_perl, so is there a danger that DBIx::Class and Moose will outlive their welcome. POE, which was once a golden child, has already outlived its welcome as Coro and AnyEvent have taken off. Now there are a lot of Perl programs broken up into tiny non-blocking chunks that are five times as long as they could be, and the effort to move them away from POE is just too great. The utility of a package should be weighed against the commitment you have to make to it: namely, how much you have to alter your programming style. With Moose, as with POE, that commitment is huge. DBIx::Class is more reasonable. Still, it's a cost, and things have costs.

Thank you for your article.

Regards,
-scott

First:

No matter what you do, if you send SQL to the database yourself, do yourself the favour of using DBIx::Simple, which is essentially a sanity wrapper around the annoying DBI API: there is a single query method instead of execute, prepare and do (plus variations), and it returns an object analogous to a statement handle with sensibly named methods (hash, hashes, array, arrays, etc.) instead of DBI's terrible mess (the (select|fetch)(row|all)_(arrayref|hashref) variants, plus passing a Slice attribute to choose whether you want arrays or hashes for each row).

Switching to DBIx::Simple made my code much more skimmable than it was when I used DBI directly.
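For contrast, here is a runnable sketch of the plain-DBI side of that mess (the schema and data are made up). The roughly equivalent DBIx::Simple call is shown in a comment, since that module may not be installed everywhere:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1 });
$dbh->do('CREATE TABLE t (id INTEGER, name TEXT)');
$dbh->do(q{INSERT INTO t VALUES (1, 'alice')});
$dbh->do(q{INSERT INTO t VALUES (2, 'bob')});

# Plain DBI: pick the right one of selectall_arrayref,
# selectall_hashref, fetchall_arrayref, etc., and remember the
# Slice attribute that switches each row to a hashref.
my $rows = $dbh->selectall_arrayref(
    'SELECT id, name FROM t ORDER BY id',
    { Slice => {} },
);
print "$_->{name}\n" for @$rows;

# The DBIx::Simple version (if installed) is roughly one chain:
#   my @rows = DBIx::Simple->connect($dbh)
#       ->query('SELECT id, name FROM t ORDER BY id')
#       ->hashes;
```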

Second:

The reason you want some sort of abstraction layer is very simple: SQL itself has absolutely no abstraction facilities!

In practical terms, that means if you need to write a query that computes the same result set as another (i.e. it performs the same set of joins, uses the same set of conditions, etc.), but then has to return, for example, a different kind of aggregate, or you want to use that query as a subquery to build another, more complex query, then your only option is to copy-paste your existing query and tweak it.

Joy.
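The copy-paste problem, and the way a host language can supply the abstraction SQL lacks, might look like this (the schema and fragment-building helper are invented for illustration):

```perl
use strict;
use warnings;

# Raw SQL can't name and reuse the join/filter core of a query,
# so it gets duplicated between the list and count variants:
my $list_sql = q{
    SELECT o.id, o.total
    FROM   orders o JOIN customers c ON c.id = o.customer_id
    WHERE  c.active = 1 AND o.created > ?
};
my $count_sql = q{
    SELECT COUNT(*)
    FROM   orders o JOIN customers c ON c.id = o.customer_id
    WHERE  c.active = 1 AND o.created > ?
};

# In Perl, the shared fragment can be named once and both queries
# built from it -- the abstraction lives in the host language.
my $core = q{
    FROM   orders o JOIN customers c ON c.id = o.customer_id
    WHERE  c.active = 1 AND o.created > ?
};

sub active_orders_query {
    my ($select_list) = @_;
    return "SELECT $select_list\n$core";
}

print active_orders_query('o.id, o.total'), "\n";
print active_orders_query('COUNT(*)'), "\n";
```

Fey::SQL takes this idea much further by making the queries, tables and columns real objects rather than string fragments.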

However, I believe classical ORMs are the wrong answer to this. I think Fey::SQL is much closer: you use an object model for actual SQL queries (rather than some kind of more abstract API with its own concepts that map to SQL somehow), and you can then pass around queries, tables, columns and table- and column-aliases symbolically (as objects) instead of trying to build abstractions on top of SQL string manipulation.

You’re essentially still writing SQL, you’re just doing it in a way that facilitates abstraction of common code.

An option that works quite well for me is to wrap everything you do with the database in its own function. You can even create your own module, and thus namespace, containing the database functions. That way, all of your SQL, and little else, is in a single file. If you later decide to replace the DBI calls with something more OO, or to use a different backend, you probably won't need to touch any code outside that one module.
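A minimal sketch of that layout, with an invented My::DB package and widget schema (in practice the package would live in its own .pm file):

```perl
use strict;
use warnings;
use DBI;

package My::DB;

# All SQL for the application lives behind these named functions.
my $dbh;

sub connect_db {
    $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
        { RaiseError => 1 });
    $dbh->do('CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT)');
}

sub add_widget {
    my ($name) = @_;
    $dbh->do('INSERT INTO widgets (name) VALUES (?)', undef, $name);
    return $dbh->last_insert_id(undef, undef, 'widgets', 'id');
}

sub widget_name {
    my ($id) = @_;
    my ($name) = $dbh->selectrow_array(
        'SELECT name FROM widgets WHERE id = ?', undef, $id);
    return $name;
}

package main;

# Callers never see SQL, only the named operations.
My::DB::connect_db();
my $id = My::DB::add_widget('sprocket');
print My::DB::widget_name($id), "\n";
```

Swapping SQLite for another backend, or the DBI calls for an ORM, would mean editing only the My::DB functions.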


About cbt

I like to write computer programs, for both fun and profit. I enjoy Perl the most, but plenty of other languages are good too.