Custom Hacks and Comfort Levels
As I work on my Veure project, I've been using DBIx::Migration to handle database migrations. If you're familiar with this module, this might seem like a surprising choice since it is, well, awful. However, I know exactly what its limitations are and, since I'm so familiar with it, I now have a custom Veure::DBIx::Migration. There are several things I find useful about it:
- It's not tied into a specific ORM.
- It's very simple.
- It's easy to fix, and easy to add the PostgreSQL-specific hacks I need.
Actually, that's almost a lie.
It's about comfort level. I used this module for two years with the first BBC team I was on. I know its failings and can thus easily correct for them. This is not a strategy I recommend, but this module is easy to replace and, frankly, when you're facing tons of work, sometimes the "comfort zone" is OK when it's low-risk. I didn't want to spend time evaluating a bunch of alternatives when I know this one can do what I need. With some work.
First, it doesn't handle database migration levels with more than one digit. That's easily fixed by the patch at that bug report.
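I haven't re-checked the patch itself, but the usual culprit with this sort of bug is sorting migration levels as strings instead of numbers, so level 10 lands before level 2. A sketch of the numeric fix (the @migration_files variable is illustrative, not the module's actual code):

    # Compare levels numerically; an ASCII sort would order the
    # files 1, 10, 11, 2, 3, ...
    my @ordered = sort {
        my ($a_level) = $a =~ /^(\d+)_/;
        my ($b_level) = $b =~ /^(\d+)_/;
        $a_level <=> $b_level;
    } @migration_files;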
Second, you don't get descriptive database migration names reported, so I hacked it to allow:
    schema/migrations/5_up_setup_admin_tables.sql
    schema/migrations/5_down_setup_admin_tables.sql
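Supporting those names just means pulling the level, direction, and an optional description out of each filename; roughly this (my reconstruction, not the module's code):

    # The level and direction drive the migration; the trailing
    # description exists purely for humans reading the directory.
    my %migrations;
    if ( $file =~ m{^(\d+)_(up|down)(?:_(\w+))?\.sql$} ) {
        my ( $level, $direction, $description ) = ( $1, $2, $3 );
        push @{ $migrations{$level} }, {
            direction   => $direction,
            file        => $file,
            description => defined $description ? $description : '',
        };
    }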
It also doesn't enter terribly descriptive information in the test database but, honestly, I don't particularly care about that. If I need it, I'll add it or replace this module.
It also has code which looks similar to this (this is my current version):
    for my $sql ( split /;/, $text ) {
        next unless $sql =~ /\w/;    # skip empty chunks between semicolons
        print "$sql\n" if $self->debug;
        $self->do_sql($sql);
    }
The do_sql method was originally a bare $dbh->do, but I pulled it out into one spot so I can add some PostgreSQL-specific magic. Specifically, I want to parse the SQL and, if I'm about to drop a table, query the PostgreSQL metadata, find the associated sequences, and reset them to 1, thus ensuring that my sequence values are always the same after running migrations up and then down.
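Here's a sketch of what that wrapper could look like, assuming the object carries a DBI handle in $self->dbh; the DROP TABLE detection and the metadata query are my guesses at one way to do it, not code from the module:

    sub do_sql {
        my ( $self, $sql ) = @_;

        # If this statement drops a table, first reset the sequences
        # behind its serial columns so an up/down/up cycle always
        # starts ids from 1 again.
        if ( $sql =~ /^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?(\w+)/i ) {
            my $table = $1;
            my $sth   = $self->dbh->prepare(q{
                SELECT pg_get_serial_sequence(table_name, column_name)
                  FROM information_schema.columns
                 WHERE table_name = ?
            });
            $sth->execute($table);
            while ( my ($sequence) = $sth->fetchrow_array ) {
                next unless defined $sequence;    # column has no sequence
                $self->dbh->do( 'SELECT setval(?, 1, false)', undef, $sequence );
            }
        }
        $self->dbh->do($sql);
    }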
The real beauty of a local hack, though, is this:
    for my $sql ( split /;/, $text ) {
See a problem with that? What happens when I want to add a stored procedure? A stored procedure might have embedded semicolons. They look sort of like this:
    CREATE FUNCTION getQtyOrders(customerID int) RETURNS int AS $$
    DECLARE
        qty int;
    BEGIN
        SELECT COUNT(*) INTO qty
          FROM Orders
         WHERE accnum = customerID;
        RETURN qty;
    END;
    $$ LANGUAGE plpgsql;
Those embedded semicolons kill me. So now I just switch to my Data::Record module (think "split on steroids"):
    use Data::Record;
    use Regexp::Common;

    # later

    # Split on semicolons, unless the semicolon is inside a
    # $$ ... $$ block (a stored procedure body) or a quoted string.
    my $split = Data::Record->new({
        split  => ";",
        unless => qr/\$\$.*?\$\$|$RE{quoted}/s,
    });

    for my $sql ( $split->records($text) ) {
        next unless $sql =~ /\w/;
        print "$sql\n" if $self->debug;
        $self->do_sql($sql);
    }
Now I can embed stored procedures in my SQL files and, as long as I follow the '$$' delimiter convention, everything magically works (including quoted semicolons). This is not a general-purpose solution, however, which is why I've not filed patches (the previous ones aren't getting applied anyway). As a result, this isn't getting released, but I thought I'd share it with you.

Local forks should generally be avoided. However, if the code is ridiculously simple (and the code I've forked is) and you're willing to accept that you maintain it yourself, it's not always bad to fork a module and keep it. What you lose in community support you gain in knowing that if it screws up, IT'S YOUR FAULT. Thus, you have no one else to blame.
I really don't understand why no one uses SQL for migrations. Then you don't have to worry about parsing SQL, because the server will parse it. You know. Like it's designed to do.
—Theory
@David: When you said "no one", I assume you meant no one in Perl-land, right? Practically every PHP webapp I've encountered uses plain SQL for upgrades.
But then you're dealing with the different-SQL-dialects problem all over again.
I use a mocked-up C-style preprocessor for SQL, adding a few #ifdef-style rules for conditional SQL. It allows for dialects, if you can live with the grottiness of the C-type syntax. Perhaps best, we have a trap for fake SQL statements that do bulk inserts; this gave us a great performance boost and was generally seamless.
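I'd guess the idea looks something like the following minimal sketch, where the preprocess_sql name and directive handling are my invention (no nesting, no #else):

    # Keep or drop lines based on #ifdef DIALECT ... #endif blocks.
    sub preprocess_sql {
        my ( $text, $dialect ) = @_;
        my @out;
        my $keep = 1;
        for my $line ( split /\n/, $text ) {
            if    ( $line =~ /^#ifdef\s+(\w+)/ ) { $keep = ( $1 eq $dialect ) }
            elsif ( $line =~ /^#endif\b/ )       { $keep = 1 }
            else                                 { push @out, $line if $keep }
        }
        return join "\n", @out;
    }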
I'm looking at DBIx::VersionedDDL right now. Anyone have any comment about it? (Even people who haven't used it, if they've used other migration modules.)
@Aristotle: that module is functionally equivalent to DBIx::Migration. It stores marginally more information in the database table where it tracks the schema version. That's about it. SQL statements must be separated by semicolons, but it's a wee bit more intelligent because it uses Text::CSV to split the data. However, it has some issues.
The default escape character in Text::CSV is a double quote mark, the same as the quote character. This is configurable in some databases and might conflict. There is no clean way to override this in the module because this is shoved in the _run method, which brings me to the next problem.
There is a series of dodgy substitutions in that method. They'll be all right for most cases, but not all. In fact, there's one which compresses all runs of whitespace to a single space. Even if they're in quotes.
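To see why that last one is dangerous, here's the kind of substitution described (my illustration, not code from the module):

    my $sql = q{INSERT INTO messages (body) VALUES ('two  spaces')};
    $sql =~ s/\s+/ /g;    # naive whitespace compression
    # $sql is now: INSERT INTO messages (body) VALUES ('two spaces')
    # The quoted string literal itself has been silently altered.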
This is a new module and, if anything, it just needs to be subclassable, with appropriate hooks for how to split the SQL, pre-process, execute, post-process, and so on.
And, um, er, hey ... that's a new module for someone to write :)
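If someone does write it, the hooks might be as simple as this entirely hypothetical interface, sketched from the wish list above with invented package and method names (and assuming a dbh accessor for the database handle):

    package DBIx::Migration::Subclassable;    # hypothetical
    use parent 'DBIx::Migration';

    # Each stage is an overridable method, so a subclass can swap in
    # its own statement splitter or add dialect-specific fixups.
    sub split_sql    { my ( $self, $text ) = @_; return split /;/, $text }
    sub pre_process  { my ( $self, $sql  ) = @_; return $sql }
    sub execute_sql  { my ( $self, $sql  ) = @_; $self->dbh->do($sql) }
    sub post_process { return }

    sub run_migration {
        my ( $self, $text ) = @_;
        for my $sql ( $self->split_sql($text) ) {
            next unless $sql =~ /\w/;
            $self->execute_sql( $self->pre_process($sql) );
            $self->post_process($sql);
        }
    }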
@David: can you explain more? These are text files which have multiple SQL statements. I know I'm just being dense, but how would you approach this?
Don't try to parse the SQL in Perl, just feed it to psql. IIRC DBD::Pg will let you feed lots of ;-separated statements to $dbh->do, too, so there's really no need to mess around splitting them up (wrong).
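For what it's worth, with DBD::Pg that reduces the migration step to something like this (assuming a connected $dbh, and reusing a filename from above):

    # Slurp the whole migration file and hand it to the server in
    # one call; the server parses the statements itself.
    my $text = do {
        local $/;    # slurp mode
        open my $fh, '<', 'schema/migrations/5_up_setup_admin_tables.sql'
            or die "Cannot read migration: $!";
        <$fh>;
    };
    $dbh->do($text);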
@morrow: ah, ah, ah! I'm so used to MySQL's "I'm too stupid to figure out more than one statement at a time" behavior, and I've been using this module for so long, that I just forgot other databases aren't so brain-dead.
I just made the change and it works like a charm :)
(This means I'll have to rethink my sequence strategy, though)
Hmm, I regularly feed my SQL scripts containing multiple queries to the command-line mysql client. I guess it's the server that's too dumb? But does mysql parse the script then? Odd.

I use psql to run migration scripts. Perl does nothing, no parsing required. Nice that DBD::Pg's do() allows multiple statements to be passed. Not all DBDs allow this, though (I'm looking at you, DBD::SQLite!).
—David

@Aristotle: I suspect it has to do with the way DBDs are written, because I have C code that passes a huge chunk of SQL queries to SQLite and it does the right thing, but DBD::SQLite's do() does not.
—Theory
Thanks for the summary, Ovid.
Have you looked at migraine?
@George: Haven't looked at it, no. Now that my current system is up and running, it seems to be fine.