Veure's Database
I recently wrote about Veure's test suite and today I'll write a bit about how we manage our database. Sadly, this will be a long post because it's a complicated problem and there's a lot to discuss.
When I first started Veure, I used SQLite to prototype, but it's so incredibly limited that I quickly switched to Postgres. It's been a critically important decision, and I want to take a moment to explain why.
All software effectively has four "phases" which amount to:
- Initialization
- Input
- Calculation
- Output
Note that we could rewrite the above as:
- Initialization of data
- Input of data
- Calculation of data
- Output of data
Notice a pattern?
Yeah, I thought so. There are all sorts of places where we can get things wrong in software, but the further down the stack(s) you go, the more care you need to take, because bugs there are more damaging. Data storage is often pretty low in your stack and you don't want to get it wrong. So what happens?
Choosing a Database
Devs, when starting a new project, usually choose the programming languages they know and are comfortable with. The same goes for their choice of database. That usually means that MySQL is chosen with little thought and when it gets slow, devs tune a few parameters. For some reason, I often get hired by a client around the time they realize that select @@SQL_MODE isn't returning anything and they're getting illegal dates, empty strings where they should have one of a number of enum values, data truncation when they try to shove 60 characters into a 50-character field, and so on (I've even been hired to work on MyISAM OLTP systems!). At this point, the DBA (if the company has one) tries to enable strict or traditional mode in MySQL, but finds out that the software is dependent on the broken behavior, so instead of fixing the database, workarounds get pushed into the code layer. Now they have two layers of bugs, all stemming from the same problem, and it keeps a lot of consultants in business.
And after a while, you invariably get devs who say "we should have used Postgres" but they get shut down (often quite hard) by management who point out that it's too expensive and risky to switch to Postgres at that point, particularly when you start looking at opportunity costs. And you know what? Management is often right to say this. Companies don't switch databases for damned good reasons. If MySQL mostly works, it mostly stays.
Our Database
Veure, on the other hand, was partially built to have software that devs would actually want to work on. In the words of a dev we recently put on the project: "I don't know what to do; I get hired to clean up cruft, but I can't find any here." (Trust me, though, I can find plenty. Just yesterday I found a new controller method with 40 lines of code! We need to fix that.) As a result, as much as possible, I wanted a database that wouldn't accept bad data and would just do the right thing. For example, your character has strength, but it also has curr_strength, which is a percentage. It's defined in our database as:
CREATE DOMAIN percentage AS real
CONSTRAINT percentage_check CHECK (((VALUE >= 0.0) AND (VALUE <= 1.0)));
Even if we forget to check in our code layer whether curr_strength is greater than 1, our database isn't going to allow garbage into it. Being able to seamlessly define your own domains and types in PostgreSQL is one of the many reasons we've chosen it.
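Here's a rough sketch of what that protection looks like from the Perl side. This isn't our actual code (the resultset name and slug are made up for illustration, and it assumes a connected DBIx::Class schema in $schema), but the idea is that the constraint fires no matter how the bad value arrives:

# Assume $schema is a connected DBIx::Class schema.
my $character = $schema->resultset('Character')->find({ slug => 'some-character' });

# The code layer "forgot" to clamp the value; the percentage domain has not.
eval {
    $character->update({ curr_strength => 1.5 });
};
warn "Postgres refused the update: $@" if $@;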
Another is data migration. Initially I used a fork of DBIx::Migration (thanks, in part, to this bug) which created migration files like 3_up_embassies.sql and 3_down_embassies.sql, but for various reasons, it was unsatisfactory and today we've sqitched to vsqitch.
What is vsqitch, you might ask? Well, it's work we've built on top of sqitch (in fact, I notice lots of people building on top of sqitch rather than using it out of the box). Like DBIx::Migration, it makes it very easy to migrate your database up and down, but if you want to trust that, you don't want MySQL, because DDL is not transaction safe in MySQL. I can't tell you how many times I've had an "up" migration fail in MySQL and had to go in and manually revert the alter/create/drop work. That's because in MySQL, all DDL implicitly commits the current transaction. Yeah, sucks to be you and have fun with the workarounds. For one of our clients, I built on top of sqitch so that pt-online-schema-change (an awesome Percona tool) could be used with MySQL, allowing some DDL to be used in transactions.
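To make that concrete, here's a tiny sketch (plain DBI, with placeholder connection details and table names) of why transactional DDL matters for migrations:

use DBI;

# Placeholder connection details; the important bit is AutoCommit => 0.
my $dbh = DBI->connect( 'dbi:Pg:dbname=veure', $user, $pass,
    { RaiseError => 1, AutoCommit => 0 } );

my $ok = eval {
    # Pretend this is the middle of an "up" migration that fails part way through.
    $dbh->do('CREATE TABLE embassy (id serial PRIMARY KEY, name text NOT NULL)');
    $dbh->do('ALTER TABLE station ADD COLUMN embassy_id integer REFERENCES embassy(id)');
    $dbh->commit;
    1;
};
unless ($ok) {
    # In Postgres, one rollback undoes the DDL too. In MySQL, each statement
    # above would already have committed implicitly, leaving a half-migrated schema.
    $dbh->rollback;
}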
So yeah, we use Postgres. It gives us everything we need out of a database, and we can trust our database migrations to work (though we still have tests for our sqitch changes).
But that still doesn't answer the vsqitch question. That deals with how we handle data.
Data Loader
In applications, there tend to be two types of data in the database: application data and user data. Application data is the data the application must have to run. For Veure, that includes stars, space stations, wormholes, missions, and so on. A fresh instance of Veure has to have that data or it doesn't run.
User data, however, is things like a player signing up, the name they choose for their character, the money they get paid for completing a mission, and so on. This data is highly volatile and the application will run just fine without it.
Veure is an MMORPG and it has lots and lots of application data. The traditional sqitch model is to manage both your database and its data with SQL, so we were writing sqitch migrations with hundreds (sometimes thousands) of lines of SQL and it wasn't always clear exactly what was being changed. Writing the revert SQL sucked, too. And we've pretty much given up on writing the verify SQL (that's what our test suite is for!).
So what I often found myself doing was writing software to write SQL to append to sqitch migrations and frankly, that sucked. Our narrative designers would change the names of several areas on stations and we'd either have to write the SQL to reflect their changes or ...
Screw that. I wound up writing a data loader system that takes advantage of a few things. First, most of our application data has an identifying "slug". This means that if someone wants to reference Epsilon Eridani (a star a little over 10 light years from Earth), we can reference epsilon-eridani instead of star id 14. Our data loader uses the (poorly documented) metadata facilities of DBIx::Class, spiders through the result classes, and each class that returns true for the include_in_data_loader() method gets added to our queue. We then sort the classes by dependency and write the data out into JSON files, using slugs instead of IDs.
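Here's a heavily simplified sketch of the idea. It's not the real loader: everything except include_in_data_loader() is invented for illustration, and the real code also swaps foreign keys for slugs and sorts the sources by dependency before writing anything out.

use JSON::MaybeXS qw(encode_json);
use Path::Tiny    qw(path);

sub write_application_data {
    my ( $schema, $dir ) = @_;

    # Spider the result sources and keep the ones that opt in.
    my @sources = grep {
        $_->result_class->can('include_in_data_loader')
          && $_->result_class->include_in_data_loader
    } map { $schema->source($_) } $schema->sources;

    for my $source (@sources) {
        my @rows = map { { $_->get_columns } }
          $schema->resultset( $source->source_name )->all;
        path( $dir, $source->name . '.json' )->spew_utf8( encode_json( \@rows ) );
    }
}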
We can both load data and write data with this system. I can hack around, change things in the database via the command line, or maybe change them in the admin screens, run a quick "write_data", commit that, and for many of our rapid changes, it's trivial: no more SQL.
Except that it didn't work. You see, at any given time, the data in our data/json directories couldn't be guaranteed to be in sync with the database changes in our sqitch directory, and this quickly made it very hard to migrate the database up and down and still have valid data.
So now we have vsqitch. Without going into detail, we can now run this:
vsqitch --sync-table=character,character_item add character/sync_data
And that will take the character and character_item table data and sync it into a character/sync_data sqitch migration (in this particular case, the sync only covers NPCs, because we don't want your PC data synced like this).
Or if we've done lots and lots of changes, we could use the brute-force approach:
vsqitch --sync-all-tables add sync_all_data
Frankly, I'd love to open source this system, but it's fairly tightly coupled to our work :/
After that, if we have DDL changes, we use DBIx::Class::Schema::Loader to rebuild our schema classes and get on with our work. As a lovely side effect, this can even expose design flaws in our database.
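The rebuild itself is roughly a one-liner with DBIx::Class::Schema::Loader; something like this, where the schema class name and connection details are placeholders:

use DBIx::Class::Schema::Loader qw(make_schema_at);

# Regenerate the Result classes under lib/ from the live database.
make_schema_at(
    'Veure::Schema',
    { dump_directory => 'lib' },
    [ 'dbi:Pg:dbname=veure', $user, $pass ],
);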
Testing
So how do we manage that database in our tests? We use Test::Class::Moose for almost all of our tests and our base class has this:
sub test_startup {
    my $test = shift;
    $test->veure->schema->txn_begin;
    $test->next::method;
}
So before every test class, we start a transaction. And before every test method:
sub test_setup {
    my $test = shift;
    $test->veure->schema->svp_begin("test_setup_$$");
    $test->next::method;
}
There we start a savepoint. Our test_teardown releases that savepoint and our test_shutdown rolls back the transaction.
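For completeness, those two hooks look roughly like this (a sketch based on the description above, using the same schema accessor; the exact ordering around next::method is a detail of our base class):

sub test_teardown {
    my $test = shift;
    $test->next::method;
    $test->veure->schema->svp_release("test_setup_$$");
}

sub test_shutdown {
    my $test = shift;
    $test->next::method;
    $test->veure->schema->txn_rollback;
}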
So in our test suite, we're free to alter the database as much as we want, never, ever worrying about data corruption.
But what about transactions in our code? Doesn't that cause an issue if I've already started one? Nope! DBIx::Class has an auto_savepoint option which, if true, means that if you start a transaction inside an existing one, it will use a savepoint instead of forcing you to finish the previous transaction first. Win!
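Turning that on is just a connection attribute; something like this, with a placeholder schema class and connection details:

my $schema = Veure::Schema->connect(
    'dbi:Pg:dbname=veure', $user, $pass,
    { auto_savepoint => 1 },    # nested txn_begin becomes a SAVEPOINT
);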
And for more information about how we load data in our tests, here's a post where I touch on the subject.
Speaking of Test Suites ...
Here was our test suite output when I wrote about it a month ago:
$ prove -l t
t/001sanity.t ... ok
t/perlcritic.t .. ok
t/sqitch.t ...... ok
t/tcm.t ......... ok
All tests successful.
Files=4, Tests=740, 654 wallclock secs ( 1.57 usr 0.20 sys + 742.40 cusr 15.79 csys = 759.96 CPU)
Result: PASS
And here's that output today:
$ prove -rl t
t/001sanity.t ... ok
t/perlcritic.t .. ok
t/sqitch.t ...... ok
t/tcm.t ......... ok
All tests successful.
Files=4, Tests=838, 414 wallclock secs ( 1.10 usr 0.12 sys + 548.98 cusr 10.46 csys = 560.66 CPU)
Result: PASS
More tests and considerably faster. And our Perl::Critic tests now pass level 4 instead of level 5.