Managing SQL Data with Yertl
Every week, I work with about a dozen SQL databases. Some are Sybase, some MySQL, some SQLite. Some have different versions in dev, staging, and production. All of them need data extracted, transformed, and loaded.
DBI is the clear choice for dealing with SQL databases in Perl, but there are a dozen lines of Perl code in between me and the operation that I want. Sure, I've got modules and web applications and ad-hoc commands and scripts that perform certain individual tasks on my databases, but sometimes those things don't quite do what I need right now, and I just want something that will let me execute whatever SQL I can come up with.
Yertl (ETL::Yertl) is a shell-based ETL framework. It's under development (as is all software), but included already is a small utility called ysql to make dealing with SQL databases easy.
To use ysql, first we have to configure a database. This saves us from having
to type the full DBI data source name (dbi:mysql:host=dev;database=mydb
) every
time. Instead, we can refer to our database by a nice name, like "dev", or
"prod".
$ ysql --config dev dbi:SQLite:database.db
Later, we can update our configuration if we need to:
$ ysql --config dev --database=dev.db
We can examine our configuration as a YAML document:
$ ysql --config dev
---
database: dev.db
driver: SQLite
Let's add a production database as well:
$ ysql --config prod --driver=SQLite --database=prod.db
And now we can check both of our configs:
$ ysql --config
---
dev:
database: dev.db
driver: SQLite
prod:
database: prod.db
driver: SQLite
Now that we've configured some databases, let's insert some data. First we need to make some tables:
$ ysql prod 'CREATE TABLE users ( id INTEGER PRIMARY KEY \
AUTOINCREMENT, name VARCHAR, email VARCHAR )'
$ ysql dev 'CREATE TABLE users ( id INTEGER PRIMARY KEY \
AUTOINCREMENT, name VARCHAR, email VARCHAR )'
Next let's insert some data:
$ ysql prod 'INSERT INTO users ( name, email ) \
VALUES ( "preaction", "preaction@example.com" )'
$ ysql prod 'INSERT INTO users ( name, email ) \
VALUES ( "postaction", "postaction@example.com" )'
Now, let's query for our data:
$ ysql prod 'SELECT * FROM users'
---
email: preaction@example.com
id: 1
name: preaction
---
email: postaction.example.com
id: 2
name: postaction
Yertl uses YAML as its default output, but we can easily convert to JSON or CSV using the yto utility
$ ysql prod 'SELECT * FROM users' | yto csv
email,id,name
preaction@example.com,1,preaction
postaction@example.com,2,postaction
$ ysql prod 'SELECT * FROM users' | yto json
{
"email" : "preaction@example.com",
"id" : "1",
"name" : "preaction"
}
{
"email" : "postaction@example.com",
"id" : "2",
"name" : "postaction"
}
Now, lets say we want to copy our production database to dev for testing. To do
that, Yertl allows us to read YAML from STDIN
and execute a query for each YAML
document. Yertl uses a special interpolation syntax (starting with a $
) to
pick parts of the document to fill in the query:
$ ysql prod 'SELECT * FROM users' |
ysql dev 'INSERT INTO users ( id, name, email ) \
VALUES ( $.id, $.name, $.email )'
So this will take our users table from prod and write it to dev. $.id
picks
the "id" field, $.name
the "name" field, etc...
But all this would be a bear to type over and over again (imagine if we had a
bunch of joins to do). So, ysql allows you to save queries for later use using
the --save <name>
option:
$ ysql prod --save users 'SELECT * FROM users'
$ ysql dev --save update_users 'UPDATE users SET \
name=$.name, email=$.email WHERE id=$.id'
Then we can recall our query by the name we gave to the --save
option:
$ ysql prod users | ysql dev update_users
Finally, using yto
and yfrom, we can write a
dump of our users to JSON, and then read that database dump back into our
database:
$ ysql prod users | yto json > users.json
$ yfrom json < users.json | ysql dev update_users
So, though Yertl is in its infancy, it can already help with some common database tasks!
There are lots of plans for Yertl, described in the feature's tag on the issue tracker, so if you've got common data tasks that you feel should be easier, join me in #yertl on irc.perl.org.
Leave a comment