Managing SQL Data with Yertl

Every week, I work with about a dozen SQL databases. Some are Sybase, some MySQL, some SQLite. Some have different versions in dev, staging, and production. All of them need data extracted, transformed, and loaded.

DBI is the clear choice for dealing with SQL databases in Perl, but there are a dozen lines of Perl code in between me and the operation that I want. Sure, I've got modules and web applications and ad-hoc commands and scripts that perform certain individual tasks on my databases, but sometimes those things don't quite do what I need right now, and I just want something that will let me execute whatever SQL I can come up with.

Yertl (ETL::Yertl) is a shell-based ETL framework. It's under development (as is all software), but included already is a small utility called ysql to make dealing with SQL databases easy.

Leaving Crumb Trails -- Talking to Myself

The past me is another person. Sometimes antagonist, sometimes friend, past me (postaction?) had ideas, hopes, and dreams and developed some of them into software that I and others use. Unfortunately, that asshole left bugs all through the code for me to fix.

I can't blame him. Nobody's perfect, not even idealized/demonized copies of my past self. But I do have to fix them, and deal with the messes he left.

Lucky for me, while he was writing buggy software, he left extensive notes for me to use...

Managing Boilerplate with Import::Base

Boilerplate is everything I hate about programming:

  • Doing the same thing more than once
  • Leaving clutter in every file
  • Making it harder to change things in the future
  • Eventually blindly copying without understanding (cargo-cult programming)

In an effort to reduce some of my boilerplate, I wrote Import::Base a module to collect and import useful bundles of modules, removing the need for long lists of use ... lines everywhere.

Between Learning and Doing

ive-made-a-huge-mistake.gif

A long time ago, when I started building my first video game server for Double Cluepon, my video game company, I did a bad thing. I looked at the AMF library for Perl and Python and decided that Python's looked better. I had always meant to learn Python, and this felt like the perfect opportunity. It had cooperative multitasking (Twisted) and it had an ORM (SQLAlchemy), so along with the messaging format (PyAMF), I had everything I needed to build a server for a Flash MMO (later migrated to AIR).

Let me reiterate my mistake: While under time constraints, I chose to learn a new programming language. I didn't realize my mistake until it was too late.

perlsloc - Count Perl Source Lines with Perl::Tidy

While spending some time putting together my own perltidyrc file, I became intimately familiar with the Perl::Tidy documentation.

One day, I decided to find out exactly how much code I was maintaining. Since perltidy can strip comments and POD, and also normalize the source code to make a fair measurement, it's a perfect tool for counting Source Lines of Code (SLOC).

Here's a small shell script using ack, perltidy, xargs, and wc to count the source lines of code in any number of directories.

ack -f --perl $@ | xargs -L 1 perltidy --noprofile --delete-pod --mbl=0 --standard-output | wc -l

ack -f lists the files that would be searched, and --perl searches Perl files, so we get ack's heuristics for finding Perl files. xargs -L 1 invokes the following command for every 1 line of input. The perltidy command strips the pod and tightens up the whitespace and writes the result to stdout, which wc -l will then count, line by line.

So, as an example, the current Statocles release has 50% more test lines than source lines:

» perlsloc lib bin
    1034
» perlsloc t
    1633