February 2012 Archives

Seeking someone to host CPANDB::Generate

CPANDB is a pretty awesome tool, in my humble opinion.

It takes a whole variety of different data sets from the CPAN group of services, and cooks them down into a single unified SQLite schema that you can access via a convenient ORLite wrapper (or access directly if you wish).

This single database file can then be used both as a convenience for simple tasks, or to build deep and complex analysis metrics of the kind I used in the creation of the /users/adam_kennedy/2012/02/index.html

The new ORLite 2.0 learns some amazing SQLite tricks

ORLite is a light weight SQLite-specific ORM which is particularly handy for working with ad-hoc SQLite database and creating internal database APIs for large applications, most prominently the database API inside of Padre.

Aligning so closely with the features of a single database engine keeps the implementation size down to a minimum, at less than 1000 lines of code, and allows ORLite to do things that would be completely impossible in more general ORMs.

This is particularly true in the upcoming 2.0 release, a preview of which is available now.

http://svn.ali.as/cpan/releases/ORLite-1.90.tar.gz

This new major revision embraces SQLite's slightly unique rowid mechanism, allowing it to accurately distinguish between different copies of identical data.

Lets start with the following database in the file adam.sqlite

create table person (
    firstname string not null,
    lastname string not null,
    location string not null
);

insert into person values ( 'Adam', 'Kennedy', 'Sydney' );

insert into person values ( 'Adam', 'Kennedy', 'Sydney' );

insert into person values ( 'Adam', 'Kennedy', 'Los Angeles' );

Now lets say I want to move from Sydney to San Francisco. First I'll create an ORM for this database.

package Adam;

use ORLite 'adam.sqlite';

With my ORM created, I'm going to load the two Adams living in Sydney.

my @sydney = Adam::Person->select('where location = ?', 'Sydney');

Now I want to move one and only one of me to San Francisco. Normally this would involve some fairly annoying database contortions to deal with having two identical records.

But SQLite doesn't actually have identical records internally. It's rowid mechanism acts as an implicitly incrementing primary key for any table that does not have one (rowid is an alias for your primary key if you do have one).

In ORLite 2 these rowids are stored internally in the object, and used for all update and delete calls.

So the following will actually work the way it appears to.

$sydney[0]->update( location => 'San Francisco' );

Even though the table has no primary key and two identical records, ORLite has changed one of the two identical records so now we have three Adams in three different places.

Alternatively, we could have resolved the problem by deleting one of the Adams instead.

$sydney[0]->delete;

The two lines of code above actually produce the following SQL statements (I've inlined the placeholder values for demonstration purposes).

UPDATE "person" SET "location" = 'San Francisco' WHERE "rowid" = 1;

DELETE FROM "person" WHERE "rowid" = 1;

Since the rowid system is exposed in SQLite if you need to do funky trickys, I've also done the same thing in ORLite.

All classes that represent table objects now gain a rowid method, which returns an integer if the object is stored in the database or undef if the object is an anonymous object created via new and has not been inserted into the database yet.

The magic of internal rowid tracking not only makes this possible at all, it also makes the code generation significantly easier and execution potentially much much faster by preventing the database compiling queries with O(n) implementations for operations which are only intended to act on a single row.

It also potentially reduces the number of indexes you will need, reducing database size and making insert and update operations faster.

Improved identification of objects also makes it very tempting to move ORLite towards becoming an object-tracking one-instance-only type ORM such as Entity Framework in .NET.

However, I think this is more appropriate for an add-on class or wrapper over ORLite and isn't appropriate in the core module.

However, as a readability nod in that direction, ORLite now checks for a incrementing integer primary key in the recommended pattern tablename_id and automatically adds a convenience id alias for the normal accessor.

So now the following are all now equivalent.

$thing->thing_id;

$thing->id;

# And since rowid reuses incrementing integer keys this is the same as well
$thing->rowid;

I plan to write some kind of cook book or tutorial on recommended database structure before the final 2.0 release.

And finally, ORLite 2.0 will provide much better support for Unicode.

An initial implementation has been provided by means of an explicit unicode => 1 import parameter, but right now care should be taken if your database has any columns of BLOB data type as generate code probably does not take unicode into account properly when working with BLOB values.

About Adam Kennedy

user-pic I blog about Perl.