Test-driving MongoDB on a Moose

I spent a few hours this last weekend trying out MongoDB and its Perl driver.

I was curious to see how I could put NoSQL databases to good programming use. MongoDB struck me as a good contender for some experiments: it's easy to install, well documented, and has an intuitive query language that reminds me of SQL::Abstract syntax. Besides, MongoDB is allegedly very fast and scalable, which left me wondering how my configuration management apps could benefit from a backend switch from a relational database to a non-ACID, document-based DB like MongoDB, just like these people did for their server monitoring software.

One of the things I liked the most about MongoDB is the fact that its Perl driver has a straightforward Moose-based interface. So I decided to take it out for a drive. Actually, my goal wasn't MongoDB's performance, nor its query language features. I was more interested in exploring how MongoDB could become the base for a Perl object persistence framework out of the box, out of sheer curiosity and as a chance for some fun self-learning.

(I know KiokuDB is just sitting out there saying "well, try ME out then". It even has a MongoDB backend prêt-à-porter, but I feared KiokuDB could add some unnecessary overhead, since it caters to a wide range of backends. And, at this point, I just wanted to sketch something up myself to ruminate on.)

So I tried to model a very basic CD-Artist relationship. I came up with a use case:

my $db = MediaDB->new;
# create the artist and store it
my $artist = Artist->new( name => 'Paco de Lucia' );
$db->collection('Artists')->insert( $artist );
# create a CD and store it
my $cd = CD->new( title=>'Almoraima', artist=>$artist );
my $cds = $db->collection('CDs');
$cds->insert( $cd ); 
# now retrieve it back
my $favorite = $cds->find_one({ title=>'Almoraima' });
say $favorite->stringify;

Even though the MongoDB driver relies on Moose (Any::Moose actually) for most of its classes (Database, Collection, OID, etc.), data is represented by an unblessed hashref, which in MongoDB lingo is called a Document. So I figured I needed to wrap some of these objects up to create a proof-of-concept storage framework that would also fit in a blog post.

First I needed a MediaDB class that would make the get_collection method bless collections into my own classes. With a little help from MooseX::Declare, I sketched up a "MediaDB" class:

use MooseX::Declare;
class MediaDB extends MongoDB::Database {
	method BUILDARGS (ClassName $class: @args ) {
		my %args = (@args);
		$args{_connection} ||= MongoDB::Connection->new;
		$args{name} ||= 'mediadb';
		return \%args;
	};
	method collection( $name ) {
		return bless $self->get_collection($name), $name;
	}
}
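The collection method above relies on re-blessing: an object that already exists is moved into another class, so that class's method overrides apply from then on. Here is a minimal sketch of that idea in plain Perl, with no MongoDB involved. Base::Collection is a hypothetical stand-in for MongoDB::Collection, and its find_one just returns a fixed string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for MongoDB::Collection (not the real driver class).
package Base::Collection;
sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
sub find_one { return 'raw document' }

# A subclass named after the collection, overriding find_one.
package CDs;
our @ISA = ('Base::Collection');
sub find_one {
    my ($self) = @_;
    return 'wrapped ' . $self->SUPER::find_one();
}

package main;

my $collection = Base::Collection->new( name => 'CDs' );
my $before     = $collection->find_one;    # dispatches to Base::Collection

# Re-bless the existing object into the class named after the collection.
bless $collection, $collection->{name};
my $after = $collection->find_one;         # now dispatches to CDs

print "$before\n$after\n";
```

Re-blessing keeps the object's data intact and only changes where method calls are dispatched, which is why the target class has to extend the original one.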

Now, to be able to store my objects as documents, I needed to serialize them into hashrefs, and vice-versa. Just unblessing them wouldn't take care of their nested objects, so I went shopping around the CPAN. I found this neat module, MooseX::Storage, which I had never heard of. It came in quite handy!

So I drafted up a serializing Document role for the MongoDB document common code:

role Document {
	use MooseX::Storage;
	with Storage;
}

That installs a pack and an unpack method for every Document-role consumer.

Now I went on to create the collection role for wrapping the two methods I needed for running my code, so that I could pack and unpack objects on the fly:

role Collection {
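	around find_one( HashRef $query, HashRef $fields? ) {
		my $doc = $self->$orig( $query, $fields );
		# nothing matched the query, so there is nothing to unpack
		return unless $doc;
		# MooseX::Storage records the originating class under __CLASS__
		my $doc_class = $doc->{__CLASS__};
		return $doc_class->unpack( $doc );
	}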
	around insert( HashRef $doc, HashRef $options? ) {
		return $doc->{oid} = $self->$orig( $doc->pack );
	}
}

I'm not sure a Collection role is the best fit because I still need to extend MongoDB::Collection for every class I create.

As a nice side effect, when using MooseX::Storage I get MongoDB documents that have a __CLASS__ attribute telling me how to unpack them back after fetching data. It has some drawbacks though, since it ties the stored data to the code that represents it: rename a class, and the documents already in the DB still carry the old name.
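To make the __CLASS__ mechanism concrete, here is a rough sketch of the idea in plain Perl. It is not MooseX::Storage's implementation: pack_object and unpack_document are hypothetical helpers, they assume hash-based objects only, and unpacking simply blesses the data without calling any constructor.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: turn a hash-based object into a plain hashref,
# tagging every level with the class it came from.
sub pack_object {
    my ($obj) = @_;
    my %doc = ( __CLASS__ => blessed($obj) );
    for my $key ( keys %$obj ) {
        my $value = $obj->{$key};
        $doc{$key} = blessed($value) ? pack_object($value) : $value;
    }
    return \%doc;
}

# Hypothetical inverse: rebuild the objects by blessing each tagged hashref.
# No constructor runs, so no type checks or defaults are applied.
sub unpack_document {
    my ($doc) = @_;
    my %copy  = %$doc;
    my $class = delete $copy{__CLASS__};
    for my $key ( keys %copy ) {
        my $value = $copy{$key};
        $copy{$key} = unpack_document($value)
            if ref $value eq 'HASH' && exists $value->{__CLASS__};
    }
    return bless \%copy, $class;
}

my $artist = bless { name => 'Paco de Lucia' }, 'Artist';
my $cd     = bless { title => 'Almoraima', artist => $artist }, 'CD';

my $doc  = pack_object($cd);        # plain nested hashrefs, ready to store
my $back = unpack_document($doc);   # objects again, rebuilt from the tags
```

The class name travels with the data, which is convenient for unpacking but is also exactly the coupling between data and code mentioned above.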

Now that I had a very embryonic framework, I threw in my "schema" classes:

class CDs with Collection extends MongoDB::Collection;
class Artists with Collection extends MongoDB::Collection;
class CD with Document {
	has 'title' => ( is=>'rw', isa=>'Str' );
	has 'artist' => ( is=>'rw', isa=>'Artist' );
	method stringify {
		$self->title . ', by ' . $self->artist->name;
	}
}
class Artist with Document {
	has 'name' => ( is=>'rw', isa=>'Str' );
	has 'cds' => ( is=>'rw', isa=>'CDs' );
}

Here, an Artist also has 'cds', an implementation bomb that I'll keep unused for the time being.

After running the use-case, this is how data looked in the DB:

$ mongo 
> use mediadb
switched to db mediadb
> db.CDs.find()
{ "_id" : ObjectId("4bf032abe293830299931162"), "__CLASS__" : "CD", "artist" : { "__CLASS__" : "Artist", "name" : "Paco de Lucia" }, "title" : "Almoraima" }

True, this experiment taps into the untamed waters of object persistence. Still, it left me with a sense of accomplishment: creating a leaner homemade persistence layer is doable with MongoDB -- except I still have many issues to chew over:

  • Relationships: instead of embedding documents into one another, sometimes it's best to store them separately and link the two -- good ol' normalization. But where would be the best place to enforce such "normalization"? How to avoid duplicating objects?
  • Object IDs: the only reliable unique object id in MongoDB seems to be its auto-generated OIDs, which are guaranteed to be unique across replicated servers. How should I handle them in my classes? How about auto-increment ids from my relational models? Could I live without those easy-to-read, incremental numbers? Should I create UUIDs in my code?
  • Performance: serializing objects to and from the DB is expensive. What would be the most performant way of turning MongoDB documents into Perl objects, and vice-versa, without sacrificing too much? To just carelessly bless them, or to instantiate with Class->new every time?
  • Approaches: what other difficulties and approaches on persisting objects into a document-based database like MongoDB can one come across?
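As a way of thinking about the relationships question, here is a minimal sketch of linking instead of embedding. It uses two in-memory hashes as stand-ins for the Artists and CDs collections, so nothing here touches MongoDB. The insert_doc and fetch_cd_with_artist helpers, the artist_id field, and the counter-based ids are all assumptions made for illustration; a real setup would most likely use the OIDs MongoDB generates.

```perl
use strict;
use warnings;

# In-memory stand-ins for two collections, keyed by id (no real DB here).
my %artists;
my %cds;
my $next_id = 1;    # hypothetical id generator; MongoDB would supply an OID

# Store a copy of the document under a fresh id and return that id.
sub insert_doc {
    my ( $collection, $doc ) = @_;
    my $id = $next_id++;
    $collection->{$id} = { %$doc, _id => $id };
    return $id;
}

# Store the artist once, then keep only its id inside the CD document.
my $artist_id = insert_doc( \%artists, { name => 'Paco de Lucia' } );
my $cd_id     = insert_doc( \%cds, { title => 'Almoraima', artist_id => $artist_id } );

# Resolve the link at read time with a second lookup.
sub fetch_cd_with_artist {
    my ($id) = @_;
    my $cd = $cds{$id} or return;
    return { %$cd, artist => $artists{ $cd->{artist_id} } };
}

my $cd = fetch_cd_with_artist($cd_id);
print "$cd->{title}, by $cd->{artist}{name}\n";
```

The trade-off is one extra lookup per link in exchange for a single copy of each artist, no matter how many CDs point to it.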

I figure the KiokuDB authors have sorted out most of these questions. Maybe next weekend I'll dive into that.

But it does look promising. MongoDB feels very natural in Perl, maybe because the MongoDB team tends to blaze the trail by creating and maintaining their own language driver implementations, something uncommon among most new and established databases out there.

2 Comments

The KiokuDB authors have dealt with many of these issues (as well as the circular reference between author and cds), and strangely the first iteration of the MongoDB Perl Drivers was written by someone who has contributed to KiokuDB (Florian++).

I can't recommend enough taking a look at KiokuDB.

I'd be very interested to see your KiokuDB follow-up... <nudge, nudge>

About rodrigolive

Perl and all things considered