December 2010 Archives

Morpheus - ultimate configuration engine

As I promised, here are the slides from my talk this morning at Saint Perl-2 in Saint-Petersburg, Russia.

I believe Morpheus can be very useful for the community and hope that it'll become widely adopted.
There are still a lot of things which can be added, but conceptually we are on the right track.

These slides probably suck (I wrote them in the last moment and didn't put enough details in some places), but putting code out in the wild and getting feedback is more important by now.

One more thing, if you're going to check out PODs in next few days, see them on github instead of CPAN. There are much more of them added in 0.36 release, which is not uploaded to CPAN yet.

UPD: Just found out that 0.36 release docs can be viewed here: http://search.cpan.org/~mmcleric/Morpheus-0.36/
Morpheus::Key, Morpheus::Bootstrap and Morpheus::Plugin::Content PODs are worth to be looking at, if you are interested in implementation detals.

Roadmaps

Ever since I've promised to write about my "generic streams" framework on #moose IRC channel (it happened several weeks ago), my conscience wouldn't leave me alone.
I'm still not ready to talk about streams with enough details, so I'll talk about the bigger picture instead.

So, I'm working at Yandex company on a quite large-scale project (it's a blogsearch if you really want to know). Our codebase is separated into something like 400 separate distributions, mostly perl. Every distribution is packaged into debian package and all these packages are cross-dependent in the complicated ways. We've got several hundreds of production hosts, separate test hosts and the separate test cluster. Our group is just a small part of the whole company (less than 1 percent in fact) and so some of our modules are used outside of our blogsearch project.

Why am I telling you all this? Because the software we write and especially the software which we release in the opensource is very much shaped by this highly diverse environment. In the last few years, we encountered some common patterns in the areas of service management, software configuration and streaming data processing again and again, and came up with some interesting (IMHO) and generic solutions.

So here are three things which we can share with the world:

First, there is service management. As you can imagine, we run a lot of daemons - http services and some other stuff, and so we wrote Ubic. I wrote about it many times already, and there is a fairly high probability that you already know what Ubic is. If you don't know, just look at these slides.

Second, there is software configuration. Having a production cluster and testing cluster and separate configuration for the unit testing and maintaining it all for 400 distributions is a nightmare... unless you invent something like Morpheus. Morpheus is the "ultimate configuration engine" - it provides the unified interface to any configuration values, which then can be provided from any source (config files in any format, DB, environment, command line arguments) without any changes to code at all.
Think "Log::Any", but apply its idea to the configs.
I'm really excited about Morpheus and I'll talk about it in detail on Saint Perl 2 in Russia, in St. Petersburg on December, 18. I'll have some slides by then, so if you like the Morpheus idea too and want to know more, just wait for 2 weeks.
(If you are really impatient, you can go on and read the code, but there are almost no docs by now).

Third, streaming data processing.
Creating blogsearch means that you have to fetch a lot of web pages, then put them into storage, then index all the stuff you put into storage, and also extract links from this "new posts" stream, then process these links more (expand short links, for example), also export link to another services, also use these links to calculate antispam metrics, etc.
In my mind, it looks like a giant graph with vertices representing data stored on different hosts and groups of hosts in various forms (it can be log or it can be a DB table, or it can be a rsync share on another host), and nodes between them representing processing programs.
Also, I like to think about this picture.
Thus, streaming framework. Or more like set of libraries, some of which define common APIs, some providing specific implementations (Stream::Log, Stream::DB, etc), some helping to integrate all this stuff into the big picture. I still don't know when I'll find the will to open it into public, not because it's not ready - we are using it in production - but because I'm still trying to find the right approach to decouple it and separate useful parts from the policy as much as possible, and keep it easy to use and deploy at the same time.

Besides these three major directions we also have several small useful modules.
Some of them are already out there in separate distrubtions (Log::Unrotate, AnyEvent::HTTP::LWP::UserAgent, DBD::Safe), some are on their way.
About two of these small modules I'm uncertain if they would be useful to anyone as separate distributions, but they are still too helpful and I had to bundle them with Ubic: Ubic::Lockf, Ubic::Persistent. If you like one of them, tell me and I'll upload them immediately as the separate distributions. Especially if you'll suggest an appropriate name :)

Well... that's all by now.

About Vyacheslav Matyukhin

user-pic I wrote Ubic. I worked at Yandex for many years, and now i'm building my own startup questhub.io (formerly PlayPerl). I'm also working on Flux, streaming data processing framework. CPAN ID: MMCLERIC.