May 2011 Archives

Proposal for corporate people who are stuck with old system perl

So I read this post complaining about Mojo deprecating 5.8 support.
As far as I understand, there are three groups of people:
1) People on shared hosting, sometimes without shell access at all; their old-perl problem can't be solved, they should just migrate to another hoster.
2) People who use perlbrew and install all dependencies either into ~/perl5/, or maybe even right into project/ with local::lib.
3) People who use the native packaging system.

I myself belong to the third group.
First, because that's how things are done at my $job.
Second, because I believe this is the right way to do things, although I understand that many people think otherwise.

Now, before I explain the main point of this post, here are the reasons why .deb/.rpm packages may be better than CPAN:
- your system administrators will be happy;
- you'll be able to declare dependencies on external software (lighttpd/apache, mysqlclient, etc.);
- you'll be able to put things in your package outside of /usr/bin/ and /usr/share/perl5 (I'm talking about debian/ubuntu here, but the RPM world is probably similar);
- postinst/prerm scripts! they are awesome; see the sketch right after this list.
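
For illustration, here's roughly what declaring external dependencies and a postinst hook could look like for a hypothetical libmyapp-perl package (all the names are made up):

    # debian/control (fragment)
    Package: libmyapp-perl
    Depends: perl, lighttpd, libdbd-mysql-perl, ${misc:Depends}

    # debian/postinst (fragment)
    #!/bin/sh
    set -e
    if [ "$1" = "configure" ]; then
        # restart the web server so it picks up the new code
        invoke-rc.d lighttpd restart || true
    fi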

Ok, maybe I'm not familiar with modern CPAN modules for these tasks; I never found the time to play with MyCPAN::App::DPAN and brian d foy's other modules.
And of course you have to set up your own debian repository and write some custom tools to make all this stuff work painlessly.
But still, I feel that the debian packaging system is more advanced and mature than CPAN.
(Please don't bash me for this opinion, it's not the point of this post. Even if I'm wrong, the first argument about system administrators still stands.)
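
For the repository part mentioned above, something like reprepro can do most of the heavy lifting; a sketch, assuming /srv/repo/conf/distributions is already configured:

    # add a freshly built package to the "squeeze" suite of the repository
    reprepro -b /srv/repo includedeb squeeze libfoo-bar-perl_0.01-1_all.deb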

Now to the main idea of this post.
Even if we have to deal (or like to deal) with native debian/rpm packages, it doesn't mean we have to use the system perl.
We could build our own perl-5.14 deb package with a perl binary called /usr/bin/perl5.14 and an @INC that doesn't intersect with the standard system @INC, then rebuild the whole of CPAN as libfoo-bar-perl5.14 packages.
We could put all these packages into one common repository (well, one for each distribution out there), open it to everyone, and set up a system that rebuilds all new CPAN releases and puts them there automatically.
Ok, maybe not all modules, but most of them are trivial to build with the dh-make-perl utility.
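
For reference, building a single module straight from CPAN is a one-liner (assuming dh-make-perl is installed; teaching it the perl5.14 naming scheme would be the custom part):

    # fetch Foo::Bar from CPAN and build a libfoo-bar-perl .deb from it
    dh-make-perl --build --cpan Foo::Bar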

Looking at debian.pkgs.cpan.org, I think I'm not the first one to come up with this idea, but that host looks pretty much dead. And the most important thing here is to keep the repository updated automatically.

Actually, last summer four guys from the Moscow.pm group (including me) tried to build such a thing ourselves. We coded for one day, made some progress, and then never found enough momentum to finish it.
Anyone want to try once more? :)

Surprisingly hard task of writing logs

At my $job, we often use files and logs as a cheap way of managing queues of data.
We then read them with the Log::Unrotate module, but that's a topic for another post.
Anyway, it's important that items written to the log are never lost and never get corrupted.

So here is the task: write serialized data to a file from multiple processes.
Sounds easy, right?

Well, the most obvious problem is that print() uses buffered output by default, and lines in the log will get mixed up if you don't turn it off.
Ok, let's call $fh->autoflush(1), let's take a lock on the file, let's even use syswrite() because... why not?
But even then, POSIX doesn't promise that write(2) is atomic. And if you call syswrite(), you have to check the return value ("the number of bytes actually written", quoting perldoc) and compare it to the number of bytes you attempted to write.
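
Here's a minimal sketch of such a careful writer (error handling reduced to die; assumes $line already ends with "\n"):

    use Fcntl qw(:flock);

    sub append_line {
        my ($file, $line) = @_;
        open my $fh, '>>', $file or die "open $file: $!";
        flock($fh, LOCK_EX) or die "flock: $!";  # serialize concurrent writers
        # no autoflush games needed: syswrite bypasses PerlIO buffering entirely
        my $written = 0;
        while ($written < length $line) {
            my $n = syswrite($fh, $line, length($line) - $written, $written);
            die "syswrite: $!" unless defined $n;
            $written += $n;  # write(2) may write less than we asked for
        }
        close $fh or die "close $file: $!";  # close also releases the lock
    }
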
And while you're looping over the remaining bytes, maybe you won't finish in time and someone will kill your process...
And then another process will run and carelessly append to the broken file:
> line1
> line2
> linline3
> line4
And then you're screwed.
Especially if you're storing Storable strings in these files, as we do, because thaw($str) can cause segfaults, out-of-memory and other bizarre errors on invalid lines.

Really, am I missing something obvious here?
Why is it so hard?
The only solution we came up with is to check the last byte of the file before every write: seek() to the end, and if that byte is not "\n", seek further back to the beginning of the broken line and drop it.
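A sketch of the idea (not our production code): assume $fh is opened '+<' and an exclusive flock is already held; truncating the broken tail amounts to the same thing as seeking back and overwriting it.

    use Fcntl qw(SEEK_SET SEEK_END);

    sub drop_broken_tail {
        my ($fh) = @_;
        my $size = sysseek($fh, 0, SEEK_END);
        defined $size or die "sysseek: $!";
        return if $size == 0;                    # empty file, nothing to fix
        sysseek($fh, $size - 1, SEEK_SET);
        sysread($fh, my $last, 1) == 1 or die "sysread: $!";
        return if $last eq "\n";                 # file ends cleanly
        # walk back byte by byte to the previous "\n" (slow, but simple)
        my $pos = $size - 1;
        while ($pos > 0) {
            sysseek($fh, $pos - 1, SEEK_SET);
            sysread($fh, my $byte, 1) == 1 or die "sysread: $!";
            last if $byte eq "\n";
            $pos--;
        }
        truncate($fh, $pos) or die "truncate: $!";  # drop the broken line
        sysseek($fh, $pos, SEEK_SET);               # ready to append again
    }
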
But it looks ugly and is probably slow as well.

About Vyacheslav Matyukhin

I wrote Ubic. I worked at Yandex for many years, and now I'm building my own startup, questhub.io (formerly PlayPerl). I'm also working on Flux, a streaming data processing framework. CPAN ID: MMCLERIC.