The surprisingly hard task of writing logs
At my $job, we often use files and logs as a cheap way of managing queues of data.
We then read them using the Log::Unrotate module, but that is a topic for another post.
Anyway, it's important that items written to the log are never lost and never get corrupted.
So here is the task: write serialized data to a file from multiple processes.
Sounds easy, right?
Well, the most obvious problem is that print() uses buffered output by default, so lines written by different processes can get interleaved in the log if you don't turn buffering off.
Ok, let's call $fh->autoflush(1), let's take a lock on the file, let's even use syswrite() because... why not?
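Here is a minimal sketch of that first attempt, assuming one record per line and an advisory flock() that every writer honours. The append_line() name is made up for illustration; it is not part of any module.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use IO::Handle;

# Hypothetical helper: append one line to $path under an exclusive lock.
sub append_line {
    my ($path, $line) = @_;

    open(my $fh, '>>', $path) or die "open $path: $!";
    $fh->autoflush(1);                         # flush after every print
    flock($fh, LOCK_EX) or die "flock $path: $!";
    print {$fh} "$line\n" or die "print $path: $!";
    close($fh) or die "close $path: $!";       # closing also releases the lock
    return;
}
```

The lock only helps if all writers take it; a process that appends without calling flock() is not stopped by it.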
But even then, POSIX doesn't promise that write(2) is atomic. And if you call syswrite(), you have to check the return value ("number of bytes actually written", quoting perldoc) and compare it to the number of bytes you attempted to write.
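Checking the return value usually turns into a loop like the one below. This is only a sketch: write_all() is a hypothetical name, and it assumes $data is a byte string (no wide characters) and that retrying on EINTR is what you want.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

# Hypothetical helper: call syswrite() until every byte of $data is written.
sub write_all {
    my ($fh, $data) = @_;
    my $length = length $data;
    my $offset = 0;

    while ($offset < $length) {
        # Write the part of $data that starts at $offset.
        my $written = syswrite($fh, $data, $length - $offset, $offset);
        if (!defined $written) {
            next if $! == EINTR;               # interrupted by a signal, retry
            die "syswrite failed: $!";
        }
        $offset += $written;                   # may be less than requested
    }
    return $offset;
}
```

Note that the loop makes a single call to write_all() complete, but it does nothing to make it atomic: the process can still die between two iterations.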
And while you're trying to write the remaining bytes, maybe you won't do it in time and someone will kill your process...
And then another process will run and carelessly append to the broken file:
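To make the failure concrete, here is a small simulation. The record contents and the simulate_broken_append() name are invented for the example; the first write stands in for a writer that got killed before it could write its trailing newline.

```perl
use strict;
use warnings;

# Hypothetical simulation: writer 1 dies mid-record, writer 2 appends normally.
sub simulate_broken_append {
    my ($path) = @_;

    # Writer 1: only part of the record makes it to disk, no "\n".
    open(my $fh, '>>', $path) or die "open $path: $!";
    defined(syswrite($fh, "record-1 partial")) or die "syswrite: $!";
    close($fh);

    # Writer 2: appends a complete record without looking at the file first.
    open($fh, '>>', $path) or die "open $path: $!";
    defined(syswrite($fh, "record-2 complete\n")) or die "syswrite: $!";
    close($fh);

    # A line-based reader now sees both records glued into one line.
    open(my $in, '<', $path) or die "open $path: $!";
    my @lines = <$in>;
    close($in);
    return @lines;
}
```

On a fresh file the reader gets back a single line that starts with the broken record and ends with the good one, so the good record is lost along with the bad one.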
And then you're screwed.
Especially if you're storing Storable strings in these files as we do, because thaw($str) can cause segfaults, out-of-memory errors and other bizarre failures on invalid lines.
Really, am I missing something obvious here?
Why is it so hard?
The only solution we came up with is to seek() to the end of the file before every write and look at the last byte; if it's not "\n", we seek further back to the beginning of the broken line and start writing from there.
But it looks ugly and is probably slow as well.
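For what it's worth, here is roughly what that workaround could look like. It is a sketch under several assumptions: records never contain "\n" themselves, every writer takes the same flock(), and the broken tail is removed with truncate() rather than overwritten in place. The append_record() name and the 4096-byte chunk size are arbitrary.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);

# Hypothetical helper: drop a trailing partial line, then append $record.
sub append_record {
    my ($path, $record) = @_;

    sysopen(my $fh, $path, O_RDWR | O_CREAT | O_APPEND)
        or die "open $path: $!";
    flock($fh, LOCK_EX) or die "flock $path: $!";

    my $end = sysseek($fh, 0, SEEK_END);
    defined $end or die "sysseek $path: $!";
    $end += 0;                                 # turn "0 but true" into 0

    # Walk backwards in chunks until the last "\n" is found.
    my $valid = 0;                             # bytes that form complete lines
    my $pos   = $end;
    while ($pos > 0) {
        my $chunk = $pos < 4096 ? $pos : 4096;
        $pos -= $chunk;
        defined(sysseek($fh, $pos, SEEK_SET)) or die "sysseek $path: $!";
        my $got = sysread($fh, my $buf, $chunk);
        defined $got or die "sysread $path: $!";
        die "short read on $path" if $got != $chunk;   # a sketch, so just bail out
        my $idx = rindex($buf, "\n");
        if ($idx >= 0) {
            $valid = $pos + $idx + 1;
            last;
        }
    }

    # Cut off the broken tail, if there is one.
    if ($valid < $end) {
        truncate($fh, $valid) or die "truncate $path: $!";
    }

    # O_APPEND makes every write land at the (new) end of the file.
    my $data   = "$record\n";
    my $offset = 0;
    while ($offset < length $data) {
        my $written = syswrite($fh, $data, length($data) - $offset, $offset);
        defined $written or die "syswrite $path: $!";
        $offset += $written;
    }
    close($fh) or die "close $path: $!";
    return;
}
```

In the common case the very first chunk ends with "\n" and nothing is truncated, so the extra cost is one seek and one read per write. It still can't stop this writer from being killed halfway; it only cleans up after the previous one.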