The surprisingly hard task of writing logs

At my $job, we often use files and logs as a cheap way of managing queues of data.
We then read them with the Log::Unrotate module, but that's a topic for another post.
Anyway, it's important that items written to the log are never lost and never get corrupted.

So here is the task: write serialized data to file from multiple processes.
Sounds easy, right?

Well, the most obvious problem is that print() uses buffered output by default, so lines from different processes will get mixed up in the log if you don't turn it off.
Ok, let's call $fh->autoflush(1), let's take a lock on the file, let's even use syswrite() because... why not?
But even then, POSIX doesn't promise that write(2) is atomic. And if you call syswrite(), you have to check the return value ("the number of bytes actually written", quoting perldoc) and compare it to the number of bytes you attempted to write.
And while you're trying to write the remaining bytes, maybe you won't do it in time and someone will kill your process...
And then another process will run and carelessly append to the broken file:
> line1
> line2
> linline3
> line4
And then you're screwed.
Especially if you're storing Storable strings in these files as we do, because thaw($str) can cause segfaults, out-of-memory conditions and other bizarre errors on invalid lines.

Really, am I missing something obvious here?
Why is it so hard?
The only solution we came up with is to seek() and look at the last byte in the file before every write, and to seek back to the beginning of the broken line if that byte isn't "\n".
But it looks ugly and probably slow as well.
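
Roughly, that workaround looks like this. This is just a sketch: it assumes every writer also takes an exclusive flock() around the whole check-and-append, assumes a single read of the file's tail is enough to find the broken line, and drops the broken tail with truncate() instead of overwriting it in place; append_line() is a made-up name.

use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

sub append_line {
    my ($path, $line) = @_;              # $line must already end with "\n"

    open my $fh, '+>>', $path or die "open $path: $!";
    flock($fh, LOCK_EX)       or die "flock $path: $!";

    my $size = -s $fh;
    if ($size > 0) {
        # look at the end of the file; if the last byte is not "\n",
        # the previous writer died mid-line
        my $tail_len = $size < 65536 ? $size : 65536;
        sysseek($fh, $size - $tail_len, SEEK_SET) or die "sysseek: $!";
        sysread($fh, my $tail, $tail_len) // die "sysread: $!";
        if (substr($tail, -1) ne "\n") {
            # cut back to just after the last complete line
            my $nl = rindex($tail, "\n");
            truncate($fh, $size - $tail_len + $nl + 1) or die "truncate: $!";
        }
    }

    # O_APPEND from '+>>': every write lands at the current end of file;
    # retry until the whole line is written
    while (length $line) {
        my $written = syswrite($fh, $line) // die "syswrite: $!";
        substr($line, 0, $written, '');
    }

    close $fh;                           # also releases the lock
}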

28 Comments

Multiple processes writing to a single file without any kind of locking or a single-threaded writer is a recipe for disaster.

At first glance it looks dangerously easy to believe that write() or print() is atomic. As you are discovering, that is not really the case.

Consider using IPC::DirQueue, which will let you create a queue of data to be processed. It retains ordering, and uses a directory modification protocol that ensures things don't get clobbered.

If you want to continue to pursue the log file solution, you may want to consider writing your own logging agent. Have multiple processes connect to a socket (a local Unix domain socket) that your logging agent makes available, and the agent will be responsible for reading a complete record and writing it to the log file before consuming the next one.
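
A bare-bones sketch of such an agent (the socket and log paths are made up, and a real one would want to multiplex clients instead of serving them one at a time):

use strict;
use warnings;
use IO::Socket::UNIX;

my $sock_path = '/tmp/logger.sock';     # made-up paths
my $log_path  = '/tmp/combined.log';

unlink $sock_path;
my $server = IO::Socket::UNIX->new(
    Type   => SOCK_STREAM(),
    Local  => $sock_path,
    Listen => 1,
) or die "listen on $sock_path: $!";

open my $log, '>>', $log_path or die "open $log_path: $!";
$log->autoflush(1);

# the agent is the only process that ever writes to the log file,
# and it only writes complete "\n"-terminated records
while (my $client = $server->accept) {
    while (my $record = <$client>) {
        print {$log} $record;
    }
}

A client is then just IO::Socket::UNIX->new(Peer => '/tmp/logger.sock') plus a print of its serialized record and a trailing newline.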

I agree with Jason. At one job, we did just as he is describing and 'tee'd (a rough description of the logger process) logs to a single server to make it cheap to scrub log files for process status across a dozen servers. One listener buffers all the log writes (like forwarding syslog) and then we processed a single log file for status updates. None of the logger processes ever saw bottlenecks; the listener/writer happily dealt with a fairly heavy load of incoming data. Its writes were not time-critical and the buffering never harmed any process. You can even loop back over 127.0.0.1 to the listener, so it's all good.

What you're describing here is the *exact* problem that led to the creation of database servers. I'd suggest you just throw an instance of Postgres on a dedicated logging server and have everything log to there for log items that need to be combined.

Concurrency of multiple processes into one file is not a problem that can be solved sanely on Unix-like systems.

Why don't you use syslog for concurrent logging?
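
If these really are log lines rather than a work queue, Sys::Syslog from core Perl is about this much code (the ident and facility below are made up):

use Sys::Syslog qw(:standard);

openlog('myworker', 'ndelay,pid', 'local0');   # made-up ident and facility
syslog('info', 'processed item %d', 42);       # syslogd serializes concurrent writers
closelog();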

Sorry, I think I wasn't clear here:

The issue is concurrency.

On Unix it is nearly impossible, or at the VERY least extremely hard, to do concurrent writes to a file without destroying the file. Databases exist to solve that problem. Lookups are a nice side effect, and honestly: the use case you describe seems to point towards the read capabilities of databases also being something you want.

Since it seems like this is all your code, can't you just use the 'advisory' locking flock() utilizes?

The warning that it's merely advisory is moot here: if you control all the programs writing to this log file, you can modify them to honor that lock.

This lets you avoid a dependency on an additional daemon, but may introduce some contention between your 100s of processes.
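
A sketch of that (the file name and record are stand-ins for your own; every writer has to follow the same open-lock-write-unlock discipline for the advisory lock to mean anything):

use strict;
use warnings;
use Fcntl qw(:flock);

my $log_filename = 'items.log';                  # made-up name
my $record       = "some serialized item\n";     # your Storable string + "\n"

open my $fh, '>>', $log_filename or die "open $log_filename: $!";
flock($fh, LOCK_EX) or die "flock: $!";          # wait for other writers

# write the whole record while holding the lock, retrying partial writes
while (length $record) {
    my $written = syswrite($fh, $record) // die "syswrite: $!";
    substr($record, 0, $written, '');
}

flock($fh, LOCK_UN);
close $fh;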

Using DBD::SQLite also lets you avoid external dependencies, while letting you store structured data (rather than lines of a Storable image). The SQLite library will handle the locking for you, so your concurrent processes can all write at once.
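
Something like this, for instance (the database file and table layout are made up):

use strict;
use warnings;
use DBI qw(:sql_types);

my $dbh = DBI->connect('dbi:SQLite:dbname=queue.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE IF NOT EXISTS queue (id INTEGER PRIMARY KEY, payload BLOB)');
$dbh->sqlite_busy_timeout(5000);   # concurrent writers wait instead of failing

# each INSERT is its own transaction: the record is either fully in or not at all
my $sth = $dbh->prepare('INSERT INTO queue (payload) VALUES (?)');
$sth->bind_param(1, "frozen data here", SQL_BLOB);   # e.g. a Storable::nfreeze() string
$sth->execute;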

Are they writing to this log file at high frequency? Perhaps it'd be easier to let each process have its own independent log file, and move the complexity to the log consumer to reassemble things into order?

I seem to have glazed over that part you wrote about locking both times :)

You're right, multiple processes opening a file have no POSIX primitives to atomically modify the same file.

However, the vfs layer guarantees that rename() is atomic. This is how things like maildir & IPC::DirQueue do their magic. Unfortunately, this is not useful for you here.
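
For completeness, the rename() trick in its simplest, maildir-ish form looks like this (the directory and naming scheme are made up):

use strict;
use warnings;
use File::Temp qw(tempfile);

sub publish_item {                       # made-up helper
    my ($dir, $payload) = @_;
    my ($tmp_fh, $tmp_name) = tempfile(DIR => $dir, SUFFIX => '.tmp');
    print {$tmp_fh} $payload;
    close $tmp_fh or die "close $tmp_name: $!";
    # rename() on the same filesystem is atomic: a reader sees either
    # no file at all or a complete one, never a half-written item
    my $final = "$dir/item-$$-" . time() . ".msg";
    rename $tmp_name, $final or die "rename: $!";
}

publish_item('/var/spool/myqueue', "frozen item\n");   # directory assumed to exist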

You could also solve this most easily by not caring what order entries are written to the log file. You could even leave autoflush off, keeping buffering on, for better performance.


All you'd have to do is include a hi-res timestamp in the log file with each entry, and then when you wish to read through the log just sort it by timestamp.


In order to avoid massive sort times, you could have the log file rotated often to keep the file size small, and when the file is rotated you could also sort it at the same time.


And since you said you wanted a library instead of a daemon: just write a small one that prepends an accurate timestamp to each entry in the log file, and add a hook for logrotate to do the sorting when the file is rotated.
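
A minimal version of that helper (the format, file name and function name are made up; Time::HiRes is core):

use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# prepend seconds.microseconds so the rotated file can later be
# re-sorted with something like: sort -n rotated.log
sub log_entry {
    my ($fh, $message) = @_;
    my ($sec, $usec) = gettimeofday();
    printf {$fh} "%d.%06d\t%s\n", $sec, $usec, $message;
}

open my $fh, '>>', 'app.log' or die "open: $!";
log_entry($fh, 'something happened');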

Your problem is really specific, because you're using log files to share data that can't get corrupted or be incomplete. This is not what log files are usually used for, which is why my last post was so far off. I had assumed you were writing logging statements to the log file, not using the log file as a database to share data among many processes.


I don't know how to solve your IPC/data-sharing problem. But you should reconsider using log files and consider a database instead, either the regular server kind like MySQL or the "just a library" kind like SQLite. And if you're going to use a database, maybe an object-relational mapper like DBIx::Class is a better fit than Storable.


In a previous comment you say, "In the bigger picture, the issue is that there are no transactions in POSIX filesystem API." When I think of transactions I think of either bank ATMs or databases. So, if you need transactions, consider a real database or SQLite.


Also, in place of grep, wc -l, and other commands, if you use SQLite you can use the sqlite3 command-line tool to run whatever queries you want against your database without bothering with DBD::SQLite. A database would fix the transaction problem, but then you would have to maintain and test the database. I don't know; what I do know is that when I think of log files I think of logging helpful debugging information, not sharing data, and when I think of transactions I think of databases.

A simple way to solve your problem without locks:

open my $afh, '>>', $log_filename;   # append handle: writes always land at the end of the file
open my $wfh, '+<', $log_filename;   # read-write handle for overwriting in place
                                     # (not '>', which would truncate the existing log)

then, in order to write a line $line:

# reserve room at the end of the file: append a run of "\x00"
# exactly as long as the line
my $zeros = "\x00" x length $line;
1 while syswrite($afh, $zeros) != length $line;

# then overwrite the reservation with the real line
my $start = seek($afh, 0, 1) - length $line;
seek($wfh, $start, 0);
while (length $line) {
  my $bytes = syswrite($wfh, $line);
  substr($line, 0, $bytes, "");
}

And on the reader just ignore zeros between the lines.

[and don't forget to add error checking code or to use autodie!]
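
On the reader side, skipping the padding could look like this (a sketch, reusing $log_filename from above):

open my $rfh, '<', $log_filename;
while (my $row = <$rfh>) {
    chomp $row;
    $row =~ tr/\x00//d;          # drop padding left by a writer that died
    next if $row eq '';
    # ... handle $row ...
}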

oops, in my previous comment code, a s/seek/sysseek/g is required!

Are log lines fixed length or variable length? You mentioned in one post that you're writing 'Storable' images?

There is also this nice thing called a network monitoring system: stream the combined log to the NMS and let it work out whom to email, what ticket to create, and how to escalate. It has its own DB, so there's no need to roll your own.

Then, when the reserved space turns out to be too small (the append was only partial), replace those zeros with "\n" and try again:

my $zeros = "\x00" x length $line;
while (1) {
  # try to reserve space at the end of the file
  my $bytes = syswrite $afh, $zeros;
  last if $bytes == length $line;
  if ($bytes > 0) {
    # partial reservation: pad it with "\n" so readers skip it, then retry
    my $start = sysseek($afh, 0, 1) - $bytes;
    sysseek($wfh, $start, 0);
    my $empty = "\n" x $bytes;
    syswrite($wfh, $empty);
  }
}
# overwrite the full-size reservation with the real line
my $start = sysseek($afh, 0, 1) - length $line;
sysseek($wfh, $start, 0);
while (length $line) {
  my $bytes = syswrite($wfh, $line);
  substr($line, 0, $bytes, "");
}

When the reader finds that some byte in the file is zero, it should busy-wait until it becomes non-zero.

Don't wait forever, just a little bit!
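
A sketch of that reader-side wait (the retry count and sleep are arbitrary; next_record() is a made-up name and $rfh is the usual read handle):

use strict;
use warnings;
use Time::HiRes qw(usleep);

sub next_record {
    my ($rfh) = @_;
    while (defined(my $row = readline($rfh))) {
        if ($row =~ /\x00/) {
            # a writer may still be overwriting its reservation:
            # re-read the same spot a few times, but don't wait forever
            my $pos = tell($rfh) - length $row;
            for (1 .. 50) {                      # at most ~0.5 seconds
                last if $row !~ /\x00/;
                usleep(10_000);
                seek($rfh, $pos, 0);
                $row = readline($rfh);
            }
            $row =~ tr/\x00//d;                  # give up on whatever is left
        }
        chomp $row;
        return $row if length $row;              # skip "\n"-padded gaps
    }
    return;                                      # end of file
}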


About Vyacheslav Matyukhin

I wrote Ubic. I worked at Yandex for many years, and now I'm building my own startup, questhub.io (formerly PlayPerl). I'm also working on Flux, a streaming data processing framework. CPAN ID: MMCLERIC.