I couldn't help it - Parsing Empathy log files in 20 seconds or less.

My current instant messaging application is Empathy. It's nice, though I wish it had a Perl interface, plugins and a few more features I want/need. It never matters enough to actually change applications.

Today I needed to go over a pretty long history file with a colleague. I popped up the "previous conversations" window in Empathy, only to find that the record starts from the last hour or so (out of about 5 hours of history). How nice.

I searched for the actual log files and found them in ~/.local/share/Empathy/logs/gabble_jabber_user_40domain_2eextension0/colleague@domain.extension. Conveniently, they are in XML form. Excellent!

I shouldn't be parsing XML (or any other SGML) with regular expressions. I know that! But... I really, really wanted to have it in 2 seconds instead of 2 hours. I couldn't help it!

I reckon if it's specific enough and won't be used for more than this specific minute, the standards police (which I love and cherish) will let me off the hook this time.

I wrote the following in a file:

use strictures 1;
use File::Slurp;
use HTML::Entities;

my $file = '20100926.log';
chomp( my @lines = read_file($file) );

foreach my $line (@lines) {
    # Match in list context; "my $x = $1 if ..." has undefined behavior in Perl.
    my ($name) = $line =~ / name=' ( [\w\@\.]+ ) ' /x        or next;
    my ($msg)  = $line =~ / type='normal'> (.+) <\/message> /x or next;

    # Keep only the part before the '@' (the whole name if there is none).
    $name =~ s/\@.*//;
    decode_entities($msg);    # void context: decodes $msg in place
    print "<$name> $msg\n";
}

Voilà!
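
To see what the two regular expressions in the script actually pick out, here is a minimal sketch that runs them against a single in-memory line, using core Perl only. The sample line is hypothetical: its attribute layout is guessed from the regexes, not copied from a real Empathy log. The parse_line helper is also just a name made up for this illustration, and entity decoding is left out so that no CPAN module is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, message) for a matching line,
# or an empty list when either pattern fails to match.
sub parse_line {
    my ($line) = @_;
    my ($name) = $line =~ / name=' ( [\w\@\.]+ ) ' /x        or return;
    my ($msg)  = $line =~ / type='normal'> (.+) <\/message> /x or return;
    $name =~ s/\@.*//;    # keep only the part before the '@'
    return ( $name, $msg );
}

# Assumed sample line; real log lines probably carry more attributes.
my $sample = q{<message name='colleague@domain.extension' type='normal'>Hi &amp; hello</message>};

if ( my ( $name, $msg ) = parse_line($sample) ) {
    print "<$name> $msg\n";    # entities such as &amp; stay undecoded here
}
```

Lines without both a name attribute and a type='normal' message body simply yield nothing, which is why the script can skip them with next.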

P.S.:
Don't do this at home!

About Sawyer X

Gots to do the bloggingz