I couldn't help it - Parsing Empathy log files in 20 seconds or less.

My current instant messaging application is Empathy. It's nice, though I wish it had a Perl interface, plugins and a few more features I want/need. It never matters enough to actually change applications.

Today I needed to go over a pretty long history file with a colleague. I popped up the "previous conversations" in Empathy, only to find that the record started from the last hour or so (out of about 5 hours of history). How nice.

I searched for the actual log files and found them in ~/.local/share/Empathy/logs/gabble_jabber_user_40domain_2eextension0/colleague@domain.extension. Conveniently, they are in XML form. Excellent!

I shouldn't be parsing XML (or any other SGML) with regular expressions. I know that! But... I really, really wanted to have it in 2 seconds instead of 2 hours. I couldn't help it!

I reckon if it's specific enough and won't be used for more than this specific minute, the standards police (which I love and cherish) will let me off the hook this time.

I wrote the following in a file:


use strictures 1;
use File::Slurp;
use HTML::Entities;

my $file = '20100926.log';

chomp( my @lines = read_file($file) );

foreach my $line (@lines) {
    # List-context match: each variable is undef when its pattern does not match
    my ($name) = $line =~ / name=' ( [\w\@\.]+ ) ' /x;
    my ($msg)  = $line =~ / type='normal'> (.+) <\/message> /x;

    defined $name or next;
    defined $msg  or next;

    # Keep only the part of the name before the '@' (if there is one)
    $name =~ s/\@.*//;

    # In void context, decode_entities modifies $msg in place
    decode_entities($msg);

    print "<$name> $msg\n";
}

Voilà!

P.S.:
Don't do this at home!
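
If you do want to do this at home, a real XML parser is the safer route. Here is a minimal sketch using XML::LibXML. It assumes the log is well-formed XML and that its message elements carry the same name and type='normal' attributes the regexes above look for. The format_log helper and the sample input are made up for illustration and are not taken from a real Empathy log.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: takes the log as an XML string and returns
# one "<name> message" line per normal message.
sub format_log {
    my ($xml) = @_;

    my $doc = XML::LibXML->load_xml( string => $xml );
    my @lines;

    # Assumption: messages are <message> elements with type='normal'
    foreach my $node ( $doc->findnodes('//message[@type="normal"]') ) {
        my $name = $node->getAttribute('name');
        next unless defined $name && length $name;

        # Keep only the part of the name before the '@' (if there is one)
        $name =~ s/\@.*//s;

        # textContent returns the text with XML entities already decoded
        push @lines, "<$name> " . $node->textContent;
    }

    return @lines;
}

# Made-up sample input, only here to exercise the helper
my $sample = <<'XML';
<log>
<message name='colleague@domain.extension' type='normal'>Tom &amp; Jerry</message>
</log>
XML

print "$_\n" for format_log($sample);
```

To read an actual file instead of a string, load_xml also accepts location => $file. The parser decodes entities itself, so HTML::Entities is no longer needed.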


About Sawyer X

Gots to do the bloggingz