I couldn't help it - Parsing Empathy log files in 20 seconds or less.
My current instant messaging application is Empathy. It's nice, though I wish it had a Perl interface, plugins and a few more features I want/need. It never matters enough to actually change applications.
Today I needed to go over a pretty long chat history with a colleague. I popped up the "previous conversations" window in Empathy, only to find that the record starts from the last hour or so (out of about 5 hours of history). How nice.
I searched for the actual log files and found them in ~/.local/share/Empathy/logs/gabble_jabber_user_40domain_2eextension0/colleague@domain.extension. Conveniently, they are in XML form. Excellent!
I shouldn't be parsing XML (or any other SGML) with regular expressions. I know that! But... I really, really wanted to have it in 2 seconds instead of 2 hours. I couldn't help it!
I reckon if it's specific enough and won't be used for more than this specific minute, the standards police (which I love and cherish) will let me off the hook this time.
I wrote the following in a file:
use strictures 1;
use File::Slurp;
use HTML::Entities;

my $file = '20100926.log';
chomp( my @lines = read_file($file) );

foreach my $line (@lines) {
    # Capture in list context; "my $x = $1 if ..." has undefined behavior in Perl.
    my ($name) = $line =~ / name=' ( [\w\@\.]+ ) ' /x;
    my ($msg)  = $line =~ / type='normal'> (.+) <\/message> /x;
    defined $name && defined $msg or next;

    # Keep only the part before the '@' (the name is left alone if there is none).
    $name =~ s/\@.*//;
    decode_entities($msg);    # in void context this decodes $msg in place
    print "<$name> $msg\n";
}
Voilà!
P.S.:
Don't do this at home!
I would have used Mojo::DOM for that:
http://pb.rbfh.de/HQz8ii8siDgl
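The paste linked above is not reproduced here. For readers who want the general idea, here is a minimal sketch of what a Mojo::DOM version could look like. It is not the commenter's code. It assumes, based only on the regular expressions in the post, that each entry is a message element with a name attribute and type='normal'; the format_log helper and the sample data are made up for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: turn the XML of one log file into "<nick> message" lines.
# Assumed entry shape: <message name='user@host' type='normal'>text</message>
sub format_log {
    my ($xml) = @_;

    # Force XML mode so tag names are treated case-sensitively.
    my $dom = Mojo::DOM->new->xml(1)->parse($xml);

    my @out;
    for my $msg ( $dom->find('message[type="normal"]')->each ) {
        my $name = $msg->attr('name') // next;    # skip entries without a name
        my $text = $msg->text;                    # entities arrive already decoded
        next unless length $text;
        $name =~ s/\@.*//s;                       # keep only the part before the '@'
        push @out, "<$name> $text";
    }
    return @out;
}

# Made-up sample input, not a real Empathy log.
my $sample = <<'XML';
<?xml version='1.0' encoding='utf-8'?>
<log>
<message time='20100926T10:00:00' name='alice@example.org' type='normal'>Tom &amp; Jerry</message>
<message time='20100926T10:00:05' name='bob@example.org' type='normal'>ok</message>
</log>
XML

print "$_\n" for format_log($sample);
```

With a real file you would pass in the slurped contents instead, e.g. format_log( scalar read_file($file) ) with File::Slurp as in the post. Unlike the regex version, this does not care whether a message spans several lines.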