August 2012 Archives

Q: When not to use Regexp? A: HTML parsing

It always starts out as something simple and innocent and then the Internet ruins it.

So I am giving a data mining talk at Ohio LinuxFest 2012 and, surprise, surprise, there is going to be a nice helping of Perl. So I am on the internet doing research, looking for some simple scrapers and collectors to mention in my talk. I always prefer to give multiple examples for any problem, since programming does not have a one-size-fits-all model. To make a long story short, I found a bunch of different social media scrapers. The problem I found with most of them w…

A NYTProf encoding hiccup

While using Devel::NYTProf on a new application, I started getting this message:
fid 33 has no src saved for /usr/lib/perl5/5.14.2/autodie.pm (NYTP_FIDf_HAS_SRC not set but src available!)

My first thought was that this had something to do with the newest version of either autodie or utf8::all. So I made sure all the modules I was using were up to date and tested again: the message was still there. Then I wrote a really short program to reproduce the error, and for some reason I couldn't. Going back and forth between the two files I finally n…

DIY personal analytics

How many times a day do you reach for <ctrl> + r when using the shell? What about the history command? !! anyone?

Do we as programmers evolve and stop making the same mistakes? Do we really optimize our workflows? This is where the idea of personal analytics comes in. I am going to see what I can learn from looking at my bash history for the last few years. Here are the relevant settings in my .bashrc file:

shopt is a bash builtin that displays and changes the values of shell options. The histappend…
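As a starting point for that kind of analysis, here is a minimal sketch (not the script from the original post) that counts the most frequently used commands in a bash history file. It assumes a plain ~/.bash_history with one command per line, and it only looks at the first word of each line, so `sudo` and pipelines are counted crudely. Lines such as `#1345678901` are the timestamps bash writes when HISTTIMEFORMAT is set, so they are skipped. The sub names `command_counts` and `top_commands` are hypothetical.

```perl
use 5.014;
use warnings;

# Hypothetical helper: tally the first word of each history line.
# Lines like "#1345678901" are the timestamps bash writes when
# HISTTIMEFORMAT is set; they are skipped, as are blank lines.
sub command_counts {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        next if $line =~ /^#\d+$/;
        my ($cmd) = $line =~ /^\s*(\S+)/ or next;
        $count{$cmd}++;
    }
    return %count;
}

# Hypothetical helper: the $n most frequent commands as [name, count]
# pairs, most frequent first (ties broken alphabetically).
sub top_commands {
    my ($n, @lines) = @_;
    my %count  = command_counts(@lines);
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice(@sorted, $n) if @sorted > $n;
    return map { [ $_, $count{$_} ] } @sorted;
}

# Read the history file named on the command line, else ~/.bash_history.
my $file = shift(@ARGV) // ($ENV{HOME} // '') . '/.bash_history';
if (-r $file) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @lines = <$fh>;
    close $fh;
    printf "%6d  %s\n", $_->[1], $_->[0] for top_commands(10, @lines);
}
```

Counting only the first word keeps the sketch short; splitting on `|` and `&&` would be the obvious next step if you want every command in a pipeline counted.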

An overview of spell checking modules

Spell checking is one of those problems that is already solved... sorta.

Like all problems, it really depends on context. Take "A Spelling Checker" from Jon Bentley's Programming Pearls, where he examines the problem space and the difference between a spelling checker and a spelling corrector. I start by searching for the keyword 'spell' across all of CPAN.

wget http://www.cpan.org/modules/01modules.index.html
ack -i spell 01modules.index.html
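To make the checker/corrector distinction concrete, here is a minimal sketch using a made-up toy dictionary. The checker only reports words that are not in the dictionary; the corrector also proposes dictionary words within a small Levenshtein edit distance. Edit distance is one common way to pick suggestions; it is not taken from Bentley's column or from any CPAN module, and the sub names `misspelled`, `edit_distance` and `suggestions` are hypothetical.

```perl
use 5.014;
use warnings;
use List::Util qw(min);

# Hypothetical toy dictionary; a real checker would load a word list.
my %dict = map { $_ => 1 } qw(the cat hat sat on mat);

# Spelling *checker*: return the words that are not in the dictionary.
sub misspelled {
    my ($dict, @words) = @_;
    return grep { !exists $dict->{ lc $_ } } @words;
}

# Levenshtein edit distance, computed one row at a time (dynamic programming).
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # delete a character from $s
                $cur[$j - 1] + 1,         # insert a character into $s
                $prev[$j - 1] + $cost,    # substitute (or keep) a character
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Spelling *corrector*: suggest dictionary words within $max edits (default 1).
sub suggestions {
    my ($dict, $word, $max) = @_;
    $max //= 1;
    my @hits = grep { edit_distance(lc($word), $_) <= $max } keys %$dict;
    return sort @hits;
}

for my $word (misspelled(\%dict, qw(the cst sat on teh mat))) {
    my @fix = suggestions(\%dict, $word);
    say "$word: ", (@fix ? join(', ', @fix) : 'no suggestions');
}
```

With this toy dictionary, cst is corrected to cat, while teh gets no suggestion at distance 1: plain Levenshtein distance counts a transposition as two edits, which is one reason real correctors often use Damerau-Levenshtein distance instead.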

Backing up Berlios.de

Last year it was announced that www.berlios.de was going to be shut down. People were asking if someone was going to back it up to save all those open source projects. I decided to give it a shot, and I was able to back up all of the BerliOS projects. While I was working on uploading the backup to a new host (I was looking at GitHub), it was announced that the site had been saved, so I set the project aside.
Digging around, I found this code and decided to post it so that people trying to build data-mining tools can have another real w…

Finding Perl material online

So you need Perl information and perldoc does not have what you need. First stop: the search engine. You type in the keywords and start exploring. One thing I kept noticing across different searches was that many of the results were just the POD online. I got tired of wading through them, so I created a Google Custom Search that filters out the sites that kept showing up without adding any value:
cpansearch.perl.org
perldoc.perl.org
cpan.org
metacpan.org
ebay.com
amazon.com

The last two kept returning info…

utf8::all and autodie now coexist peacefully

autodie version 2.12 works with use open now.

Recently I was reading a program that used utf8::all, and I decided to take another look at the module. The last time I tried it was version 0.003 from 2011, and it basically did the following:

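use utf8;                               # this source file is UTF-8
use open qw( :std :encoding(UTF-8) );   # UTF-8 layers on STDIN/STDOUT/STDERR and newly opened handles
use charnames qw( :full :short );       # enable \N{...} named characters
use Encode qw( decode_utf8 );

# decode command-line arguments from UTF-8 bytes, dying on malformed input
@ARGV = map { decode_utf8($_, 1) } @ARGV;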

At the time, autodie did not play nicely with use open, which was a blocker for using utf8::all in applications. The latest version of utf8::all also turns on use warnings qw( FATAL utf8 ). Looking at the updated POD, I see that autodie 2.12 now works correctly with use open. YAY! I have applications that combine autodie with boilerplate UTF-8 support, and now that boilerplate is shorter. From this

use utf8;
use 5.014;
use warnings;
use warnings qw(FATAL utf8);
use charnames qw(:full :short);
use autodie qw(:all);

# plus other stuff in the program

to this

use 5.014;
use warnings;
use autodie qw(:all);
use utf8::all;

I have updated a few of my backup scripts to use the above boilerplate, and so far there have been no problems. I was able to remove a few lines here and there, and the “Wide character in print” warnings, which I used to just ignore, are gone.
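If you want to see the use open plus autodie combination working without installing utf8::all, here is a minimal sketch using only core modules. It assumes autodie 2.12 or later (the version the post says fixed the interaction); the `round_trip` sub and the sample string are made up for illustration. It writes a string containing non-ASCII characters to a temporary file and reads it back through the UTF-8 layers that use open sets up. Plain `use autodie` is used instead of `qw(:all)` because `:all` also covers `system`, which needs the non-core IPC::System::Simple.

```perl
use 5.014;
use warnings;
use utf8;                              # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));    # UTF-8 layer on STD handles and on handles opened in this scope
use autodie;                           # open/close die on failure (2.12+ honours "use open")
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a temporary file, then read it back.
sub round_trip {
    my ($text) = @_;
    my (undef, $path) = tempfile(UNLINK => 1);

    open my $out, '>', $path;          # no explicit layer: inherits :encoding(UTF-8)
    print {$out} $text;
    close $out;

    open my $in, '<', $path;
    my $back = do { local $/; <$in> }; # slurp the whole file
    close $in;
    return $back;
}

my $text = "naïve café ☕";
my $back = round_trip($text);
# STDOUT has a UTF-8 layer, so this prints without a "Wide character" warning.
say $back eq $text ? "round trip ok: $back" : "round trip mismatch";
```

Because both handles pick up the encoding layer from use open, the string comes back as the same characters it went in as, rather than as raw UTF-8 bytes.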

I came across the autodie update by accident, so I am now keeping up with new releases by following @cpan_new and watching http://www.metacpan.org/recent.

About Kimmel

I like writing Perl code and since most of it is open source I might as well talk about it too. @KirkKimmel on twitter