A `tail -f` in BBEdit.

I knew BBEdit would update open files when they changed on disk, but I never thought to try that with a continually updating file. It works. You just have to have the file open. No big whoop.


Yet another stupid mistake #1

While refactoring a data structure from array @foo to hash %foo, I used 'each' to iterate over the hash, but forgot to change the 'for' statement to 'while'. So I ended up with something like:

$ perl -MData::Dump -E'%a=(a=>1, b=>2);
for (my ($k, $v) = each %a) { $_ = "$k x"; dd {k=>$k, v=>$v, "\$_"=>$_} }'

And this is nasty because for(@ary) aliases $_ to each element in @ary, and in this case it modifies $k (quiz #1: and $v too, do you know why?) right under your nose! Thus the results are really messed up:

{ "\$_" => "a x", "k" => "a x", "v" => 1 }
{ "\$_" => "a x x", "k" => "a x", "v" => "a x x" }

Not to mention the loop stops after processing two items (quiz #2: do you know why?). But you might not realize that until you add some pairs to %a and wonder why they don't get processed.

The error message Perl gives is not really helpful, to say the least :)
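
For comparison, here is a minimal sketch of the loop as it was presumably meant to be written, with 'while' instead of 'for'. The @seen array is not from the original code; it is only there to make the result easy to inspect.

```perl
use strict;
use warnings;

my %a = (a => 1, b => 2);
my @seen;   # illustration only: records what the loop visited

# 'while' re-evaluates each() on every pass and ends when each()
# returns an empty list, so every pair is visited exactly once.
while (my ($k, $v) = each %a) {
    push @seen, "$k=$v";
}

print join(", ", sort @seen), "\n";
```

Because nothing is aliased to $_ here, assigning to $_ inside the body would no longer touch $k or $v.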

Hackathon


We'll have another hack session at Quetzal Internet Cafe in the
middle of October. The location is close to Civic Center BART,
and the 38, 47, and 49 bus lines. Come bring your laptop and hack on your
favorite CPAN module, and chat with fellow SF.pm Perl Mongers.

Announcement posted via App::PM::Announce

RSVP at Meetup - http://www.meetup.com/San-Francisco-Perl-Mongers/calendar/14948010/

One-liner XML / Perl / JSON

castaway blew my mind this morning in irc.perl.org #axkit-dahut.

Convert an XML file to Perl data structure:

perl -MXML::Simple -MData::Dumper -le'print Dumper XMLin("foo.xml")'

Convert an XML file to JSON:

perl -MJSON::Any -MXML::Simple \
   -le'print JSON::Any->new()->objToJson(XMLin("foo.xml"))'

"How do I do X?" Like this! Poof!

Some days Perl feels like a Las Vegas magic show. :)

Reading META.yml when it's not UTF-8

Some of the 3% of distributions I couldn't index with MyCPAN had encoding issues. YAML is supposed to be UTF-8, but I don't always get UTF-8 when I generate a META.yml for distributions that don't have one. I guess I could do the work to poke around in MakeMaker, etc., to convert all the values before I generate the META.yml, but um, no. Not only that, not all of the META.yml files already in the dists are UTF-8. Remember, however, this is a very small part of BackPAN: about 700 distributions out of 140,000 (or about 1/7th of my problem cases).

A couple hundred distros have Makefile.PL files encoded as Latin-1 in a way that matters. If a value is not collapsible to ASCII, the META.yml ends up with Latin-1 in it. Some YAML parsers refuse to deal with that.
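
One possible workaround, sketched here with the core Encode module, is to try a strict UTF-8 decode first and fall back to Latin-1 when that fails. This is not what MyCPAN does; the helper name decode_meta_bytes and the "anything else is Latin-1" rule are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: turn raw META.yml bytes into a character string.
sub decode_meta_bytes {
    my ($bytes) = @_;

    # decode() with a CHECK value may modify its input, so work on a copy.
    my $copy = $bytes;
    my $text = eval { decode('UTF-8', $copy, FB_CROAK) };
    return $text if defined $text;

    # Assumption: anything that is not valid UTF-8 is Latin-1.
    return decode('ISO-8859-1', $bytes);
}

my $from_utf8   = decode_meta_bytes("Jos\xC3\xA9");   # UTF-8 bytes
my $from_latin1 = decode_meta_bytes("Jos\xE9");       # Latin-1 bytes
print "same string\n" if $from_utf8 eq $from_latin1;
```

The decoded text could then be re-encoded as UTF-8 before it is handed to a YAML parser. The fallback is a guess, so it will mislabel files in any third encoding.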

blog moving

Moving my blog here from my old blog on use.perl.org.

I couldn't help it - Parsing Empathy log files in 20 seconds or less.

My current instant messaging application is Empathy. It's nice, though I wish it had a Perl interface, plugins and a few more features I want/need. It never matters enough to actually change applications.

Today I needed to go over a pretty long history file with a colleague. I popped up the "previous conversations" window in Empathy, only to find that the record starts from the last hour or so (out of about 5 hours of history). How nice.

I searched for the actual log files and found them in ~/.local/share/Empathy/logs/gabble_jabber_user_40domain_2eextension0/colleague@domain.extension. Conveniently, they are in XML form. Excellent!

I shouldn't be parsing XML (or any other SGML) with regular expressions. I know that! But... I really, really wanted to have it in 2 seconds instead of 2 hours. I couldn't help it!

I reckon if it's specific enough and won't be used for more than this specific minute, the standards police (which I love and cherish) will let me off the hook this time.

Using blogs.perl.org

I stumbled into blogs.perl.org last night. Here's a couple "quick start" tips for using this install of Movable Type Pro:

(1) Code blocks. If you choose Format: Markdown, leave a blank line, indent text with 4 spaces, then another blank line

    you will get code blocks like this
    # with some
    $rudimentary = "syntax highlighting";

(2) Blog subtitle. Erez Schatz was kind enough to point out how to set your blog subtitle (e.g.: Mutation Grid, Inc. "Controlled software evolution." above): From the blogs.perl.org page, click on Post, then, on the top menu bar: Preferences - General, the subtitle is "description".

Writing Plack Debugging Middleware for Catalyst

I now have our work project running (sort of) on Catalyst 5.80007. This is because it's the oldest version of Catalyst I can use with Plack. I wanted that just because the debugging middleware for Plack is just so friggin' awesome and I wanted to write my own. Now I have and here's how easy it is (with screenshots).

September Meeting of Erlangen.pm

This month's meeting took place in the refurbished Trattoria Dolomiti.

We were eight Perl mongers and had a special guest, Bernd Hendl. Bernd had nothing to do with actual Perl programming, but he was searching for a new employee to work on some Perl web applications at his company.

Topics that came up this past meeting were:

  • Version control systems; ranting about commercial VCS'
  • Company policies regarding development tools
  • Local Perl job market (spawned by Bernd's job offer)
  • mod_python, and a weird bug that one of the mongers observed therein on a production machine
  • higher order functions (like map and grep), their (non-)existence in various programming languages, and whether higher abstractions make code harder to read or not

Note that often we don't settle on any topics in advance, but just let the discussion flow.

If you are in the Erlangen/Nuernberg area, don't hesitate to visit our monthly meetings, or contact us for extra meetings if your visit doesn't coincide with the third Monday of the month.

So who knew...

I'm currently working with extracting data from a system with an XML based command UI, so I am fairly often dumping serialised (using Data::Dump) perl objects out whilst debugging.
To make the piles of debug output easier for me to parse I pushed the files through Perl::Tidy.
You would not believe how long it takes, or how much memory is required, to run 110MB of perl datastructure dumps through perltidy!
Actually I don't know how long or how much memory it took either - I killed it after half an hour and 3GB.
I mean, who knew! :-)

Overlapping regex matches

irc.perl.org #perl-help posed a good question tonight. Why does this only find some of the matches?

my $sequence = "ggg atg aaa tgt tcc cgg taa atg aat gcc cgg gaa ata tag cct gac ctg a"; 
$sequence =~ tr/ //d; 
print "Input sequence is: $sequence \n";  
while ($sequence =~ /(atg(...)*?(taa|tag|tga))/g) {print "$1 \n";}

Because, by default, regex /g begins each subsequent search after the end of the last match, so overlapping hits are not found. As this blog post explains, a positive lookahead assertion is the key to finding all of them: the lookahead itself consumes no characters, so the next search starts just one position further on. This works great:

while ($sequence =~ /(?=(atg.*?(taa|tag|tga)))/g) {
   print "$1\n";
}
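
As a variation, here is a sketch that combines the lookahead with the codon-by-codon `(...)*?` from the original question, so that each hit ends at the first stop codon in the same reading frame as its `atg`. Whether in-frame matching is what the asker wanted is an assumption; the @orfs array is only there to collect the results.

```perl
use strict;
use warnings;

my $sequence = "ggg atg aaa tgt tcc cgg taa atg aat gcc cgg gaa ata tag cct gac ctg a";
$sequence =~ tr/ //d;

my @orfs;   # illustration only: collects every hit, overlapping or not

# The lookahead matches zero characters, so /g moves on by a single
# position each time and overlapping candidates are all tried.
# (?:...)*? steps three letters at a time, keeping the stop codon in frame.
while ($sequence =~ /(?=(atg(?:...)*?(?:taa|tag|tga)))/g) {
    push @orfs, $1;
}

print "$_\n" for @orfs;
```

On this input the sketch collects four sequences, where the non-overlapping loop from the question finds only two.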

I'm partial to bioinformatics homework after 4 years of hacking on the stuff. :)

MyCPAN indexes 97% of BackPAN

My goal a long time ago was to index about 90 to 95% of BackPAN, thinking that if I didn't get some ancient distributions that would be just fine and no one would miss them. There are about 140,000 distributions to index, and I'm figuring out why I can't get the last 4,200. That means I'm indexing

97%


Normalize till it's normal!

Recently I got a nice small project to work on: a web interface for a database with a simple search mechanism (Ajax for the frontend, with redirects to the actual result pages).

I received the database in Excel form. No worries, we have the excellent Spreadsheet::ParseExcel so I'm not scared of spreadsheets. Bring it on!

And yes, the client did "bring it on". He brought it on with 260 columns, no less. Each contained a "1" or "0" for a match. "You just go over the columns here, look for '1', and then continue over to the product name, search it in this sheet over here and find the number to the right and return that to client - simple!"

Yes, two-hundred and sixty columns. Alright, so I'll just normalize it. "You don't need to normalizical nothing [double negative!], it's good the way it is" - "No, trust me, I need to normalize it" - "Alright, knock yourself out".
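
To make "normalize" concrete, here is a minimal sketch that turns wide rows of 0/1 flags into one (product, column) pair per match. The column names, the products, and the position of the product name in the row are all made up for illustration; the real sheet had 260 flag columns and a different layout.

```perl
use strict;
use warnings;

# Hypothetical wide layout: product name first, then 0/1 flag columns.
my @header = qw(product flag_a flag_b flag_c);
my @rows   = (
    [ 'Widget', 1, 0, 1 ],
    [ 'Gadget', 0, 0, 1 ],
);

# Emit one [product, column name] pair for every cell holding a true value.
sub normalize {
    my ($header, $rows) = @_;
    my @pairs;
    for my $row (@$rows) {
        my $product = $row->[0];
        for my $i (1 .. $#$header) {
            push @pairs, [ $product, $header->[$i] ] if $row->[$i];
        }
    }
    return \@pairs;
}

my $pairs = normalize(\@header, \@rows);
print join("\t", @$_), "\n" for @$pairs;
```

The resulting pairs map directly onto a two-column join table, which is far easier to query than 260 flag columns.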

Installing Catalyst by Hand

I'm investigating a particular issue at work and I thought "Plack debugging middleware is exactly what I want right now". Specifically, I want this:

Hackathon


Come one, come all: SF.pm Hackathon at Paul's.

Next Tuesday in Bernal Heights from 7:00pm until whenever.

For those who haven't been chez Paul: we have a basement, bar, projector, wifi, yard, BBQ, etc., so we can eat, drink & give presentations. There's space for at least a dozen seated inside, and more outside (for those that can withstand the Day Star).

We'll be hacking on whatever, or just shooting the breeze about Perl.

Summary:
What: SF.pm Hackathon
When: Tuesday 28th September 2010, 19:00 'til Paul kicks us out.
Where: Paul's place in Bernal Heights, SF, 94110 (address emailed to "Yes" RSVPs on the day of).
What to bring: computer, snacks & drinks.

Announcement posted via App::PM::Announce

RSVP at Meetup - http://www.meetup.com/San-Francisco-Perl-Mongers/calendar/14879538/

Comparison of Perl serialization modules

A while ago I needed a Perl data serializer with some requirements: it had to support circular references and Regexp objects out of the box, and produce consistent/canonical output, since the output will be hashed. Here's my rundown of currently available data serialization Perl modules. A note: the labels fast/slow are relative to each other and are not the result of extensive benchmarking.

Data::Dumper. The grand-daddy of Perl serialization modules. Produces Perl code with adjustable indentation level (default is lots of indentation, so output is verbose). Slow. Available in core since the early days of Perl 5 (5.005 to be exact). To unserialize, we need to do eval(), which might not be good for security. Usually the first choice for many Perl programmers when it comes to serialization and arguably the most popular module for that purpose.
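
A minimal round-trip sketch with Data::Dumper, assuming trusted input. The sample data is made up; $Data::Dumper::Sortkeys and $Data::Dumper::Indent are the module's documented settings for stable key order and less verbose output.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order, useful if output is hashed
$Data::Dumper::Indent   = 1;   # less verbose than the default

my $data       = { name => 'foo', list => [ 1, 2, 3 ] };
my $serialized = Dumper($data);   # Perl source of the form "$VAR1 = {...};"

# Unserializing means running the dumped code, so only do this with
# input you trust. strict 'vars' is relaxed because the dump assigns
# to the undeclared $VAR1.
my $copy = do { no strict 'vars'; eval $serialized };
die $@ if $@;

print $serialized;
```

The eval step is exactly the security concern mentioned above: whatever code is in the string gets executed.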

Compiling Libraries, part II

In a previous post I wrote about the lack of a Perl module to build standalone C libraries. I suggested the creation of a new module, and I did it. I have my first working code available at github. I am happy to apply patches as long as the main objective of the module remains intact.

So far I have tested it on Mac OS X (Leopard) and Windows (with Strawberry Perl), in both cases with Perl 5.12.x. So the Build.PL might be missing a minimum Perl version if anything doesn't work on earlier Perl versions.

Also, documentation is still missing. Refer to test 01-simple.t for directions on how to use it.

The physicist's way out

Previously, I wrote about modeling the result of repeated benchmarks. It turns out that this isn't easy. Different effects are important when you benchmark run times of different magnitudes. The previous example ran for about 0.05 seconds. That's an eternity for computers. Can a simple model cover the result of such a benchmark as well as that of a run time on the order of microseconds? Is it possible to come up with a simple model for any case at all? The typical physicist's way of testing a model for data is to write a simulation. It's quite likely a model has some truth if we can generate fake data sets from the model that look like the original, real data. For reference, here is the real data that I want to reproduce (more or less):

[Figure: slow benchmark]

If you don't do your homework, you don't get to Perl

One of the most common responses to simple, textbook-quality questions on many Perl community outlets is "We are not here to do your homework". It's usually thrown out in a swift, abrasive manner, as if saying "How DARE you ask us to answer your assignment for you?!", and at times is accompanied by a general comment as to the asker's intelligence, seriousness, effort, capabilities, values, ethics and sexual capabilities. It is also, always, the most incorrect response possible.

About blogs.perl.org

blogs.perl.org is a common blogging platform for the Perl community. Written in Perl with a graphic design donated by Six Apart, Ltd.