One more Perler

My brother finally created his first GitHub account to start working on public code; he even forked a module I'm working on and sent a pull request.

He's now converting yet another CGI website to Dancer.

Here's hoping this will lead to a fun and joyful career.

Using PowerShell is Like Passing Hashes in Perl

At first I was excited that Microsoft had created PowerShell -- a usable command-line shell for Windows. (I always have 4 Cygwin Bash windows up on my XP PC at work, and before Cygwin got stable I ran the MKS Toolkit version of the Korn Shell.)

Once I started using PowerShell, I quickly became disappointed. Nothing I wanted to do in PowerShell was missing from Perl in an easily-consumable form. That would have been acceptable -- if it hadn't been for how slow PowerShell was compared to Perl or Cygwin Bash. As someone whose bread'n'butter for several years has been .NET programming, I am still not sure why PowerShell is so much slower than Perl or Bash (if anyone knows, please tell me). (I don't have problems getting a sane level of performance out of .NET.)

Smart match versus hash deathmatch

A couple of days ago, I posted about my answer to a Stackoverflow question asking about the speed of the smart match operator. A smart match looking for a scalar in an array can short-circuit, so it's pretty speedy.

The real question, however, is whether this sort of smart match is faster than a hash lookup. The answer is yes and no. I've updated my Stackoverflow answer with additional benchmarks and a new plot.

Smart matches are faster if you have to create the hash, but slower if you already have the hash.
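
As a minimal sketch of the two techniques being compared (List::Util's any stands in for the short-circuiting scan a smart match performs over an array):

```perl
use strict;
use warnings;
use List::Util qw(any);

my @array  = map { "item$_" } 1 .. 1_000;
my $target = 'item500';

# Linear scan that short-circuits on the first match, just as a
# smart match does when searching an array:
my $found_scan = any { $_ eq $target } @array;

# Hash lookup: building %lookup costs a full pass up front, but
# every subsequent exists() check is O(1):
my %lookup = map { ; $_ => 1 } @array;
my $found_hash = exists $lookup{$target};

printf "scan: %d hash: %d\n", $found_scan ? 1 : 0, $found_hash ? 1 : 0;
```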

There's a crossover point in between that I don't care to pin down: after some number of searches, the cost of creating the hash amortizes enough that it beats the smart match.

It depends on what you are doing, but that's the rub with every benchmark. The numbers aren't the answers to your real question, in this case "Which technique should I use?". They only support a decision once you add context.

Perl 101: What Do You Want?

I may be out of touch for a bit as I'm moving to Amsterdam tomorrow night, but in the meantime, tell me what you would like to see for "Perl 101" blog posts. Every time I post something with the newbies tag (which I'm going to change to the friendlier "perl101"), I get a fair number of comments and the post shows up a few times on Twitter. Since I'm getting a good response, I'm assuming that people actually like these posts and want to see more of them.

So tell me what you want and I'll see what I can do.

Converting Complex SVN Repositories to Git - Part 3

Resolving Branches and Calculating Merges

The most important part of the repository conversion I did was resolving all of the branches and calculating the merge points. The majority of the rest of the process is easily automated with other tools.

The main task in this part was determining what had happened to all of the branches. One important difference between Git and SVN is that if a branch is deleted in Git, any commits that existed only in that branch are permanently lost. In SVN, deleted branches still exist in the repository history. git-svn can't delete branches when importing them, because that would lose information. So every branch that existed at any point in the repository's history will exist in a git-svn import and must be dealt with.
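
To see the scale of the problem after an import, you can list all the refs git-svn created (this is plain Git, not anything specific to this conversion):

```shell
# Every branch and tag from the entire SVN history shows up as a
# remote-tracking ref after a git-svn import; list them all to see
# what needs to be resolved:
git for-each-ref --format='%(refname:short)' refs/remotes/
```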

Moving my blog from use.perl.org

I have moved my blog to here from use.perl.org (Mark Leighton Fisher).

ElasticSearch.pm gets big performance boost

ElasticSearch version 0.12 is out today along with some nice new features.

However, the thing I'm most excited about is that ElasticSearch.pm v 0.26 is also out and has support for bulk indexing and pluggable backends, both of which add a significant performance boost.

Pluggable backends

I've factored the parts that actually talk to the ElasticSearch server out into the ElasticSearch::Transport module, which acts as a base class for ElasticSearch::Transport::HTTP (which uses LWP), ::HTTPLite (which uses, not surprisingly, HTTP::Lite), and ::Thrift (which uses the Thrift protocol).
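
Selecting a backend is just a constructor argument; the server address here is illustrative, and you'd need a running ElasticSearch instance for this to do anything:

```perl
use ElasticSearch;

# The transport name selects the backend class: 'http' (LWP),
# 'httplite' (HTTP::Lite) or 'thrift' (the Thrift protocol).
my $es = ElasticSearch->new(
    servers   => '127.0.0.1:9200',
    transport => 'httplite',
);
```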

I expected Thrift to be the big winner, but it turns out that the generated code is dog-slow. However, HTTP::Lite is about 20% faster than LWP:

   httplite   :  63 seconds, 951 tps
   http       :  79 seconds, 759 tps
   thrift     :  690 seconds, 87 tps

Bulk indexing

Since version 0.11, ElasticSearch has had a bulk operation, which can take a stream of index, create and delete statements in a single request.
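
A hedged sketch of what a bulk call looks like from the client side -- the index/type names and the exact action structure here are my assumptions for illustration, not copied from the module's documentation:

```perl
use ElasticSearch;

my $es = ElasticSearch->new( servers => '127.0.0.1:9200' );

# Index, create and delete actions travel to the server in a
# single request instead of one round trip per document:
$es->bulk( [
    { index  => { index => 'tweets', type => 'tweet',
                  id    => 1, data => { text => 'hello' } } },
    { create => { index => 'tweets', type => 'tweet',
                  id    => 2, data => { text => 'world' } } },
    { delete => { index => 'tweets', type => 'tweet', id => 3 } },
] );
```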

Storable: "freeze" versus "nfreeze"

I was doing a code review and discovered that one of our developers wrote code using Storable's freeze() function. This turned out to be a bug because we store objects in memcache with nfreeze() instead. Storable's docs have only this to say about nfreeze():

If you wish to send out the frozen scalar to another machine, use "nfreeze" instead to get a portable image.

Since people generally use freeze() instead, I decided to dig around and figure out what was going on. After all, if nfreeze() is portable, there must be a price to pay, right?
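
A short sketch of the difference (the data here is made up; the point is the byte order of the frozen image):

```perl
use strict;
use warnings;
use Storable qw(freeze nfreeze thaw);

my $data = { user => 'alice', visits => 42 };

# freeze() uses the native byte order of this machine; the frozen
# image may not thaw correctly on hardware with different endianness.
my $native = freeze($data);

# nfreeze() writes integers in network (big-endian) order, so the
# image is portable across architectures -- the right choice when
# the bytes leave this machine (e.g. stored in memcached).
my $portable = nfreeze($data);

# Both thaw back to the same structure on the machine that froze them:
my $copy = thaw($portable);
print "$copy->{user} $copy->{visits}\n";   # alice 42
```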

Fun with recursive anonymous subroutines

I'm doing lots of work at the moment representing stuff stored in the file system as trees, as part of my toolkit for open source qualitative research software.

One of the things I need to do (for making reports) is to transform this:

 [ [qw/foo bar/],
   [qw/a b /],
   [qw/x y/], ];

into this tree structure:

 {
   'foo' => {
       'some_data' => 'lvl0',
       'children' => {'a' => {
           'some_data' => 'lvl1',
           'children' => { 'y' => 'leaf', 'x' => 'leaf' } },
                      'b' => {
                          'some_data' => 'lvl1',
                          'children' => {
                              'y' => 'leaf', 'x' => 'leaf' }}}}};

This being a nice golf problem, I thought I'd ask on IRC whether a hacker better than me felt like taking a look at it. ribasushi++ obviously had a little procrastination time available and wrote me a nice solution, which I needed to turn into a closure via a recursive subref:
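
ribasushi's original isn't reproduced here, so what follows is my own sketch of the recursive-subref approach (variable names are mine):

```perl
use strict;
use warnings;

my $levels = [ [qw/foo bar/], [qw/a b/], [qw/x y/] ];

# Recursive anonymous sub: declare the lexical first, then assign,
# so the sub body can see (and call) $build by name.
my $build;
$build = sub {
    my ( $depth, $names, @rest ) = @_;

    # Deepest level: every name becomes a plain 'leaf' marker.
    return { map { ; $_ => 'leaf' } @$names } unless @rest;

    # Inner levels: record the depth and recurse for the children.
    return {
        map {
            ; $_ => {
                some_data => "lvl$depth",
                children  => $build->( $depth + 1, @rest ),
            }
        } @$names
    };
};

my $tree = $build->( 0, @$levels );
print $tree->{foo}{children}{a}{children}{x}, "\n";   # leaf

undef $build;   # break the self-reference so the closure can be freed
```

The final undef matters: a sub that closes over the variable holding it is a reference cycle, and Perl's reference counting won't free it on its own.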

The Pearl Metaphor

After "What Weird Al and Larry have in common" and "some thoughts about Pearls", here comes the showdown of our little trilogy about the meaning of the name of our favorite language.

Some people asked me why I don't use more words to explain terms like binah, and why I don't give more links. This time I'll try to do a bit more of that. Some Jews may even say it's not good to talk openly about such things at all, but I prefer to follow the Baal Shem Tov, who said otherwise. To some this may seem completely over the top, but on the other hand you might not get to that kind of information so easily otherwise. :)

How fast is Perl's smart match?

Karel Bílek on StackOverflow wondered if the smart match operator was smartly searching. We know it's smart about what it should do, but is it also smart in how it does it? In this case, is it smart about finding scalars in an array?

I benchmarked three important cases: the match is at the beginning of the array, at the end, and in the middle. Before you look at my answer on StackOverflow, though, write down what you think the answer should be. Done? Okay, now you can peek.
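
A sketch of the benchmark's shape (List::Util's any stands in for the smart match here, since both short-circuit on the first hit; the array size and iteration count are arbitrary):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use List::Util qw(any);

# Three cases: target at the beginning, middle, and end of a
# 10_000-element array. A short-circuiting scan should be fast
# for 'begin' and slowest for 'end'.
my @array  = 1 .. 10_000;
my %target = ( begin => 1, middle => 5_000, end => 10_000 );

my $results = timethese( 500, {
    map {
        my $t = $target{$_};
        ( $_ => sub { my $found = any { $_ == $t } @array } )
    } keys %target
} );
```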

Perl 101: avoid "elsif"

We had some code which looked (sort of) like this:
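
The original snippet isn't preserved here, so this is a hypothetical reconstruction of the shape, followed by the dispatch-table form that usually replaces it (the sub and action names are invented):

```perl
use strict;
use warnings;

sub create_record { "created $_[0]" }
sub update_record { "updated $_[0]" }
sub delete_record { "deleted $_[0]" }

my ( $action, $id ) = ( 'update', 42 );

# The elsif chain: every new action means another branch.
my $result;
if    ( $action eq 'create' ) { $result = create_record($id) }
elsif ( $action eq 'update' ) { $result = update_record($id) }
elsif ( $action eq 'delete' ) { $result = delete_record($id) }
else                          { die "Unknown action '$action'" }

# The same logic as a dispatch table: adding an action means adding
# one line, and the lookup never grows into a wall of elsifs.
my %dispatch = (
    create => \&create_record,
    update => \&update_record,
    delete => \&delete_record,
);
my $handler = $dispatch{$action} or die "Unknown action '$action'";
my $same = $handler->($id);

print "$result\n$same\n";   # updated 42, twice
```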

local::lib and perlbrew

Because I seem to be doing this a lot at the moment, here’s my quick-start to local::lib and perlbrew … the saner way to run perl!
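
The quick-start steps themselves aren't reproduced above, so here is a sketch of the usual sequence; the paths and Perl version are illustrative, not prescriptive:

```shell
# Install perlbrew, build a fresh perl, and switch to it:
curl -L https://install.perlbrew.pl | bash
source ~/perl5/perlbrew/etc/bashrc
perlbrew install perl-5.12.2
perlbrew switch perl-5.12.2

# Keep CPAN installs in your home directory with local::lib:
perl -MCPAN -e 'install local::lib'
eval "$(perl -Mlocal::lib)"
```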

DBIx-Class and database schemas in PostgreSQL

Database schemas are a little like packages in Perl: they provide namespaces. If you have a database with dozens, or even hundreds, of tables, you'll really want to divide them into logical groups.

In PostgreSQL you do it like this:

CREATE SCHEMA <db_schema>;
SET search_path TO <db_schema>;

If you don't create a schema, all your stuff goes into the default schema public.

DBIx::Class knows about db schemas, but not enough to make them work out of the box. Or at least it seems that way. Here's how I did it.

First (well, after creating the database with the db schemas itself -- but that's left as an exercise for the reader), I created the DBIC classes for the tables with the excellent tool dbicdump (it's installed together with DBIx::Class::Schema::Loader). dbicdump creates the class structure right below your current directory, so I started with cd lib/ and then:
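
A hypothetical invocation for a PostgreSQL database might look like the following -- the schema class, connection details, and schema name are all placeholders. The db_schema loader option is what tells DBIx::Class::Schema::Loader which PostgreSQL schema to dump:

```shell
# Dump DBIC result classes for the tables in the PG schema
# "my_schema"; run from inside lib/ so the classes land there:
dbicdump -o db_schema=my_schema \
    MyApp::Schema \
    'dbi:Pg:dbname=mydb' myuser mypass
```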

Nice joke in thread about booking.com looking for Perl hackers

It seems that booking.com is looking for Perl programmers, which is discussed here: http://news.ycombinator.com/item?id=1784399

This thread contains a very nice joke (IMO):
  16 points by mmaunder:
      http://jobs.perl.org/ -- While Erlang and Haskell may get you laid, Perl remains the glue of the web.

  21 points by mustpax:
      Not that I don't know, but for the other readers here, how would one get laid with Erlang or Haskell?

  12 points by blackdog:
      quickly and in parallel.

Server Deployment Packaging (2)

Last week I posted about my current experiments in deploying Perl applications to our CentOS 5 servers -- or rather, the first steps of building a Perl package along with the required modules.

I was just starting to test this all through when suddenly one of the blockers to using the current stable Perl (i.e. 5.12.2) disappeared: TryCatch is now supported on 5.12.x.

So, although I have some tests running at the moment, I am modifying a few parts of the build scripting (mainly down to me missing a couple of local modules in the build), and then a new version based on the current stable Perl will hit the build systems.

PostgreSQL Conference West 2010

I will be attending the PostgreSQL Conference West 2010 at the Sir Francis Drake hotel in San Francisco from November 2nd to 4th. I'm waiting to hear back from the travel agency to see when I'm flying out of and arriving back at D/FW.

Converting Complex SVN Repositories to Git - Part 2

Initial Import into Git

Creating a mirror

SVN is slow, and git-svn is slower. The amount of network traffic needed by SVN makes everything slow, especially since git-svn needs to walk the history multiple times. Even if I made no mistakes and only had to run the import once, having a local copy of the repository makes the process much faster. svnsync will do this for us:

# create repository
svnadmin create svn-mirror
# svn won't let us change revision properties without a hook in place
echo '#!/bin/sh' > svn-mirror/hooks/pre-revprop-change && chmod +x svn-mirror/hooks/pre-revprop-change
# do the actual sync
svnsync init file://$PWD/svn-mirror http://dev.catalyst.perl.org/repos/bast/
svnsync sync file://$PWD/svn-mirror

Importing with git-svn

Next, we have to import it with git-svn:
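
The import command itself is not shown above; a typical invocation against the local mirror might look like the following -- the --stdlayout flag assumes the conventional trunk/branches/tags layout, so adjust it if the repository is arranged differently:

```shell
# initialize a git-svn clone from the local mirror and fetch
# the whole history (this is the slow part)
git svn init --stdlayout file://$PWD/svn-mirror bast
cd bast
git svn fetch
```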

Moving to Amsterdam to work for Booking.com

I thought I'd note here too as well as on my blog that I'll be moving to Amsterdam tomorrow to work for Booking.com. I'm looking forward to the new challenges and getting settled in a new city, as well as meeting and working with some new people.

Excel::Writer::XLSX

I have released a new module to CPAN for writing Excel files in the 2007 XLSX format: Excel::Writer::XLSX

It uses the Spreadsheet::WriteExcel interface but is in a different namespace for reasons of maintainability.
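
Since it keeps the Spreadsheet::WriteExcel interface, existing code mostly just needs the new module name; a minimal sketch (the filename and cell contents are illustrative):

```perl
use strict;
use warnings;
use Excel::Writer::XLSX;

# Same calls as Spreadsheet::WriteExcel, different module name
# and an .xlsx file extension:
my $workbook  = Excel::Writer::XLSX->new('demo.xlsx');
my $worksheet = $workbook->add_worksheet();

$worksheet->write( 0, 0, 'Hello' );
$worksheet->write( 0, 1, 1_048_576 );    # rows now go this far

$workbook->close();
```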

Not all of the features of Spreadsheet::WriteExcel are supported but they will be in time.

The main advantage of the XLSX format over the XLS format for the end user is that it allows 1,048,576 rows x 16,384 columns, if you can see that as an advantage.

From a development point of view the main advantage is that the XLSX format is XML based and as such is much easier to debug and test than the XLS binary format.

It has become increasingly difficult to carve out the time required to add new features to Spreadsheet::WriteExcel. Even something as seemingly innocuous as adding trendlines to charts could take up to a month of reverse engineering, debugging, testing and implementation.

Hopefully the XLSX format will allow for faster, easier test driven development and may entice in some other contributors.

About blogs.perl.org

blogs.perl.org is a common blogging platform for the Perl community. It is written in Perl, with a graphic design donated by Six Apart, Ltd.