February 2012 Archives

How to test an abstract builder?

Hi Perl blogosphere. Yes, today I’m talking into the camera.

I’m very pleased that my Perl Foundation Grant to develop Alien::Base has been funded, and I have redoubled my efforts.

I have high hopes for this module as a tool to help authors provide the C libraries they need through CPAN far more easily than by hand-rolling an Alien:: module. As such, I feel a deep urge to ensure that Alien::Base is robustly tested.

Unfortunately, I’m finding that testing such a module is rather hard. Alien::Base is a very abstract concept. It only makes sense when it is used as the base class for some other Alien:: module, and further, THAT module is only fully realized when it is used by yet another module that needs the C library.

So far I have been using a version of Alien::GSL that has been retooled to use Alien::Base, and I am working on a simple C library that can be distributed with A::B as another test case. Still, this seems rather hackish. Of course I test what I can separately, but much of the functionality only comes into play when everything is used “all together”.

Anyway, I’m not asking for any magic beans, but perhaps someone knows of a secret sauce or example. Thanks!

Update:

With a couple of changes to my test-case library’s Build.PL, I can now load that file with do and get the M::B object back. From there I can dispatch the build commands while still having access to the builder object. This is much better than calling system('perl Build.PL') from inside a .t file, and worlds better than running Build.PL by hand.
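A minimal sketch of the do-load idea, assuming the Build.PL simply makes the builder object its last evaluated value, since do FILE returns that value. To keep it runnable without Module::Build installed, the file it writes defines a hypothetical stand-in class, My::StubBuilder, whose dispatch method only records action names; in a real test-case library the file would call Module::Build->new(...) instead, and Module::Build's own dispatch method would run the actions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use File::Spec;
use File::Temp qw/tempdir/;

# Write a stand-in Build.PL into a temporary directory. My::StubBuilder is
# hypothetical; a real Build.PL would use Module::Build->new(...) instead.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'Build.PL' );
open my $fh, '>', $file or die "Cannot open $file: $!";
print {$fh} <<'END_BUILD_PL';
package My::StubBuilder;
sub new      { my ( $class, %args ) = @_; return bless { actions => [], %args }, $class }
sub dispatch { my ( $self, $action ) = @_; push @{ $self->{actions} }, $action; return 1 }

package main;
my $builder = My::StubBuilder->new( module_name => 'My::TestLib' );
# The important change: the builder is the last value evaluated,
# so do() hands it back to whoever loaded this file.
$builder;
END_BUILD_PL
close $fh or die "Cannot close $file: $!";

# do FILE runs the file and returns its last evaluated value. An absolute
# path, like the one tempdir() gives here, is used as-is, not searched in @INC.
my $builder = do $file;
unless ( defined $builder ) {
  die "Could not compile $file: $@" if $@;
  die "Could not read $file: $!";
}

# The test keeps the object and drives the build steps itself.
$builder->dispatch($_) for qw/build test/;
print "Ran actions: @{ $builder->{actions} }\n";
```

With the real Module::Build, the Build.PL would presumably end with $builder->create_build_script; followed by a bare $builder;, and the test would then call something like $builder->dispatch('build').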

OK, I’m feeling a lot better now. Still, if you have any suggestions, I’m all ears!

A milestone for Alien::Base

I have been working on a set of base classes intended to make creating a new Alien:: distribution for some library as easy as making a simple Module::Build-based distro. The code isn’t on CPAN yet, but you can follow its progress on GitHub.

I haven’t been feeling well today, so I have been sitting around watching movies (which I own on DVD) on TV. Of course, I can’t sit still that long without doing anything, so Alien::Base saw a burst of activity today.

Along with the tests, I am also keeping an Alien::Base-based Alien::GSL (which provides the GNU Scientific Library) in the examples folder. The big news today is that this example distro can now query the GNU FTP server and pick the newest version of the library. It then downloads, extracts and builds the library in a temporary folder. Finally, it “installs” the library into a File::ShareDir directory (the share directory in the Alien::GSL root). Even this isn’t as cool as how it does this:

It does it entirely from the Build.PL configuration!
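Picking “the newest version” from a server listing is easy to get subtly wrong: a plain string sort ranks gsl-1.9 above gsl-1.15. Below is a minimal sketch of a numeric, component-wise comparison, assuming a hypothetical FTP listing and a filename pattern of my own choosing; it is not Alien::Base’s actual code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Compare two versions given as array refs of integer components,
# treating missing components as 0 (so 1.15 == 1.15.0).
sub cmp_versions {
  my ( $x, $y ) = @_;
  my $last = $#$x > $#$y ? $#$x : $#$y;
  for my $i ( 0 .. $last ) {
    my $c = ( $x->[$i] // 0 ) <=> ( $y->[$i] // 0 );
    return $c if $c;
  }
  return 0;
}

# Return the file whose captured version is highest, or undef if none match.
sub newest_release {
  my ( $pattern, @files ) = @_;
  my %version_of;
  for my $file (@files) {
    next unless $file =~ $pattern;
    $version_of{$file} = [ split /\./, $1 ];
  }
  my ($newest) = sort { cmp_versions( $version_of{$b}, $version_of{$a} ) } keys %version_of;
  return $newest;
}

# Hypothetical listing of the kind an FTP server might return.
my @listing = qw(
  README
  gsl-1.9.tar.gz
  gsl-1.14.tar.gz
  gsl-1.15.tar.gz
  gsl-1.15.tar.gz.sig
);

my $newest = newest_release( qr/^gsl-(\d+(?:\.\d+)*)\.tar\.gz$/, @listing );
print "$newest\n";    # gsl-1.15.tar.gz
```

The anchored pattern is what keeps the .sig file and the README out of the running.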

It is my hope that most small, self-contained libraries can be wrapped in this simple way, and that this will increase the number of Alien:: modules available on CPAN.

Of course, it still needs much more functionality, lots more tests, and all of the documentation. All of that is coming, however, so keep watching!

Why would I use Tie::Array::CSV?

After (IMO) elegantly solving an SO (StackOverflow) question using my Tie::Array::CSV, I thought I might share it here to give you all an idea of when you might want to use it. This example only reads the file, but remember that T::A::CSV gives you full row/column read/write access to the underlying CSV file, in place.

The OP needed to find the column containing a certain identifier, which was 7 characters long and started with a letter (in the example data below, this is the fourth column, i.e. index 3), and then count how many times each identifier appears in that column. Here is the solution that I posted.

#!/usr/bin/env perl

use strict;
use warnings;

use File::Temp;
use Tie::Array::CSV;
use List::MoreUtils qw/first_index/;
use Data::Dumper;

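# this builds a temporary file from DATA
# normally you would just make $file the filename
my $file = File::Temp->new;
print $file <DATA>;
$file->flush;    # push the buffered data out before tying the file
#########

tie my @csv, 'Tie::Array::CSV', $file;

# find the column from data in the first row:
# 7 characters long, starting with a letter
my $colnum = first_index { /^[[:alpha:]].{6}$/ } @{$csv[0]};
die "No matching column found\n" if $colnum < 0;
print "Using column: $colnum\n";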

#extract that column
my @column = map { $csv[$_][$colnum] } (0..$#csv);

#build a hash of repetitions
my %reps;
$reps{$_}++ for @column;

print Dumper \%reps;

__DATA__
"ABCDEFGHIJK05","site","date1","ab96abc","date2"
"ABCDEFGHIJK05","site","date2","ab96abc","date2"
"ABCDEFGHIJK05","site","date1","cd98abc","date2"

(The OP gave one line of data, so I padded it out to three lines; also, to fit blogs.perl.org’s width restrictions, the data given here has been rewritten. See the original post for the full data if you must.)

Of course, I know you can do this with Text::CSV directly, but I like that Tie::Array::CSV lets me think in terms of rows and columns rather than objects, parsers and accessors.

My $0.02 on strict and the community

By now most people who would be reading my blog are aware of the kerfuffle going on about people being pushy about strict (and other Modern Perlisms).

As a relatively new Perler (my first scripts are dated 2009) I believe I have an underrepresented opinion on the matter. I was lucky to have had StackOverflow and the community around me as I was learning Perl. Someone, I don’t remember who or with what tone, told me that I should use strict and warnings on my code. Not knowing any better, I did.

Then Perl was easier. Simple as that.

I have learned a lot since then. I know when I need no strict 'refs' or no warnings 'once'. Personally, I wish these pragmas were on by default. In fact, I have had so few problems with Perl that I’m horrendous at using the debugger; I really haven’t needed it. Of course, I know that one of Perl’s best assets is its backward compatibility, which is why strict and warnings are not on by default.
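For newer readers, here is a typical place where no strict 'refs' is the right call: generating accessors by assigning closures to glob names built from strings. This is a small sketch using a hypothetical My::Rect class; the point is that strict is relaxed for one block only, not the whole file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package My::Rect;    # hypothetical class, for illustration only

sub new {
  my ( $class, %args ) = @_;
  return bless {%args}, $class;
}

# Generate one read-only accessor per field by assigning closures to glob
# names built from strings. Those are symbolic references, which
# strict 'refs' forbids, so it is relaxed for this block only.
for my $field (qw/width height/) {
  no strict 'refs';
  *{"My::Rect::$field"} = sub { return $_[0]{$field} };
}

package main;

my $rect = My::Rect->new( width => 3, height => 4 );
my $area = $rect->width() * $rect->height();
print "$area\n";    # 12
```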

And yes, I get annoyed now when some new Perler asks a question on StackOverflow without using strict, but then I take a deep breath and remind myself that I was there, right there, myself.

And yes, I get annoyed when people comment on my blog posts with self-righteous bull, but then I take a deep breath and realize that they might know a lot more than I do about a great many things.

Open source programming is an incredible social experiment. Many people are working together. We haven’t all met, we don’t even all speak the same language. Often we are not paid. But together we can make incredible products. Then usually we give them away, to help other people.

Think about how awesome that is. Then go explain why you do the things you do to a new Perler.

Should Perl have a `chomped` function?

Edit: originally rchomp, but Aristotle’s suggestion of chomped is perfect!

brian d foy posed an interesting interview question: “What five things do you hate most about language X?” positing that an experienced user of X should know 5 things (s)he hates about it.

In my list is the return value of chomp. Yes, I understand why it works as it does:

print "chomped" if chomp $input;

but I find that use case happens far less often than the usual

chomp( my $input = <> );

It looks bad, and it is not intuitive, especially to the new user. Just today another one popped up on StackOverflow. This has got to be one of the most common questions on the site.
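To make the complaint concrete for new users, this short sketch shows what chomp actually returns (the number of characters removed) and why the chomp( my $x = ... ) idiom works. It uses an in-memory filehandle in place of <> so it runs without input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# chomp changes its argument in place and returns the number of
# characters it removed, not the chomped string.
my $input   = "yes\n";
my $removed = chomp $input;
print "removed $removed, input is now '$input'\n";   # removed 1, input is now 'yes'

# Nothing left to remove, so the return value is 0.
my $again = chomp $input;

# The usual idiom: the assignment is an lvalue, so chomp
# operates on the freshly declared $line itself.
open my $fh, '<', \"hello\n" or die "Cannot open in-memory handle: $!";
chomp( my $line = <$fh> );
print "line is '$line'\n";                            # line is 'hello'

# The trap new users fall into: expecting the string back.
my $oops = chomp( my $copy = "no\n" );
print "oops is '$oops'\n";                            # oops is '1'
```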

Wouldn’t it be great to have a chomp function that returns the chomped value or values? In the spirit of the new s///r flag I originally wanted to call it rchomp, but Aristotle’s suggestion, chomped, is my new favorite. You would use it like:

my $input = chomped <>;
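A user-level approximation is only a few lines. This is a sketch of a hypothetical chomped, not anything in core: the ($) prototype is what makes chomped <> read a single line, because it gives the argument scalar context.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical chomped(): returns a chomped copy and leaves its argument
# alone. The ($) prototype puts the argument in scalar context, so
# `chomped <$fh>` reads one line rather than the whole file. The sub must
# be declared before the calls are compiled for the prototype to apply.
sub chomped ($) {
  my ($copy) = @_;
  return undef unless defined $copy;    # e.g. readline at end of file
  chomp $copy;
  return $copy;
}

open my $fh, '<', \"first line\nsecond line\n"
  or die "Cannot open in-memory handle: $!";
my $first  = chomped <$fh>;
my $second = chomped <$fh>;
print "[$first] [$second]\n";           # [first line] [second line]

my $original = "keep me\n";
my $trimmed  = chomped $original;       # $original still ends in "\n"
```

A version that also accepted a list (like chomp on an array) would need a different prototype, and then chomped <> would put <> in list context and read every line.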

Since we all love CPAN, I could of course make a CPAN module for this, but no one would add a dependency just for one convenience function. I don’t expect large adoption of my Tie::Select for this reason, even though I think it is safer than the core’s SelectSaver in some rare circumstances.
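For readers who haven’t met it, SelectSaver is the core module that temporarily changes the default output handle and restores the previous one when its object goes away. A short sketch of that pattern follows; Tie::Select’s own interface is not shown, since the post doesn’t describe it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use SelectSaver;

# Collect output in a string through an in-memory filehandle.
my $captured = '';
open my $buffer, '>', \$captured or die "Cannot open in-memory handle: $!";

{
  # While $saver is alive, $buffer is the default output handle;
  # when it goes out of scope, the previously selected handle returns.
  my $saver = SelectSaver->new($buffer);
  print "this line goes to the buffer\n";
}

print "this line goes to STDOUT again\n";
close $buffer or die "Cannot close in-memory handle: $!";
print "captured: $captured";
```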

So anyway, is chomped something that the community would want? Could it possibly be in CORE:: so that people might actually use it? I even see that a similar concept became an RFC for Perl 6. Just daydreaming I guess, but oooh I do hate it.

The Case for Simplicity

Part of my design goal for Tie::Array::CSV was an elegant blend of tied objects, making hard things easy at both the user and the author (me) levels.

A few months back I announced that Tie::Array::CSV had become more efficient at row operations. Since then I have had a nagging thought: this change cost me elegance and simplicity.

To implement the deferred row operations, I made my row objects wait until their destructor runs to update the file. That sounds nice until you realize that you now have race conditions all over the place. So you hunt them down and store and update more internal data, always keeping track of what has changed. A simple change became a big undertaking. As the project finished, I couldn’t help but yearn for the simplicity of the original design goal.
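To see why deferring writes to the destructor invites these problems, here is a deliberately tiny sketch of the pattern. My::DeferredRow is hypothetical and much simpler than Tie::Array::CSV’s real code: it keeps a private copy of a row and writes it back to a shared array of lines in DESTROY. Two objects viewing the same row are enough to lose an edit.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical row class (not Tie::Array::CSV's actual code): it keeps a
# private copy of the fields and only writes them back to the shared
# line storage in its destructor.
package My::DeferredRow;

sub new {
  my ( $class, $lines, $index ) = @_;
  my @fields = split /,/, $lines->[$index], -1;
  return bless { lines => $lines, index => $index, fields => \@fields }, $class;
}

sub set {
  my ( $self, $col, $value ) = @_;
  $self->{fields}[$col] = $value;
  return;
}

sub DESTROY {
  my $self = shift;
  # Deferred write-back: the underlying line changes only now.
  $self->{lines}[ $self->{index} ] = join ',', @{ $self->{fields} };
}

package main;

my @lines = ('a,b,c');
{
  my $first  = My::DeferredRow->new( \@lines, 0 );
  my $second = My::DeferredRow->new( \@lines, 0 );   # another view of the same row
  $first->set( 0, 'X' );
  $second->set( 2, 'Z' );
  # Neither edit has reached @lines yet: each object holds its own copy.
}
# Both destructors have now run. Whichever ran last overwrote the other,
# so one edit is lost: the result is never 'X,b,Z'.
print "$lines[0]\n";
```

The sketch deliberately doesn’t depend on which destructor runs first; either way, keeping the copies coherent needs exactly the kind of extra bookkeeping described above.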

Yesterday, in this staring match with myself, I finally blinked. I retrieved the old code, merged in a few of the newer niceties that I wanted to keep and moved the more convoluted deferred-row-op logic into a subclass.

Here I announce the release of Tie::Array::CSV version 0.05, featuring simplicity in the base class and deferred row operations in a subclass.

It’s not the most efficient (read: fastest) way to read CSV files, and it doesn’t handle embedded newlines, but if you just want to act on a CSV file like a 2D Perl array (i.e. an array of array references), give it a try.
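For a feel of the simpler write-through design, here is a toy tied row built on the core Tie::Array base class. My::CSVRow is hypothetical and far cruder than Tie::Array::CSV (it splits on bare commas and ignores quoting); it only shows how a tied array can turn element assignments into immediate updates of the underlying line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical My::CSVRow: a tied array whose elements are the fields of
# one line in @lines. Every STORE rewrites the line immediately (no
# deferred write-back). Splitting on bare commas is a simplification.
package My::CSVRow;
use Tie::Array;
our @ISA = ('Tie::Array');

sub TIEARRAY {
  my ( $class, $lines, $index ) = @_;
  return bless { lines => $lines, index => $index }, $class;
}

sub _fields {
  my $self = shift;
  return [ split /,/, $self->{lines}[ $self->{index} ], -1 ];
}

sub _write {
  my ( $self, $fields ) = @_;
  $self->{lines}[ $self->{index} ] = join ',', map { defined $_ ? $_ : '' } @$fields;
}

sub FETCHSIZE { my $self = shift; return scalar @{ $self->_fields } }
sub FETCH     { my ( $self, $i ) = @_; return $self->_fields->[$i] }

sub STORE {
  my ( $self, $i, $value ) = @_;
  my $fields = $self->_fields;
  $fields->[$i] = $value;
  $self->_write($fields);
}

sub STORESIZE {
  my ( $self, $size ) = @_;
  my $fields = $self->_fields;
  $#$fields = $size - 1;
  $self->_write($fields);
}

package main;

my @lines = ( 'a,b,c', 'd,e,f' );
tie my @row, 'My::CSVRow', \@lines, 1;

$row[1] = 'E';          # written straight through to $lines[1]
push @row, 'g';         # Tie::Array implements push via FETCHSIZE and STORE
print "$lines[1]\n";    # d,E,f,g
```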

Fork Tie::Array::CSV on GitHub

PS. Mithaldu, you can now pass a Text::CSV object (or subclass) to the constructor if you would like :)

Perl Data Language (PDL) 2.4.10 Release

Since the Perl Data Language (PDL) does not have a large presence in the Perl Blogosphere, I have the honor of reposting Pumpking Chris Marshall’s announcement of PDL 2.4.10.

For those of you who don’t know, PDL gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays which are the bread and butter of scientific computing. For more information on PDL please visit its website at http://pdl.perl.org.

Chris’ release message is reposted below; the full text can be seen in the mailing list archive.

PDL-2.4.10 released

The PDL Development Team is pleased to announce the PDL-2.4.10 release of the Perl Data Language and the first PDF release of the PDL Book.

PDL-2.4.10 is the latest point release with more functionality, portability, and robustness than ever before, including:

  • POSIX threads support for all platforms
  • Auto parallelization of PDL threadloops
  • Support for PDLs larger than 2GiB
  • PDL Book draft release (PDF format)
  • Much, much, more…

As always, the source distribution will be available at a CPAN mirror near you within a couple of days. Our sf.net site has the source distribution and the PDL Book:

http://sourceforge.net/projects/pdl/files/PDL/2.4.10/PDL-Book-20120205.pdf/download

Windows binary PPMs are available from the usual site; see “Get PDL” in the sidebar at http://pdl.perl.org (the PDL website, with links, documentation and info for all things PDL).

The SciPDL-2.4.10 release for MacOS X systems will be announced when it is available.

Enjoy and Happy PDL-ing!

Chris Marshall for the PDL Development Team

The 'Perl Tutorial' Tutorial

The web is crowded with tutorials about Perl. Perl has improved over the years, bringing new features, safer constructs and clearer syntax. However, the old tutorials are still read and learned from by far too many new Perlers. This article aims to help you select your information sources critically.

Some things to look for:

The name of the language is Perl

Yes, that’s Perl, not PERL. The community is very adamant about this, so if a tutorial says PERL, you know that it was written by someone outside the Perl community.

The current Perl version is 5.14, the year is 2012

Many tutorials that were great for Perl 4, or even 5.6, are now woefully out of date. If a tutorial is dated earlier than about 2007, or targets a Perl version older than 5.10, it will be missing many of the new features and may use older, more dangerous syntax.

Recommends using ‘strict’ and ‘warnings’

While the strict and warnings pragmas are not required, and there are still a few people who prefer not to use them, the vast majority of Perlers do. They help catch typos and gotchas that even the most experienced Perlers can accidentally fall into.

A good tutorial site should at least mention that these pragmas are a good safety net, though it may leave them out of each example to save space. An even better site will include them in all longer code snippets.
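A small demonstration of that safety net, using only core Perl: the same misspelled variable compiled with and without strict. The code is compiled in string evals purely so that the errors and warnings can be caught and printed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A misspelled variable under strict (string evals inherit the
# surrounding pragmas), so the typo is a compile-time error.
my $ok = eval q{
  my $name = 'World';
  my $greeting = "Hello $naem";   # typo for $name
  1;
};
my $strict_error = $@;
my $message = $ok ? "compiled fine\n" : "strict caught it: $strict_error";
print $message;

# The same typo with strict 'vars' switched off: it compiles, and only
# warnings complains at run time that the variable is undefined.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };
my $greeting = eval q{
  no strict 'vars';
  my $name = 'World';
  "Hello $naem";
};
print "without strict: '$greeting'\n";   # without strict: 'Hello '
print "warning: $_" for @warnings;
```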

Finally, you may see snippets which use -w on the top (#!) line; this is a giveaway that the tutorial was probably good, but is now a little old (see above).

Use of ‘my’ variables

Along with use strict comes the requirement that variables are declared with some scope. This means that the first time a variable is introduced, it should come with a declaration of my, our, local or state.

my $phrase = 'Hello World';
our @names = ('Jim', 'Bob');

If this is not the case, I would move on to another tutorial. One exception is if the code is written as a “one-liner”, which will look like

perl -e 'some code here'

in which these declarations are not often used. New Perlers may want to learn some basics before attempting one-liners however.
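For reference, here is a small sketch of how the four declarators differ, assuming Perl 5.10 or later for state. The names (counter, report, noisy_report, $verbose) are made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'state';    # state needs Perl 5.10 or later

our $verbose = 0;       # package variable, also reachable as $main::verbose

sub counter {
  state $count = 0;     # initialized once, keeps its value between calls
  return ++$count;
}

sub report {
  my ($msg) = @_;       # lexical, private to this call
  return $verbose ? "[verbose] $msg" : $msg;
}

sub noisy_report {
  local $verbose = 1;   # temporary value, restored when this sub returns
  return report(@_);    # report() sees the localized value
}

counter() for 1 .. 2;
print counter(), "\n";             # 3
print noisy_report('hi'), "\n";    # [verbose] hi
print report('hi'), "\n";          # hi
```

In day-to-day code my covers most needs; our, local and state each solve narrower problems.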

Modern use of the ‘open’ command

The open command has seen some big changes over the years, and while the old use will still work, the modern syntax is much better. Since Perl is often used to work on text files, the open command is in almost every tutorial. As such, the treatment of this one command can serve as a useful litmus test as to the age and “goodness” of a tutorial.

A huge red flag of an old tutorial looks like this

open HANDLE, '<filename';

rather than the modern

open my $handle, '<', 'filename' or die "Cannot open filename: $!";

This will take some explanation. First, modern Perl lets you store the filehandle in a variable (i.e. $handle, which needs the my declarator if it has not been declared already); this is highly recommended.

Separating the open mode, < or > (for read and write, respectively), from the file name is now recommended as well. This is called “the 3-arg open” and should be used whenever possible.

Checking that open succeeded is also recommended. The snippet above does this with the or die ... clause. Another simple way to accomplish this is to use autodie near the top of the script. Would you want your code to work on a file that didn’t open correctly? I doubt it.

A good tutorial should do, or at least mention, all of the above. If it is lax about checking whether open succeeded after mentioning it, that is probably acceptable (from a purely space-saving perspective). If not, move on.
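Putting those points together, here is a short, self-contained sketch that writes a file and reads it back with lexical filehandles and the 3-arg open, checking success once by hand and once with autodie. The temporary file and its contents are only for the demonstration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use File::Spec;
use File::Temp qw/tempdir/;

my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'names.txt' );

# 3-arg open into a lexical filehandle, with the result checked by hand.
open my $out, '>', $path or die "Cannot open $path for writing: $!";
print {$out} "Jim\nBob\n";
close $out or die "Cannot close $path: $!";

# The same for reading, but with autodie doing the checking: inside this
# block a failed open or close throws an exception instead of returning false.
my @names;
{
  use autodie;
  open my $in, '<', $path;
  while ( my $line = <$in> ) {
    chomp $line;
    push @names, $line;
  }
  close $in;
}
print "@names\n";    # Jim Bob
```

Because autodie is lexically scoped, it can be enabled for a single block like this or, more commonly, once near the top of the script.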

Preach code reuse

Perl is good at munging text; however, for all but simple cases you should use standard modules where appropriate. A new user should probably know how to parse a basic line of comma-separated text, and a tutorial can show you how to do it manually. That said, a very good tutorial will mention that this approach is very fragile and that in practice you should use the Text::CSV module. The same goes for HTML, XML, JSON and other formats.
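A quick illustration of that fragility, using only core Perl: a naive split on commas breaks as soon as a quoted field contains a comma. The sample record is made up for the demonstration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# One CSV record with a quoted field that contains a comma.
my $line = '"Smith, John",42,"Boston"';

# Naive parsing: split on every comma.
my @naive = split /,/, $line;

# Prints 4 fields instead of the intended 3, with the quotes still attached.
print scalar(@naive), " fields:\n";
print "  [$_]\n" for @naive;
```

Text::CSV’s parse and fields methods handle the quoting and return the three intended fields.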

Finally, while code reuse is a good idea, it can only be as good as the code being reused. The CGI.pm module helped put Perl on the map, but its days are numbered. A tutorial that spends too much time on it is probably too old to trust.

In Conclusion

While none of these guidelines is absolute, I think you, the reader, will find that most tutorials fall clearly into either very good or very bad/old. Hopefully this helps you find those good Perl tutorials, and steer clear of the others.

Gabor Szabo, himself the author of a new series of Perl tutorials, reminds me that there is a project aimed at screening and recommending tutorials. You might want to start your search for a modern tutorial there: Perl Tutorial Hub.

P.S.

To my fellow Perlers, you are encouraged to add to this list in the comments, as well as to suggest some good tutorials. This post was written partly so that it can be linked for posters on Stack Overflow and elsewhere when bad code from old tutorials shows up (you know what I mean).

About Joel Berger

As I delve into the deeper Perl magic I like to share what I can.