Extensible Maintainable Subroutines and Methods

I've been thinking about the way I write Perl subroutines and methods as compared to some other Perl programmers, and I've decided to write a post about it.

To make my subroutines and methods more reusable, extensible, and maintainable, I make them receive a hash reference as their only argument and return a hash reference. This has multiple benefits.

First, the parameters going into the subroutine are named when passed in, and these names are used to identify them inside the subroutine. This means that new parameters can be added as keys in the hash, a very minimal change. The same is true of the return hash. Using keys in a hash is far simpler to maintain than using positions in a list. I've spent enough time wrestling with indexed parameters in subroutines that I never use lists of parameters in code that I expect to use more than once (which is basically all of it).
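A small illustration of the point about adding parameters, as a sketch with hypothetical names (`greet`, `punctuation`): the newer optional key gets a default, so a call written before it existed keeps working unchanged.

```perl
use strict;
use warnings;

# Hypothetical sub taking one hashref of named parameters. The
# 'punctuation' key stands in for a parameter "added later"; it has a
# default, so older callers that never pass it are unaffected.
sub greet {
  my ($args) = @_;
  my $name  = $args->{name}        // 'world';
  my $punct = $args->{punctuation} // '!';   # newer, optional parameter
  return { greeting => "Hello, $name$punct" };
}

my $old = greet( { name => 'Perl' } );                      # written before 'punctuation' existed
my $new = greet( { name => 'Perl', punctuation => '?' } );  # uses the new key
print $old->{greeting}, "\n";   # Hello, Perl!
print $new->{greeting}, "\n";   # Hello, Perl?
```

With positional parameters, the same change would mean agreeing on a new slot at the end of the list and keeping every existing call in that order.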

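Second, a hash reference is passed in instead of a hash because the reference can be tested with the ref function to ensure that it is actually a HASH ref. Assigning @_ to a hash is not as robust: an odd-length argument list at best produces an "Odd number of elements in hash assignment" warning (and only under use warnings), and the subroutine carries on with a malformed hash. The caller can check the returned hash reference the same way.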
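To make the robustness difference concrete, here is a minimal side-by-side sketch (the sub names `flat_style` and `ref_style` are hypothetical). The same caller mistake only warns in the flat-list version, but can be rejected outright in the hashref version.

```perl
use strict;
use warnings;

# Flat-list style: an odd number of elements only raises a warning,
# and the sub carries on with a half-filled hash.
sub flat_style {
  my %args = @_;
  return join ',', map { my $v = $args{$_} // 'undef'; "$_=$v" } sort keys %args;
}

# Hashref style: the single argument is checked with ref() before use.
sub ref_style {
  my ($args) = @_;
  die "ref_style: expected a HASH ref\n"
    unless defined $args and ref $args eq 'HASH';
  return join ',', map { my $v = $args->{$_}; "$_=$v" } sort keys %$args;
}

my $flat = flat_style('name');   # warns "Odd number of elements in hash assignment"
print "$flat\n";                 # name=undef

my $res = eval { ref_style('name') };
print defined $res ? "$res\n" : "rejected: $@";
```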

Here is a brief example of a subroutine that receives and returns references to hashes of named parameters:

#!/usr/bin/env perl
use strict;
use warnings;

use feature qw(say); # perl 5.10+

# core modules (Data::Dumper ships with Perl)
use Data::Dumper;
$Data::Dumper::Purity = 1;

my $sendRef = { string => 'Wow! Perl is so awesome, right?' };
my $returnRef = no_punct( $sendRef );
unless ( defined $returnRef and ref $returnRef eq 'HASH' ) {
  say 'no_punct failed to return a HASH ref!';
  exit;
}
print 'with punctuation: '.Dumper( $sendRef );
print 'punctuation removed: '.Dumper( $returnRef );


=head1 Subroutines Begin

=cut

=head2 no_punct( \%params )

Processes each scalar value of a copy of %params and removes any punctuation in it.

Returns a hash reference.

=cut

sub no_punct {
  my $paramsRef = shift;
  my %params = ();
  %params = %$paramsRef if defined $paramsRef and ref $paramsRef eq 'HASH';

  my %return = %params;
  foreach my $k ( keys %return ) {
    if ( ref $return{ $k } eq 'HASH' ) {
      $return{ $k } = no_punct( $return{ $k } ); # recursive call
    } elsif ( ref $return{ $k } eq 'ARRAY' ) {
      foreach my $i ( 0 .. ( scalar @{ $return{ $k } - 1 ) ) {
        $return{ $k }[ $i ] =~ s/[^\w\s\d]//g;
      }
    } else {
      next unless ref $return{ $k } eq 'SCALAR';
      $return{ $k } =~ s/[^\w\s\d]//g;
    } 
  }

  return \%return;
} 
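As posted, no_punct will not compile: the `@{ $return{ $k }` dereference in the array branch is never closed. Even with that fixed, `next unless ref $return{ $k } eq 'SCALAR'` skips every plain string, because ref returns an empty string for non-references. There is also a subtler issue: `%return` is only a shallow copy, so the array branch edits the caller's arrays in place. The sketch below corrects all three under those readings of the intent. The helper `_strip` is a hypothetical addition, and `[^\w\s]` is used because `\d` is already part of `\w`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;   # core module
$Data::Dumper::Sortkeys = 1;

# no_punct( \%params ): returns a NEW hashref whose string values have
# punctuation removed. Nested hashes are handled recursively; arrays are
# copied, so the caller's data is never modified.
sub no_punct {
  my $paramsRef = shift;
  my %params = ( defined $paramsRef and ref $paramsRef eq 'HASH' ) ? %$paramsRef : ();

  my %return;
  foreach my $k ( keys %params ) {
    my $v = $params{ $k };
    if ( ref $v eq 'HASH' ) {
      $return{ $k } = no_punct( $v );               # recursive call
    } elsif ( ref $v eq 'ARRAY' ) {
      $return{ $k } = [ map { _strip( $_ ) } @$v ]; # new array, not the caller's
    } else {
      $return{ $k } = _strip( $v );                 # plain scalar (other refs pass through)
    }
  }
  return \%return;
}

# _strip (hypothetical helper): removes punctuation from a defined,
# non-reference value; anything else is returned unchanged.
sub _strip {
  my $s = shift;
  return $s if !defined $s or ref $s;
  $s =~ s/[^\w\s]//g;   # \d is already included in \w
  return $s;
}

my $sendRef   = { string => 'Wow! Perl is so awesome, right?' };
my $returnRef = no_punct( $sendRef );
print 'with punctuation: '    . Dumper( $sendRef );
print 'punctuation removed: ' . Dumper( $returnRef );
```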

And that's the basics of it. What do you think? Comments are very welcome.

8 Comments

I generally agree. Some comments:

1) Some people do not like using hash because typos cannot be caught at compile time. In my experience, I don't find this to be a big problem. Besides, functions can check for unknown arguments.
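A sketch of the "functions can check for unknown arguments" idea, assuming a hypothetical sub `checked` with a fixed list of allowed keys. A typo such as `strnig` is reported at run time instead of being silently ignored.

```perl
use strict;
use warnings;

# Allowed parameter names for the hypothetical sub below.
my %KNOWN = map { $_ => 1 } qw(string keep_spaces);

sub checked {
  my ($args) = @_;
  die "checked: expected a HASH ref\n" unless ref $args eq 'HASH';
  my @unknown = grep { !$KNOWN{$_} } sort keys %$args;
  die "checked: unknown argument(s): @unknown\n" if @unknown;
  return { ok => 1 };
}

eval { checked( { strnig => 'typo' } ); 1 } or print "caught: $@";
```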

2) I personally prefer accepting a hash instead of a hashref, because I don't like typing the extra {}'s :)

3) What's even better is being able to switch between accepting a hash/hashref and positional arguments. I develop Perinci::Sub::Wrapper to accomplish this (among other things).
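For the hash versus hashref preference, one hand-rolled option is to accept both calling styles. This is only an illustration under the assumption that a lone hashref means named parameters; it is not how Perinci::Sub::Wrapper works, and `_args` and `area` are hypothetical names.

```perl
use strict;
use warnings;

# Normalise either calling style into a single hashref:
#   area({ width => 2, height => 3 })  -> hashref passed straight through
#   area( width => 4, height => 5 )    -> flat key/value list made into a hashref
sub _args {
  return $_[0] if @_ == 1 && ref $_[0] eq 'HASH';
  die "odd number of arguments\n" if @_ % 2;
  return +{ @_ };   # '+' makes this an anonymous hash, not a block
}

sub area {
  my $args = _args(@_);
  return $args->{width} * $args->{height};
}

my $from_ref  = area( { width => 2, height => 3 } );
my $from_list = area( width => 4, height => 5 );
print "$from_ref $from_list\n";   # 6 20
```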

4) As for return values, I've standardized on what I call "enveloped response", which is modeled after HTTP response: [3-digit-status, message-string, actual-result]. This allows returning status and data in one go, plus translates rather straightforwardly into the web. Read more if you are interested: Rinci::function.
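A minimal sketch of an enveloped response in the spirit described above, `[status, message, result]` with HTTP-style codes. The `divide` sub and its status choices are hypothetical and are not taken from the Rinci specification.

```perl
use strict;
use warnings;

# Returns [status, message, payload]; the payload is only present on success.
sub divide {
  my ($args) = @_;
  return [ 400, 'Missing argument' ]
    unless ref $args eq 'HASH' and defined $args->{a} and defined $args->{b};
  return [ 400, 'Division by zero' ] if $args->{b} == 0;
  return [ 200, 'OK', $args->{a} / $args->{b} ];
}

for my $in ( { a => 6, b => 3 }, { a => 1, b => 0 } ) {
  my ( $status, $msg, $result ) = @{ divide($in) };
  if ( $status == 200 ) { print "result: $result\n" }
  else                  { print "error $status: $msg\n" }
}
```

The caller always gets the same shape back, so success and failure are handled by inspecting the status rather than by guessing from the type of the return value.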

Certainly with any subroutine taking a lot of arguments, that's the direction I go in, although I've never been one to do it universally. (Although I did recently write a network-level API that passes json objects around, which is the same thing once you've written a wrapper for it.)

I know if I were doing it universally in Perl I'd be pretty quick to write a little Devel::Declare module to hide away all of the scaffolding, assuming, at least, that someone hasn't already done this.

I sometimes use this technique, although I've always been uncomfortable with it for several reasons:

- This is stringly typed code, but well, it's Perl so why should we care :)

- It can lead to parameter inflation. Facing a new problem to solve, the temptation to 'add yet another parameter in there' is big. Methods that accept a long list of parameters are a pain to unit test and to fix.

- It's usually a sign of bad design. When you feel the need to pass a hash of parameters to a method, stop for a second and think twice. Maybe you're missing a concept there. Maybe what you really want is a 'Doer' object that behaves according to its characteristics. Using Moose or even just plain Perl objects, maybe you could save yourself time in the future. Or maybe your method doesn't belong to the right class.
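A sketch of the 'Doer' alternative using a plain Perl object (no Moose, so it runs on a stock perl). The class `Punct::Stripper` and its `keep` option are hypothetical: the option that would otherwise be passed in a params hash on every call is stored once in the object.

```perl
use strict;
use warnings;

package Punct::Stripper;

# The configuration lives in the object instead of a per-call params hash.
sub new {
  my ( $class, %opts ) = @_;
  my $self = { keep => $opts{keep} // '' };   # punctuation characters to preserve
  return bless $self, $class;
}

sub strip {
  my ( $self, $text ) = @_;
  my $keep = quotemeta $self->{keep};         # escape so they are literal in the class
  $text =~ s/[^\w\s$keep]//g;
  return $text;
}

package main;

my $stripper = Punct::Stripper->new( keep => '?' );
print $stripper->strip('Wow! Perl is so awesome, right?'), "\n";   # Wow Perl is so awesome right?
```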

- Using plain hashes to carry stuff around is just wrong. Perl has got a fantastic object mechanism, especially since Moose. Not using it, and sticking to plain hashes because they're 'more extensible', is just going to make your life and the life of your fellow programmers more difficult in the future.

This is stringly typed code, but well, it's Perl so why should we care :)

Well, if Perl 5 had real named arguments like Python, the hash workaround would not have been necessary. But it's Perl, so most deficiencies have a workaround :)

Using plain hashes to carry stuff around is just wrong. Perl has got a fantastic object mechanism, especially since Moose. Not using it, and sticking to plain hashes because they're 'more extensible', is just going to make your life and the life of your fellow programmers more difficult in the future.

On the other hand, transporting objects across a network is trickier, since objects are almost always stateful.

Look at it this way, hashes *are* objects (think blessed hashref, JSON).

Well, if Perl 5 had real named arguments like Python, the hash workaround would not have been necessary. But it's Perl, so most deficiencies have a workaround :)

True, there are plenty of modules out there to turn Perl code into clunky Python. Not sure what the benefit is, though, except giving you the false impression you're writing good code just because you pass everything through Params::Validate. !!flame!! If you cannot write decent code without using an ever-growing collection of named arguments, then why not write Python directly? !!flame!!

hashes *are* objects !? You're under javascript people's influence :)

I also like plain hashes + JSON for communications, but not every class in a system needs to go through the network.
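A short sketch of why plain hashes travel well: a hashref survives a JSON round trip with the core JSON::PP module (bundled since Perl 5.14), while a blessed object is rejected by default unless the encoder is configured for it (for example with `convert_blessed` and a `TO_JSON` method).

```perl
use strict;
use warnings;
use JSON::PP;   # core module since Perl 5.14

# canonical(1) sorts hash keys so the encoded text is stable.
my $json = JSON::PP->new->canonical(1);

my $params  = { string => 'Hello!', count => 2 };
my $text    = $json->encode($params);   # hashref -> JSON text
my $decoded = $json->decode($text);     # JSON text -> new hashref

print "$text\n";   # {"count":2,"string":"Hello!"}
print $decoded->{string} eq $params->{string} ? "round trip ok\n" : "mismatch\n";
```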

Here is a brief example of a subroutine that receives and returns references to hashes of named parameters

I wouldn't really say that the example sub receives a list of named parameters. The names seem entirely ignored. You're calling it with a named parameter "string" but it would work identically if the named parameter were called "quux".

By the way, the code doesn't compile (missing closing brace for scalar @{ $return{ $k } }), and even when that is fixed it doesn't work on your example input (problem with your next unless).

These problems would be much easier to spot if it weren't for all the deeply nested brackets and the code implementing recursive crawls over array and hash references.

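#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

sub no_punct
{
  # Caller wants a result: strip punctuation from copies and return them.
  if (defined wantarray)
  {
    s/[^\w\s\d]//g for my @tmp = @_;
    return @tmp;
  }

  # Void context: modify the caller's variables in place (@_ aliases them).
  s/[^\w\s\d]//g for @_;
}

say for no_punct("Hello!", "World?");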

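use Carp qw(croak);

# (caller(0))[3] is the current sub's name; __SUB__ would give a code ref
if ( ! defined $paramsRef || ref( $paramsRef ) ne 'HASH' ) {
  croak( sprintf( "sub %s in file %s at line %s: invalid parameter hash\n", (caller(0))[3], __FILE__, __LINE__ ) );
}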

If you're going to validate the parameters, throw an exception.
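A sketch of the "throw an exception" approach applied to a simplified, hypothetical `no_punct_strict`: invalid input croaks instead of quietly yielding an empty hash, and the caller decides what to do by trapping it with eval.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Validate the single hashref argument and croak on bad input.
sub no_punct_strict {
  my ($paramsRef) = @_;
  croak 'no_punct_strict: expected a HASH ref'
    unless defined $paramsRef and ref $paramsRef eq 'HASH';

  my %return = %$paramsRef;   # shallow copy; only top-level strings are cleaned
  for my $v ( values %return ) {
    $v =~ s/[^\w\s]//g if defined $v and !ref $v;   # $v aliases the hash value
  }
  return \%return;
}

# The caller traps the exception with eval and inspects $@.
my $ok = eval { no_punct_strict('not a hash'); 1 };
print "caught: $@" unless $ok;
```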

I alternate between the two styles. I use hashrefs for big, monolithic functions. I use a single parameter for small functions. And a list of parameters for list functions. (I bet you thought that was going to be "medium functions".) Ok, that's not two styles.

When I say "monolithic", I don't mean "everything happens in a single actual sub {...} block". I mean "this function does lots of things, and may call dozens of functions underneath." One thing I've taken to in my main API is taking a hash ref as passed in from the framework above me, and passing it along as we go down five or ten stack frames into sub-subroutines that do the real work. And having all of those share the same hash ref as input.

This actually allows the functions to "talk" to each other by putting more things into the hash ref. Or to pass information on from stack frame 5 to stack frame 8 without 6 or 7 caring about those parameters.
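A minimal sketch of that shared-hashref pattern, with hypothetical `frame1`/`frame2`/`frame3` subs: the first frame stores a value that the third frame reads, while the middle frame only touches its own key.

```perl
use strict;
use warnings;

# One hashref ("context") is passed down the whole call chain.
sub frame1 {
  my ($ctx) = @_;
  $ctx->{request_id} = 42;   # put into the shared hash here...
  return frame2($ctx);
}

sub frame2 {
  my ($ctx) = @_;
  $ctx->{steps}++;           # frame2 ignores request_id entirely
  return frame3($ctx);
}

sub frame3 {
  my ($ctx) = @_;
  $ctx->{steps}++;
  return "request $ctx->{request_id} after $ctx->{steps} steps";   # ...read here
}

my $ctx = { steps => 0 };
print frame1($ctx), "\n";   # request 42 after 2 steps
print "caller sees request_id = $ctx->{request_id}\n";
```

The flip side is that every function in the chain can change anything in the hash, which is why a function like no_punct that rewrites every value would be a poor fit for it.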

Though I'm not sure "no_punct" would be one of those "monolithic" functions that I'd want running roughshod over my entire hash :-)


About Andrew Proper

I've been writing in Perl since my first Perl job in 1999. I must say, Perl has been good to me and is my favourite programming language. Currently I'm an Applications Developer and Perl is the primary language I use.