Comma quibbling in Perl

Sinan posted his answer to Eric Lippert's comma quibbling exercise. There are also a couple of Perl solutions in Rosetta Code.

Eric says:

I am particularly interested in solutions which make the semantics of the code very clear to the code maintainer.

Sinan's answer works. He handles the cases where the array reference has zero, one, and more than one element. In his solution for more than one element, he relies on two special cases: an array slice returns zero elements when the "high" number of the range is less than the "low" number, and a join with one list item returns only that item. That collapses two cases of the problem into one:

sub comma_quibbling {
    my $x = shift;      # an array ref
    my $n = @$x;

    $n or return '';
    $n == 1 and return $x->[0];

    return join(' and ' =>
        # array slice and range operator
        join(', ' => @$x[0 .. ($n - 2)]),
        $x->[-1],
    );
}
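
Both of the Perl behaviors mentioned above are easy to check in isolation. This is only a small sketch, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my @items = qw(ABC DEF);

# A range whose high end is below its low end is empty,
# so the slice built from it has no elements.
my @empty = @items[ 0 .. -1 ];
print scalar(@empty), "\n";    # 0

# join with a single list item returns that item, with no separator.
my $single = join ', ', 'ABC';
print "$single\n";             # ABC
```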

The Perl solution in Rosetta Code does the same sort of thing, but encodes all of the cases in a conditional operator (this one takes a list instead of an array ref):

sub comma_quibbling(@) {
    return "{$_}" for
        @_ < 2 ? "@_" :
        join(', ', @_[0..@_-2]) . ' and ' . $_[-1];
}

Those work, but I think they are a bit too clever. I think most workaday programmers would have trouble recognizing the specification from those solutions. Until we have to optimize the solution, I'd rather translate the specification directly:

sub comma_quibbling {
    my( $x ) = @_;

	   if( @$x == 0 ) { return '' }
	elsif( @$x == 1 ) { return $x->[0] }
	elsif( @$x == 2 ) { return join ' and ', @$x }
	elsif( @$x >  2 ) { return
		join ', ',
		@$x[0 .. $#$x - 2],
		join ' and ', @$x[-2, -1];
		}
	}

I don't expect any of these to be much faster than any of the others. They are all doing the same thing presented a bit differently.

13 Comments

As a stylistic matter I'd strongly recommend doing `my @x = @$x;` at the start. All that needless dereferencing makes it that unnecessary bit more ugly.

That said, I prefer the last example over the others, but am confused why you deal with the elses when you could just make it a stack of returns with postfix-if conditions. (Even if you don't like postfix-if, you could just drop the els bits and lose no functionality.) Also, the last example seems to have no default case?
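
A sketch of what that stack of returns might look like, assuming the same array-ref argument as the post's last example. The final return acts as the default case; this is an illustration, not code from the post:

```perl
use strict;
use warnings;

# Hypothetical rewrite: same cases as the if/elsif version,
# written as a stack of early returns.
sub comma_quibbling {
    my @x = @{ $_[0] };    # copy out of the array ref once

    return ''               if @x == 0;
    return $x[0]            if @x == 1;
    return join ' and ', @x if @x == 2;

    # default case: three or more items
    return join ', ', @x[ 0 .. $#x - 2 ], join ' and ', @x[ -2, -1 ];
}

print comma_quibbling( [qw(ABC DEF G H)] ), "\n";    # ABC, DEF, G and H
```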

I don't know if this is interesting, but this is Actual Code from my transit-schedule-presentation project (not just written for the exercise):

sub _joinseries_with_x {
    my $and    = shift;
    my @things = @_;
    return $things[0] if @things == 1;
    return "$things[0] and $things[1]" if @things == 2;
    my $final = pop @things;
    return ( join( q{, }, @things ) . " $and $final" );
}

sub joinseries {
    return _joinseries_with_x( 'and', @_ );
}

sub joinseries_ampersand {
    return _joinseries_with_x( '&', @_ );
}

If I were going to handle the empty list as a possibility, I probably would have it croak rather than just return nothing.
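
Something along these lines, perhaps. This is a hypothetical variant of the helper above with the conjunction hard-coded to keep it short; the sub name and the error message are invented for the sketch:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical variant that refuses an empty list instead of
# returning nothing.
sub joinseries_strict {
    my @things = @_;
    croak 'joinseries_strict needs at least one item' unless @things;
    return $things[0] if @things == 1;
    my $final = pop @things;
    # With two items the join returns the single remaining item,
    # so no separate two-item case is needed here.
    return join( q{, }, @things ) . " and $final";
}

# The caller can trap the failure with eval.
my $ok = eval { joinseries_strict(); 1 };
print "croaked: $@" unless $ok;
```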

And now I reread it and found the bug!

return "$things[0] and $things[1]" if @things == 2;

should have been

return "$things[0] $and $things[1]" if @things == 2;

Ha.

Thank you for pointing to this. I have had a lot of fun playing around with it.

First I thought all cases could collapse along this:
- join elements with comma
- replace last comma with "and"
(See the Python example on rosetta for comparison.)
But I think it is buggy because the last element can contain commas, too.
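
A minimal sketch of that failure, using a hypothetical helper that joins with commas and then swaps the last ", " for " and ". The helper name and the regex are my own, not taken from Rosetta Code:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the join-then-replace idea.
sub join_then_replace {
    my @items  = @_;
    my $joined = join ', ', @items;

    # The greedy (.*) pushes the match to the last ", " in the string.
    $joined =~ s/(.*), /$1 and /s;
    return $joined;
}

print join_then_replace(qw(ABC DEF G)), "\n";           # ABC, DEF and G

# The last element contains a comma, so the wrong comma is replaced.
print join_then_replace( 'ABC', 'DEF', 'G, H' ), "\n";  # ABC, DEF, G and H
```

The second call ought to give "ABC, DEF and G, H", since "G, H" is a single element.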

Then I went back to brian's if-elsif solution and quite liked it, especially the formatting.

I played around with some given-when which is very readable, too.

In the meantime, Aaron posted his solution. I like it a lot, especially after some more formatting. Sadly I cannot get code in the comments nicely formatted.
Popping the last element (and getting rid of the array subscripting) seems favourable.

Should solutions also contain the curlies {ABC and DEF} or are they irrelevant?

sub comma_quibbling {
   Moose::Util::english_list(@{+shift});
}

(It’s ugly in several ways to make this expect an arrayref so I won’t.)

sub comma_quibbling {
    return '' if not @_;
    my $last = pop @_;
    return $last if not @_;
    my $enumerated = join ', ', @_;
    return "$enumerated and $last";
}

I should say I find code massively irritating that explicitly lists every effectively possible condition and then repeats some code across several of the branches, esp. when the repeat must be written in a slightly different way each time. Don’t make me very carefully compare identical-in-intent code to figure out that it’s also identical in effect. Do your job. It’s almost always possible to write it without repeating yourself and without being too clever by half about factoring out the repetition – but you have to work at it and not stop at the first way of writing it that pops into your head. Or even the second.

(That is, btw, the generic “you” and not directly aimed at you, brian. The solution you proposed is not egregious (though I still find it irritating) – but the threshold for that is very low indeed. So I avoid writing code like that on principle.)

Eliminating returns is a question of the overall structure, to me. I find code most readable that doesn’t have any nesting in its control flow – just straight line execution with “if you got here and condition X holds, do this and bail out” exit points. Then you can just read the code from top to bottom, and the conditions simply compound, like a narrowing funnel, so it’s easy to understand how you reach any given point, and what happens in which sequence. (Sometimes I actually factor pieces of code out into a sub or method just so I get to use return to write it like this.) In such a case I am very willing, glad in fact, to accept some amount of repetitive control flow structure at each exit point – you don’t have any reason to be comparing the structure of one exit point to that of another to make sure they are the same, so the repetition has no cognitive cost.

I don’t try to write code for tightness at any cost. That is what I meant by not stopping at the first way of writing it that comes into one’s head – I’m not satisfied with code that is simply “mathematically” well factored, it has to explain itself really clearly too. That may well be more verbose than the shortest or simplest solution possible; not all verbosity is bad, and not all of it even costs anything, as with the repetitive returns above. (But I do sometimes struggle between clarity to an experienced reader, where judicious use of implicitness can make code really concise, and to a novice, who might not be able to follow the exploitation of language features in such code. But I struggle because I feel the novice-friendly code is actually less clear – not because it’s insufficiently golfed and clever.)

(As for mitigating repetitions by way of aligning things: I consider that a fallback of last resort. Aligned structures are very fragile in the face of receiving maintenance. If new requirements come in that only affect some case or other, where they can’t be fit into the existing structure easily, alignment goes out the window very quickly, yet the repeated logic defaults to staying. This is bad. In this particular example you get away with it fine, because the repetition is trivial and the problem fairly circumscribed and static, so there’s no great danger. But I’ll rely on alignment only if I can’t think of anything better.)

It's all clear now why Moose has such a heavy startup :-)

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).