Sorting Subroutine Results

The Perl sort built-in is mostly (at least by me) called as sort LIST or sort BLOCK LIST. But there is a third way to call it: sort SUBROUTINE LIST, which actually appears first in the documentation.

This is not a blog entry about using the sort SUBROUTINE LIST form of sort. It is more about the need to be aware of this form when writing (or trying to write) the sort LIST form.

Consider the following situation: you have a subroutine foo() which returns an un-ordered list. You need that list sorted. Perl has a sort built-in, so your (or at least my) first reaction is to write my @sorted = sort foo();, run it, and then wonder why @sorted is empty.

The problem, of course, is that Perl parses this as sort SUBROUTINE LIST with the SUBROUTINE being foo and the LIST being everything after foo. The contents of the parentheses (if any) are not passed as arguments to foo(), but are consumed by the sort. Subroutine foo() gets called only to order pairs of items in the LIST.

If you actually want to sort the list returned by foo(), you have to persuade Perl not to parse sort foo() as sort SUBROUTINE LIST. The documentation contains the words Warning: syntactical care is required when sorting the list returned from a function, and provides ways to make this happen. They are, basically,

  • Provide a sort block, e.g. sort { $a cmp $b } foo()
  • Use a unary plus, e.g. sort +foo()
  • Call the function with an ampersand, e.g. sort &foo()
  • Call sort as a function, e.g. sort( foo() )

Which of these you choose is largely a matter of style. I believe a sort block imposes a performance penalty, but whether this is significant depends on the application.

5 Comments

There is one more option besides the ones you listed:

  • Put the subroutine call in a do block, e.g. sort do { foo() }

And in fact I prefer this option because I feel that it makes it clearer that the code has been written this way deliberately. With the unary plus and the ampersand I fear a future maintainer looking at the respective syntactic dingus and instinctively considering it superfluous.

The explicit sort block seems even worse: not only does explicitly specifying the default sort order seem even more likely to appear superfluous at a glance, but additionally, it requires slowing down to read the comparator expression to make sure it just specifies the default sort order.

However, as a sidenote, a performance penalty is not among the reasons against chosing the sort block form: perl recognises and optimises the simplest comparators (i.e. { $a OP $b } and { $b OP $a }, where OP may only be either cmp or <=>) and uses internal comparators for all of them. You can see this by looking at the generated optree:

perl -MO=Concise -e 'sort @foo'
perl -MO=Concise -e 'sort { $a cmp $b } @foo'
perl -MO=Concise -e 'sort { ""; $a cmp $b } @foo'

Here there is no real difference between the first two examples, but a big one with the last – rather than the first example being different and both of following ones similar, as you might expect without this optimisation. You can also take the middle example but flip the order of operands and/or switch the operator to <=> to demonstrate that these particular changes also do not introduce an explicit comparator block at the optree level – they just flip the NUM and/or DESC flag(s) on the sort node.

Most of the time I use sort BLOCK LIST. I recall having written some rather long blocks that probably should have been refactored as functions. Maybe the explanation of needing & is why I never did. I'll try to remember that going forward...

Leave a comment

About Tom Wyant

user-pic I blog about Perl.