Sorting Subroutine Results

By Tom Wyant on July 20, 2022 3:26 PM

The Perl sort built-in is mostly (at least by me) called as sort LIST or sort BLOCK LIST. But there is a third way to call it: sort SUBROUTINE LIST, which actually appears first in the documentation.

This is not a blog entry about using the sort SUBROUTINE LIST form of sort. It is more about the need to be aware of this form when writing (or trying to write) the sort LIST form.

Consider the following situation: you have a subroutine foo() which returns an un-ordered list. You need that list sorted. Perl has a sort built-in, so your (or at least my) first reaction is to write my @sorted = sort foo();, run it, and then wonder why @sorted is empty.

The problem, of course, is that Perl parses this as sort SUBROUTINE LIST with the SUBROUTINE being foo and the LIST being everything after foo. The contents of the parentheses (if any) are not passed as arguments to foo(), but are consumed by the sort. Subroutine foo() gets called only to order pairs of items in the LIST.

If you actually want to sort the list returned by foo(), you have to persuade Perl not to parse sort foo() as sort SUBROUTINE LIST. The documentation contains the words Warning: syntactical care is required when sorting the list returned from a function, and provides ways to make this happen. They are, basically,

Provide a sort block, e.g. sort { $a cmp $b } foo()
Use a unary plus, e.g. sort +foo()
Call the function with an ampersand, e.g. sort &foo()
Call sort as a function, e.g. sort( foo() )

Which of these you choose is largely a matter of style. I believe a sort block imposes a performance penalty, but whether this is significant depends on the application.

5 comments

5 Comments

Aristotle | July 21, 2022 12:26 AM | Reply

There is one more option besides the ones you listed:

Put the subroutine call in a do block, e.g. sort do { foo() }

And in fact I prefer this option because I feel that it makes it clearer that the code has been written this way deliberately. With the unary plus and the ampersand I fear a future maintainer looking at the respective syntactic dingus and instinctively considering it superfluous.

The explicit sort block seems even worse: not only does explicitly specifying the default sort order seem even more likely to appear superfluous at a glance, but additionally, it requires slowing down to read the comparator expression to make sure it just specifies the default sort order.

However, as a sidenote, a performance penalty is not among the reasons against chosing the sort block form: perl recognises and optimises the simplest comparators (i.e. { $a OP $b } and { $b OP $a }, where OP may only be either cmp or <=>) and uses internal comparators for all of them. You can see this by looking at the generated optree:

perl -MO=Concise -e 'sort @foo'
perl -MO=Concise -e 'sort { $a cmp $b } @foo'
perl -MO=Concise -e 'sort { ""; $a cmp $b } @foo'

Here there is no real difference between the first two examples, but a big one with the last – rather than the first example being different and both of following ones similar, as you might expect without this optimisation. You can also take the middle example but flip the order of operands and/or switch the operator to <=> to demonstrate that these particular changes also do not introduce an explicit comparator block at the optree level – they just flip the NUM and/or DESC flag(s) on the sort node.

Tom Wyant | July 21, 2022 2:43 AM | Reply

My list came straight from perldoc -f sort, so maybe a pull request on perlfunc.pod is in order?

Aristotle | July 21, 2022 1:09 PM | Reply

Agreed.

Matthew Persico | July 22, 2022 12:36 PM | Reply

Most of the time I use sort BLOCK LIST. I recall having written some rather long blocks that probably should have been refactored as functions. Maybe the explanation of needing & is why I never did. I'll try to remember that going forward...

Tom Wyant replied to comment from Matthew Persico | July 22, 2022 4:28 PM | Reply

There are two cases that the compiler needs to distinguish between:

Specifying a user-provided function as a comparator for the sort, and
Calling a function and then sorting the list returned by that function;

The Perl compiler assumes the former. This blog post is intended to address making Perl do the latter.

If the former is what you want, leave off the ampersand. But also read the sort documentation carefully, because how the comparator is dispatched depends on whether it has a prototype or not. You probably do not want a prototype. If you have one, and it is ($$), the two values to compare are passed as arguments to your subroutine. If you do not have a prototype, the two values to compare show up in globals $a and $b, just as they do in a sort block.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About Tom Wyant

I blog about Perl.

More info »

Tom Wyant