Three Sort Functions

Everyone (I’d have to assume) has written the two basic sort functions

  # sort strings case-insensitively
  sub insensitive {
    return uc $a cmp uc $b || $a cmp $b;
  }


and

  # sort values numerically
  sub numeric {
    return $a <=> $b;
  }

Here are three fun—and hopefully useful—sort functions.

This first one sorts strings with numeric suffixes (such as room numbers like “White 102”) first by the string part, then numerically by the suffix. The function assumes the data has been vetted; i.e. matches $rgx_strnum. Obviously you may need to tweak $rgx_strnum to meet your specific needs; you can also add calls to uc to sort the string part case-insensitively.

  our $rgx_strnum = qr/^([A-Za-z]+)\s*(\d+)$/;
  sub strnum {
    my ($aS, $aN) = $a =~ $rgx_strnum;
    my ($bS, $bN) = $b =~ $rgx_strnum;
    return $aS cmp $bS || $aN <=> $bN;
  }

This function sort strings ASCIIbetically except blank strings sort to the end—I am always surprised how often this comes up. As above, you can add calls to uc for a case-insensitive blanks-last sort.

  sub blanks_last {
    return [[$a cmp $b, -1], [1, 0]]->[$a eq ""][$b eq ""];
  }

Finally, this function sorts strings that consist of dot-delimited numbers (like IP addresses or RCS version numbers). Again, the function assumes the data has been vetted and all values have the same number of fields (i.e. don’t try to sort data that includes both IP addresses and RCS version numbers).

  sub dotted_nums {
    my @bits = split m/\./, "$a.$b";
    return (map { $bits[$_] <=> $bits[$_ + @bits/2] || () }
                (0 .. @bits/2-1))[0] || 0;
  }

3 Comments

Most of these sort functions should not be used directly, but instead should be recoded to use the Shwarzian Transform, or the Guttman Rosler Transform.

Here is an example of a complex sort function that is used in perl core (in modern perls) to sort manifest files into a sane order, which uses the ST for efficiency.

# Try to get a sane sort. case insensitive, more or less
# sorted such that path components are compared independently,
# and so that lib/Foo/Bar sorts before lib/Foo-Alpha/Baz
# and so that lib/Foo/Bar.pm sorts before lib/Foo/Bar/Alpha.pm
# and so that configure and Configure sort together.
sub sort_manifest {
  return
  # case insensitive sorting of directory components independently.
  map { $_->[0] } # extract the full line
  sort {
    $a->[1] cmp $b->[1] || # sort in order of munged filename
    $a->[0] cmp $b->[0]    # then by the exact text in full line
   }
   map {
     # split out the filename and the description
     my ($f) = split /\s+/, $_, 2;
     # lc the filename so Configure and configure
     # sort together in the list
     my $m= lc $f; # $m for munged
     # replace slashes by nulls, this makes short
     # directory names sort before
     # longer ones, such as "foo/" sorting before "foo-bar/"
     $m =~ s!/!\0!g;
     # replace the extension (only one) by 
     # null null extension.
     # this puts any foo/blah.ext before any 
     # files in foo/blah/
     $m =~ s!(\.[^.]+\z)!\0\0$1!;
     # return the original string, 
     # and the munged filename
     [ $_, $m ];
  } @_;
}

Do not miss Sort::Key. It’s faster and leaves the code more readable than if you try to write a faster sort in pure Perl.

Leave a comment

About morandimus

user-pic My real name is Jeremy Holland. I've been a programmer for 25 years, using primarily Perl since 2000.