TWC 217: Big and Boundless in The Matrix

In which I must change a conference talk.

TWC Task #1 - Sorted Matrix

Task:

Given a n x n matrix where n >= 2, find the third smallest element.

Observations:

  • 2 x 2 == 4, so we don't have to handle the condition of there being less than three elements.

  • We will either need to flatten the matrix to a list, or keep track of intermediate three smallest for each matrix row.

  • Unlike TWC 205.1, we are not removing duplicates.

  • We could optimize for clarity/readability, or brevity, or large-matrix performance.

Perl

sub task1 ( $m ) {
    my ( $first, $second, $third, @rest )
        = sort { $a <=> $b } map { $_->@* } @{$m};

    return $third;
}

Raku

sub task1 ( @m ) { @m[*;*].sort.[2] }

Note that the Raku code (on GitHub) has twice as many tests as the Perl code, copying the numeric tests and translating them to alpha. I am using Raku's default .sort, which is based on the generic cmp and so sorts numbers and strings correctly based on type.

There are two other approaches I wanted to explore:

  1. Extending the concept that gives us O(n) performance for .max and .min, we can calculate top N or bottom N in a single pass, without sorting. (I keep intending to write a module to do just that). The benefits would be: using much less space than a copy-sort, and often less time, and the availability of running top or bottom intermediate results (like [\+] lines() to get running totals). However, the larger the N in top N, the worse the performance will be, and the more complex the code will be, and the more time it takes to verify the logic is correct (or to develop the test cases that expose all the mis-coding you might have done). I could not devote enough time this week to pursue this.

  2. The lowest 3 of each row can be computed via many smaller .sorts, greatly reducing memory pressure and also reducing the number of comparisons by half, while still keeping the code simple.

Given a 1000x1000 matrix of 1s, looped 100 times, this approach ran in 1/2 the time that the full flattening sort took.

sub task1_alternate ( @m ) {
    my @top;
    @top.append( .sort.head(3) ) for @m;
    return @top.sort.head(3).tail;
}
# task1 => 209.35s, task1_alternate => 98.74s, x => 2.12

TWC Task #2 - Max Number

Task:

Given a list of positive integers, find the highest value possible from concatenating those integers.

Observations:

  • Solvable via permutation.

  • At first, this looks efficiently solvable by just reverse-sorting the integers by their stringified values. This works with same-number-of-digit integers, but fails with (1, 10) => 110; the nothingness after the 1 causes it to sort before the 0, and we need the opposite sort for our reversed-sort to work.

  • Temporarily padding shorter integers with a character of codepoint > digit-code-points looks like a good fix for the above problem.
    (Spoiler: it wasn't.)

Raku

First, let's look at the roads not taken.

Permuting: O(n!) catastrophe

sub task2 ( @ns --> UInt ) {
    return @ns.permutations.map( + *.join ).max;
}

This code is simple, and always returns the correct answer. Eventually!
Changing the number of elements in @ns from 9 to 11 caused a 100x increase in runtime.
20 elements will not finish before Sol consumes the Earth.

Padding: Conditionally Correct

sub task2 ( @ns --> Str ) {
    my $max_length = @ns».chars.max;

    sub padded ( $s ) { $s ~ ( '~' x ($max_length - $s.chars) ) }

    return @ns.sort(&padded).reverse.join;
}

The ~ tilde character has a codepoint higher than any ASCII digit, so this code fixes the (1, 10) => 110 testcase mentioned above. If fails a new testcase: (10, 100) => 10100. It turns out that what character you pad with if affected by what you are comparing to, which changes on each comparison, so we must abandon this strategy. (Well, I think we must.)

Playing with this approach, I saw that you can fool yourself into thinking you have a good solution, when you have actually create a tie with no tie-breaker to the ordering, and the test passes just because the input was in the correct order to start with! We must also test (100, 10) => 10100.

Pairwise: Flawless Victory

As I considered the problem in the shower, I realized that the answer to the different-length orderings could be found by numerically comparing both "ab" and "ba" string-combinations. Easily written as @^ns.sort($^b ~ $^a <=> $^a ~ $^b).join, I named the less-clear part to ease the burden on the reader.

sub numerically_highest_concatenation ( $a, $b --> Order ) {
    $b ~ $a <=> $a ~ $b
}
sub task2 ( @ns --> Str ) {
    return @ns.sort(&numerically_highest_concatenation).join;
}

Conference talk: Sorting Whatever* in $LANG

By the way, at the Perl and Raku Conference in Toronto, I am giving a talk on Sorting.

In the Raku part of the talk, I will show off the superior arity-1 form of .sort (like .sort(*.date_of_last_purchase)), and had planned to show the only two use cases that remained for the old $a <=> $b arity-2 form from Perl. Because of this TWC task, I will be adding a third case, where pairings must be compared to determine ordering.

Perl

use v5.36;
sub task2 ( @integers ) {
    return join '', sort { "$b$a" <=> "$a$b" } @integers;
}

Note that, where I increased clarity by using a named sort subroutine in Raku, here I named the parameter, and used double-quotes instead of . to stress the concatenation aspect while removing any doubts about operator precedence (. vs <=>).


dash dot dash dash dash dot dash dash dash dash dot dot
dash dot dash dash dash dot dash dash dash dash dot dot
dash dot dash dash dash dot dash dash dash dash dot dot
-- "YYZ", in 10⁄8 time, by Rush audio video

Looking forward to seeing you all in Toronto!

Leave a comment

About Bruce Gray

user-pic "Util" on IRC and PerlMonks. Frequent speaker on Perl and Raku, but infrequent blogger.