## TWC 205: Exclusive Third Or First

In which we ponder the highest bit, and find a much faster Max_XOR.

# TWC Task #1 - Third Highest

Given an array of integers, return the Third Highest if found, otherwise return the maximum.

## Observations:

• The last example shows that duplicate values are to be suppressed; `task1(5,4,4,3)` should return `3`.

• The very efficient single-pass algorithm for third-highest-maximum is very inefficient, w.r.t. programmer time, so I just sorted.

## Perl

``````use v5.36;
use List::Util qw<uniq>;
sub task1 ( \$ns ) {
my ( \$x1, \$x2, \$x3 ) = sort { \$b <=> \$a } uniq @{\$ns};
return \$x3 // \$x1;
}
``````

## Raku

``````sub task1 ( @ns ) {
(.elems == 3 ?? .head !! .tail) given @ns.sort.squish.tail(3);
}
``````
• `.squish` is more efficient than `.unique`, and can be used as an exact replacement when the list is already sorted.

• `given` aliases the RHS expression into `\$_`, allowing the concise (`elems`,`head`,`tail`) method calls.

# TWC Task #2 - Maximum XOR

Given an array of integers, find the highest value obtained by XORing any two distinct elements.

## Perl

``````use v5.36;
use ntheory    qw<forcomb>;
use List::Util qq<max>;
sub task2 ( @ns ) {
my \$r = 0;
forcomb {
\$r = max( \$r, \$ns[\$_[0]] ^ \$ns[\$_[1]] )
} @ns, 2;
return \$r;
}
``````

## Raku

``````sub task2 { @_.unique.combinations(2).map({ [+^] .list }).max }
``````

`[+^]` is the "reduction" form of the `+^` XOR operator; it XORs all the things on its right, which in this case are both of the elements in the two-element list passed to `map`.

That simple `.combinations(2)` works fine, with it's `N*(N-1)/2` performance, until the input array gets large.

With a bit of analysis, we can go much faster!

``````sub task2_fast ( @ns ) {
sub hi-bit ( UInt \$n ) { \$n.log2.floor }

my %grouped = @ns.unique.classify(&hi-bit);

my (\$top_group, @lesser_groups) = %grouped.sort(-*.key)
.map(*.value);

if !@lesser_groups {
my \$removes_hi-bit = 1 +< hi-bit(\$top_group[0]);
return task2_fast( \$top_group.list »-» \$removes_hi-bit );
}
else {
return [max] @lesser_groups.map: {
(\$top_group.list X+^ .list).max;
}
}
}
``````

### What we know:

• XOR on single bits:

• 0^0 == 0
• 0^1 == 1
• 1^0 == 1
• 1^1 == 0
• XOR is commutative (order does not matter): a^b == b^a

• So, we can use `.combinations(2)` instead of `@ns Xop @ns` Cartesian product.
• a^a==0, which is the minimum possible result, and is only achievable with a^a.

• `x^y == 0` implies `x==y`
• So, `.unique` is safe (and helpful); duplicate elements XOR to zero.
• Ack! Corner case not handled: @ns.elems > 1, but @ns.unique.elems == 1.
• Raku has 3 XOR ops:

• `xor` short-circuiting boolean XOR, low precedence
• `^^` short-circuiting boolean XOR, high precedence
• `+^` numeric bitwise XOR (which is what this task needs)
• The highest binary bit set in an integer can be found in many ways in C and assembler. Traditionally, the rightmost bit ("ones" position) is `0`, next to the left ("twos" position) is `1`, next ("fours" position) is `2`, and so on.

• In Perl, `floor(log(\$N)/log(2))` works, where `floor` is from the `POSIX` module, or from the new `Perl v5.36` "builtin".
• In Raku, this is easily done with `.log2.floor`. e.g. the highest bit in 3 (0b11) is `1`, and the highest bit in 3 (0b100) is `2`:

``````\$ raku -e 'for sort classify *.log2.floor, ^1024 {
say join "\t", .key, .value.minmax.gist,
.value.minmax.bounds.fmt("0b%b", "..")
}'
Hi-Bit   RangeInDecimal   RangeInBinary
-Inf   0..0             0b0..0b0
0     1..1             0b1..0b1
1     2..3            0b10..0b11
2     4..7           0b100..0b111
3     8..15         0b1000..0b1111
4    16..31        0b10000..0b11111
5    32..63       0b100000..0b111111
6    64..127     0b1000000..0b1111111
7   128..255    0b10000000..0b11111111
8   256..511   0b100000000..0b111111111
9   512..1023 0b1000000000..0b1111111111
``````
• Even if all we know about two integers is the highest bit set in each, we still know something about their XOR. Given `A > B`:

• If `A` and `B` have different highest bits, `A` has the higher of the two highest bits, and `A XOR B` will have that highest bit from A. Further, `(A XOR B)` will always be greater than `B`.
• Otherwise, `A` and `B` have the same highest bit, so `A XOR B` will clear that bit. Further, `(A XOR B)` will always be less than `A`.

We will see concrete examples of these "different highest bits" and "same highest bits" later. For now, we posit that some benefit will be gained if we group (via `.classify`) the input array by the highest bit set.

``````    sub hi-bit ( UInt \$n ) { \$n.log2.floor }
my %grouped = @ns.unique.classify(&hi-bit);
``````

Now if we only had a way to re-combine all these groups, that guaranteed we did not miss any combinations of the original `@ns` elements...

### Useful technique from combinatorics:

(or, The Whole Is Always Equal To The Sum Of Its Parts, But Only When Using The Correct Definition Of "Sum")

Classifying (sub-grouping) all of some list into groups, and then taking the cross of each `.combinations(2)` of those groups, plus the `.combinations(2)` within each group, will exactly equal a generic `.combinations(2)` of the initial list. Every two-element pairing will still occur exactly once. e.g:

``````\$ raku -e '.say for (1..30).combinations(2)' | wc -l
435     # 30*29/2, as expected

\$ raku -e '
my @groups = (1..30).classify(* % 10).sort.map(*.value); # 10 subgroups, by last-digit for fun
for @groups.combinations(2) -> (\$g1, \$g2) { # All 2-combinations of groups,
.say for \$g1.list X \$g2.list;           #   one group crossed with another group
}
for @groups -> \$g1 {                        # Each group,
.say for \$g1.combinations(2);           #   2-combinations of only itself
};
' | wc -l
435     # same count as the simple `.combinations(2)`, just sliced in a (potentially) more useful fashion.
``````

### Once grouped by highest set bit, we can deduce:

• Same-bits become zero, and all in a group have the same high bit.
• XOR of any X within a group with any Y within that same group always yield a cancellation of the high bit. The result will be in a lower group.
• XOR of any X within a group with any Y within a lower group can never cause a cancellation of the X high bit. The result will be in the same group as X.
• Therefore, the only time you have to cross (or `.combinations(2)`) within a group, is if that group is the only group.
• Also, since no combination of any X or Y from any non-top groups can combine with XOR to generate the high bit, no groups need to be crossed with each other using `X+^`; only the top group needs to be crossed with every other group.
• Whenever there is only one group, you still have a shortcut, because the max-XOR that they can generate is the same as the XOR of that group with their high-bit turned off; one or more rounds of recursion will get us to a multi-group solution, guaranteed by the initial `.unique`.

### Extended example - "same highest bits"

Consider `@ns = 876, 920, 930;`. All are in "group 9" (512..1023).

``````876 == 0b1101101100, after bit-9 removed: 0b101101100 == 364
920 == 0b1110011000, after bit-9 removed: 0b110011000 == 408
930 == 0b1110100010, after bit-9 removed: 0b110100010 == 418

876 +^ 920 == 364 +^ 408 == 244
876 +^ 930 == 364 +^ 418 == 206
920 +^ 930 == 408 +^ 418 ==  58
``````

So, `max_XOR(876,920,930)` will have the same result as `max_XOR(364,408,418)`, because all the XOR-ed combinations of the former equal all the XOR-ed combinations of the latter. So, we recurse with the reduced numbers. Again, they are all in the same group, this time "group 8".

``````364 == 0b101101100, after bit-8 removed: 0b01101100 == 108
408 == 0b110011000, after bit-8 removed: 0b10011000 == 152
418 == 0b110100010, after bit-8 removed: 0b10100010 == 162
``````

So, `max_XOR(364,408,418)` will have the same result as `max_XOR(108,152,162)`, so we recurse again.

`(108,152,162)` are not all in one group, so they have "different highest bits", which I needed a bigger example to show.

### Extended example - "different highest bits"

Consider `@ns = flat 1003, 1001, 1..511;`

When grouped by hi-bit, (1003,1001) are in "group 9", because their highest bit is `9`: (0b1111101011, 0b1111101001). In this example, I call this group `G9`. All the rest are in groups between `0` and `8`. In real code, they would be in separate groups, but for this example, I lump them all together, calling the super-group `G0_8`. How can we combine these?

``````G0_8 X+^ G0_8 always has bit 9 unset, because none of them have bit 9 set.
G9 X+^   G9 always has bit 9 unset, because the XOR cleared it.
G9 X+^ G0_8 always has bit 9   set, since the G9 element has bit 9, and nothing in G0_8 has bit 9 to counteract it.
``````

More generally, for any case of groups with "different highest bits", we only need to check the top group against all the lower groups.

In this particular case, the simple `.combinations(2)` would require `513*512/2 = 131328` XORs, while this grouping only needs `2*511 = 1022` XORs. `131328/1022 == 128.500978`. Even given the extra work of grouping, that will be quite a speed-up!

Worst case would be if the top group was full, and all lower groups were full. Even then, we cut the number of XORs in half.

### Not done:

I expect that a further speed-up is possible, by noting that for any individual `N` in the top group, if a group exists whose hi-bit matches the first "zero" bit in `N`, no group lower than that needs searching. E.g. for 943 (group 9) (0b1110101111), the zeros are in bits `6` and `4`. All elements of groups 8,7,6 must be XOR, but if group 6 had any elements, groups lower than 6 may be skipped. Otherwise, groups 5,4 must be XOR, but if group 4 had any elements, groups lower than 4 may be skipped. I did not have time to code this supposition into my solution above, nor did I verify my own logic. It might even result in a net loss of speed, given the work to find the positions of zeros in every `N` in the top group.

Bud: Well, let’s see, we have on the bags, Who’s on first, What’s on second, I Don’t Know is on third…
Lou: That’s what I want to find out.
-- Abbott & Costello Video

## 1 Comment

If modules are allowed, then the following gives pretty much optimal performance, slightly slower than O(N); heapifing an array is O(N), removing the max of the heap is O(Log2(N)). The heapify algorithm is pretty interesting, using the siftDown() algorithm the intitial pass is a surprising O(N)!

One thing I found confusing in your writeup, your perl solution takes a reference as the argument for the perl code, but then in your example you call it with a list. I chose to implement the list API.

``````use v5.34;
use Algorithm::Heapify::XS qw(max_heapify max_heap_shift);
max_heapify(@_);
my @ret;
while (@_ and @ret<3) {
my \$top = max_heap_shift(@_);
push @ret, \$top if !@ret or \$ret[-1] != \$top;
}
return \$ret[2] // \$ret[0];
}