A small puzzle for you

This had me stumped for a bit, but I was quite pleased when I came up with a relatively simple solution.

Given several arrays, each of which has elements which are a subset of allowed elements, and given that every allowed element appears at least once in each array, how do I rewrite all arrays such that each element of each array which has an identical value to an element in another array has the same index number in each array, with missing elements being undef?

OK, that was a mouthful. Here's an example which should make it clear:

@a = ( 'b', 'c', 'f' );
@b = ( 'a', 'd' );
@c = ( 'c', 'd', 'e' );

I should have the following when I'm done:

@a = (  undef,    'b',    'c',  undef,  undef,    'f' );
@b = (    'a',  undef,  undef,    'd',  undef,  undef );
@c = (  undef,  undef,    'c',    'd',    'e',  undef );

In other words, I'm trying to line up all of those values (because they're going to an HTML table and my $client needed to see the missing values). Once you see the answer to the puzzle, it's actually not too hard.

Post your solutions below!

17 Comments

#!/usr/bin/env perl
use strict;
use warnings;
use List::AllUtils 'uniq';
use Data::Printer;

sub align {
    my @all_values = sort(uniq(map {@$_} @_));
    for my $param (@_) {
        my %hash;@hash{@$param}=@$param;
        @$param = @hash{@all_values};
    }
    return;
}

my @a = ( 'b', 'c', 'f' );
my @b = ( 'a', 'd' );
my @c = ( 'c', 'd', 'e' );

align(\@a,\@b,\@c);

p @a;
p @b;
p @c;

There's probably a nicer solution in case you don't want to stringify the elements, but this one seems simple and effective enough.

Mine is fairly similar to Moritz's: https://gist.github.com/jleader/fefa388194aad71193f9

Hi Ovid, Not sure if you are looking for an in place insertion of those 'undefs'.. but in case you are expending extra hash, then this would look much simple.

http://pastebin.com/XPwpwwTy

Something like this looks more logical and straightforward to me:

$VAR1 = { 'e' => [ undef, undef, undef, undef, undef, undef, undef, 1 ], 'c' => [ undef, 1, undef, undef, undef, 1, undef, undef ],

Don't forget to package the solution to a module, for the rest of us :)

In your example the input lists are sorted. Is that part of the problem spec? If so, you can do it by shifting off the input lists one position at a time:

use List::AllUtils qw(max minstr any);
my @lists = (\@a, \@b, \@c);

# First make them all the same length.
my $len = max map { scalar @$_ } @lists;
foreach (@lists) {
    push @$_, undef while scalar @$_ < $len;
}

my @results = map { [] } @lists;
my $last;
while (any { @$_ } @lists) {
    my @first_elems = map { $_->[0] } @lists;
    my @d = grep { defined } @first_elems;
    if (not @d) {
        shift @$_ foreach @lists;
        next;
    }
    my $min = minstr @d;

    if (defined $last and $min eq $last) {
        # Still filling in the same value - don't lengthen the output lists
        # but populate holes in the last element.
        #
        foreach my $i (0 .. $#lists) {
            for ($lists[$i]->[0]) {
                if (defined and $_ eq $min) {
                    die if defined $results[$i][-1];
                    $results[$i][-1] = $min;
                    $_ = undef;
                }
            }
        }
    }
    else {
        foreach my $i (0 .. $#lists) {
            for ($lists[$i]->[0]) {
                if (defined and $_ eq $min) {
                    push @{$results[$i]}, $_;
                    $_ = undef;
                }
                else {
                    push @{$results[$i]}, undef;
                }
            }
        }
    }
    $last = $min;
}

This is a lot more verbose than the solutions with uniq or hashes, but I believe it takes linear time in the length of the inputs, while they take order (n log n).

my ( $i, %index ) = 0;
$index{ $_ } //= $i++ for @a, @b, @c;
for ( \( @a, @b, @c ) ) {
    my @rearranged;
    @rearranged[ @index{ @$_ } ] = @$_;
    $#rearranged = keys %index;
    @$_ = @$rearranged;
}

Untested. Downside is the double copy.

I believe it takes linear time in the length of the inputs, while they take order (n log n).

That would be nice if this were C, but it’s Perl. So the algorithm may be linear, but due to huge amounts of op dispatch, sub call overhead etc, its implementation in Perl likely has a humongous constant factor, which a big-O analysis hides. In order to actually see it outrun the algorithms that are worse in big-O terms, you’ll probably have to feed it inputs on the order of six digits, maybe even seven or more.

Optimising Perl code usually means finding a builtin to push the bulk of the work into, even if that means an overall algorithm that scales worse to very large inputs.

Aristotle, you are right of course that in practice code based on builtins is likely to be faster. The input size would have to be quite huge before the list-popping implementation starts to outperform one based on hashes or sorting.

My answer is buggy in any case: try

@lists = (["a", "c", "f", "j"], ["i", "j"], []);

It puts "j" in two different columns.

use warnings;
use List::MoreUtils qw(uniq);

my @in = (
    [ 'b', 'c', 'f' ],
    [ 'a', 'd'      ],
    [ 'c', 'd', 'e' ],
);
my $i =0;
my %indexes = map {$_ => $i++} sort uniq map @$_, @in;


foreach $r (@in) {
  my @new;
  @new[$i-1,@indexes{@$r}] = (undef,@$r);
  $r = \@new
}

use Data::Dumper;
print Dumper \@in;

Not sure if it had to be in place or sorted, but I believe this works.

With apologies to Moritz, whose framework I stole. I would have used Aristotle's uniq replacement, but $work perl is only 5.10.1

Mine was pretty bad compared to others...

use strict;
use warnings;
use v5.10;

use Test::More tests => 3;

use List::MoreUtils qw(uniq);

my @a = ( 'b', 'c', 'f' );
my @b = ( 'a', 'd' );
my @c = ( 'c', 'd', 'e' );

my @set = uniq sort @a, @b, @c;
@a = sparse([@a], [@set]);
@b = sparse([@b], [@set]);
@c = sparse([@c], [@set]);

is_deeply(\@a, [  undef,    'b',    'c',  undef,  undef,    'f' ]);
is_deeply(\@b, [    'a',  undef,  undef,    'd',  undef,  undef ]);
is_deeply(\@c, [  undef,  undef,    'c',    'd',    'e',  undef ]);

sub sparse {
    my ($in, $set) = @_;
    if (! defined $in->[0] && ! defined $set->[0]) {
        return;
    }
    if ($in->[0] && $in->[0] eq $set->[0]) {
        shift @{$set};
        return shift @{$in}, sparse($in, $set);
    }
    else {
        shift @{$set};
        return undef, sparse($in, $set)
    }

}

I assume that the goal solution will not keep initial order and the expectation is to remove duplication in the process. If so here's another way to solve this one:

https://gist.github.com/notbenh/866b95cbea06a5d50b4f

use Data::Dumper;

my @a = ( 'b', 'c', 'f' );
my @out=();

for (@a) { $out[(ord $_) - 97] = $_;}
@a=@out;

print Dumper \@a;

Using the example in full:-

#!/usr/bin/perl
use warnings;
use strict;

my @a = ( 'b', 'c', 'f' );
my @b = ( 'a', 'd' );
my @c = ( 'c', 'd', 'e' );

align(\@a,\@b,\@c);

sub align {
    my @aligns=@_;
    for (@aligns) {
        my $arr=$_;
        my @out=();
        for (@$_) { $out[(ord $_) - 97] = $_ }
        @$arr = @out;
    }
}

About Ovid

user-pic Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/