November 2019 Archives

Perl Weekly Challenge 35: Binary Encoded Morse Code

By laurent_r on November 22, 2019 9:45 PM

These are some answers to the Week 35 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days (November 24, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

This week, both tasks were contributed by Paul Johnson.

I usually first do task 1 in Perl 5 and in Raku (formerly known as Perl 6), or sometimes the other way around, and then task 2 in both languages. This week, however, the two tasks are so closely related that it makes sense to do both tasks together in one language, and then the two tasks in the other language.

The tasks

Task # 1: Encode Text into Binary Encoded Morse Code

Write a program to encode text into binary encoded morse code.

Pay attention to any changes which might need to be made to the text to make it valid morse code.

Morse code consists of dots, dashes and gaps. It can be encoded in binary in the following fashion: dot: 1 dash: 111 intra-character gap: 0 character gap: 000 word gap: 0000000

An intra-character gap is inserted between the dots and dashes in a character.

Task # 2: Decode from Binary Encoded Morse Code

Write a program to decode binary morse code.

Consider how it might be possible to recover from badly formed morse code.

1. by splitting the morse code on gaps

2. without looking further than one digit ahead

Comments applying to both tasks (and both Perl 5 and Raku)

The first thing we need is a table of Morse codes. This is a typical international Morse code table:

I originally had thought of putting it in the __DATA__ section of the programs, but that would have meant duplicating it in each program. An attractive alternative that I would probably have chosen if I were doing the challenge in only one language would have been to create a module exporting the Morse encoding data structure (probably a hash). But since I'm doing the challenge in two different languages, that would have meant writing two modules. I preferred to store the Morse code in a separate morse.dat CSV file that could be used for both tasks and both languages. The file I originally prepared was this:

$ cat morse.dat
0:_____
1:.____
2:..___
3:...__
4:...._
5:.....
6:_....
7:__...
8:___..
9:____.
A:._
B:_...
C:_._.
D:_..
E:.
F:.._.
G:__.
H:....
I:..
J:.___
K:_._
L:._..
M:__
N:_.
O:___
P:.__.
Q:__._
R:._.
S:...
T:_
U:.._
V:..._
W:.__
X:_.._
Y:_.__
Z:__..

My next comment is that there is no distinction between upper and lower case letters in Morse code. So, since our morse.dat file is using upper case letters, we will fold all input data to uppercase. This has the obvious consequence that an encoding-decoding roundtrip will (at best) produce an uppercase result of the input data.

Then, task # 1 says to pay attention to any changes which might need to be made to the text to make it valid morse code. I'm not entirely sure of what this means, but I guess that means we have to make sure that our input data contains only uppercase alphabetic characters, digits, and spaces (quite obviously, Samuel Morse and Alfred Vail, the inventors of the Morse code, were not Unicode-aware, but that goes much further than that). So I decided to remove from the input any character not translatable into Morse, such as punctuation characters, etc., basically anything not belonging to the Perl5 [A-Z0-9 ] character class. As we will see, I later extended the Morse alphabet to some punctuation symbols.

Finally, task # 2 says we should try to recover from from badly formed Morse code. I'm sorry, but that's much too vague. I think that my solutions will be able to more or less recover from most small mistakes (at the expense of probably a typo in the output and/or a runtime warning), but there is just no way we will be able to make sense from random bytes. So, where do we draw the line? With our implementation, encoding errors will produce typos and warning, but the result should still be readable if the number of encoding errors is not too high.

Binary Encoded Morse Code in Perl 5

The first thing we need is a way to convert dash and dots into binary encoding. I'll use this hash:

my %bin_chars = ( '.' => 1, '_' => 111);

Note that I considered adding something like ' ' => '0000000' to also convert spaces, but that seemed to be a bad idea because that would have meant that a space between two words would have been turned into '0000000000000', and that does not appear to be the rationale of binary-encoded Morse. Because of that, I settled to split the input into words, to convert each individual word, and then to reassemble them with the string '0000000' between them.

Next, we need to read the morse.dat file and build a conversion table from the input alphabet A-Z0-9 (we will process spaces separately) into binary encoded Morse code.

my $morse_codes_file = "morse.dat";
open my $IN_MORSE, "<", $morse_codes_file or die "Cannot open $morse_codes_file $!";
my %bin_morse = 
    map { chomp; 
          my ($key, $val) = split ":", $_ ;
          my $bin_val = join '0', map $bin_chars{$_}, split //, $val;
          $key => $bin_val;
        } <$IN_MORSE>;

In the map code block, we're reading the morse.dat file line by line, split the lines into $key (the alphabet letter) and $val (the dot and dash Morse code), and then convert $val into $binval binary encoded Morse values by adding the "0" separator between the dots and dashes, and we store all this into a %bin_morse hash. After this, this hash contains an encoding table like this:

0 => 1110111011101110111
1 => 10111011101110111
2 => 101011101110111
3 => 1010101110111
4 => 10101010111
5 => 101010101
6 => 11101010101
7 => 1110111010101
8 => 111011101110101
9 => 11101110111011101
'A' => 10111
'B' => 111010101
'C' => 11101011101
'D' => 1110101
... Lines omitted for brevity ...
'X' => 11101010111
'Y' => 1110101110111
'Z' => 11101110101

So, we no longer care about dots and dashes, we now have a way to convert directly input letters into binary-encoded Morse values representing these letters.

Encoding and Decoding Binary Morse Code

We will use a to_morse subroutine to convert plain text into binary Morse, and a from_morse to do the conversion the other way around.

The to_morse subroutine will perform the following tasks: - Fold the input string to uppercase, since Morse is case-insensitive; - Remove from the input string any character not translatable into Morse (i.e. keep only alphanumerical characters plus spaces); - Split the input string into words (i.e. split on spaces); - Split each word into individual characters, translate each character into binary encoded Morse code, and reassemble the letters into words, with a '000' separator between the characters; - And finally join the words into a new string, with a '0000000' separator between the words.

The from_morse subroutine will need a lookup table to convert Morse encoded "letters" back into letters and digits of our alphabet. Given that the keys and values of the %bin_morse hash are unique, we can just create a new %rev_bin_morse hash by reverting %bin_morse. Then we need to: - Split the input string into words (splitting on '0000000'); - Split each word into "letters" (splitting on '000'), use the reverse look-up table to find the letter in our alphabet, and join the letters to reconstruct the original words; - And finally join the words into a string, with a space between the words.

The input string can be passed as a parameter to the string or, failing that, will use a default string (whose only merit is that, while being relatively short, it contains all letters of the alphabet, so that it constitutes a relatively good test case).

Although this is not strictly necessary, we also use a format80c subroutine which returns the encoded string with new lines inserted so that it can be nicely printed over 80 columns (but that subroutine is only for enabling pretty printing, it doesn't change the value of its argument).

Since we're using both to_morse and from_morse subroutines, we can test a round trip: first encode the input string, and then decode the result, and check visually that the final result corresponds to the input (subject to non-alphabetical characters which may have been removed and to upper-case folding).

This is the full Perl 5 script:

use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

my %bin_chars = ( '.' => 1, '_' => 111);
my $morse_codes_file = "morse.dat";
open my $IN_MORSE, "<", $morse_codes_file or die "Cannot open $morse_codes_file $!";
my %bin_morse = 
    map { chomp; 
          my ($key, $val) = split ":", $_ ;
          my $bin_val = join '0', map $bin_chars{$_}, split //, $val;
          $key => $bin_val;
        } <$IN_MORSE>;

my $input = shift // "The quick brown fox jumps over the lazy dog";

sub to_morse {
    my $input = uc shift;      # Morse doesn't have cases
    $input =~ s/[^A-Z0-9 ]//g; # remove non Morse characters
    my @morse_words;
    for my $word (split / /, $input) {
        push @morse_words, join '000', map { $bin_morse{$_} } split //, $word;
    }
    return join '0000000', @morse_words;
} 
sub from_morse {
    my $input = shift;
    my %rev_bin_morse = reverse %bin_morse;
    my @words;
    for my $word (split /0{7}/, $input) {
        push @words, join '', map $rev_bin_morse{$_}, split /000/, $word;
    }
    return join " ", @words;
}
sub format80c {
    shift =~ s/(.{80})/$1\n/gr;
}

say "Input string: $input";    
my $encoded = to_morse $input;
say "Binary encoded Morse string:\n", format80c $encoded;
say "Decoded string: ", from_morse $encoded;

When given no argument, the script uses the default string and displays the following:

Input string: The quick brown fox jumps over the lazy dog
Binary encoded Morse string:
11100010101010001000000011101110101110001010111000101000111010111010001110101110
00000011101010100010111010001110111011100010111011100011101000000010101110100011
10111011100011101010111000000010111011101110001010111000111011100010111011101000
10101000000011101110111000101010111000100010111010000000111000101010100010000000
10111010100010111000111011101010001110101110111000000011101010001110111011100011
1011101
Decoded string: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG

Extensions to the Morse Code

The table we've shown at the beginning is the universally acknowledged international Morse code. A number of extensions covering punctuations symbols or even accented letters have been suggested and are used in various parts of the world. The most common punctuation symbols are widely accepted. We will add six such symbols to our morse.dat file:

.:._._._
,:__..__
?:..__..
':.____.
!:_._.__
;:_._._.

Of course, if we add those six punctuation symbols to the conversion table, we also don't want to remove them from the input string, so that the code statement to remove non Morse letters will be changed to this:

$input =~ s/[^A-Z0-9 .,?'!;]//g; # remove non Morse characters

With these two changes, and with a string passed as an argument to the script, we get the following output:

$ perl morse.pl "Ask not what your country can do for you. Ask what you can do for your country."
Input string: Ask not what your country can do for you. Ask what you can do for your country.
Binary encoded Morse string:
10111000101010001110101110000000111010001110111011100011100000001011101110001010
10100010111000111000000011101011101110001110111011100010101110001011101000000011
10101110100011101110111000101011100011101000111000101110100011101011101110000000
11101011101000101110001110100000001110101000111011101110000000101011101000111011
10111000101110100000001110101110111000111011101110001010111000101110101110101110
00000010111000101010001110101110000000101110111000101010100010111000111000000011
10101110111000111011101110001010111000000011101011101000101110001110100000001110
10100011101110111000000010101110100011101110111000101110100000001110101110111000
11101110111000101011100010111010000000111010111010001110111011100010101110001110
10001110001011101000111010111011100010111010111010111
Decoded string: ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU. ASK WHAT YOU CAN DO FOR YOUR COUNTRY.

As you can see, the dot is now properly encoded and decoded.

Binary Encoded Morse Code in Raku (formerly known as Perl 6)

Our Raku implementation will follow the same logic as the Perl 5 version. The only changes reflect syntactic differences between the two languages. Therefore, I won't repeat the explanations supplied above, please refer to them if you need. We thus go straight to the Raku implementation:

use v6;

sub MAIN ($input = "The quick brown fox jumps over the lazy dog") {
    my %bin_chars = '.' => 1, '_' => 111;
    my %*bin-morse = "morse.dat".IO.lines.map({
        my ($key, $val) = split ":", $_ ;
        my $bin_val = $val.comb.map({%bin_chars{$_}}).join('0');
        $key => $bin_val;
    });
    say "Input string: $input";    
    my $encoded = to_morse $input;
    say "Binary encoded Morse string:";
    my $encoded-copy = $encoded;
    .Str.say for $encoded-copy ~~ s:g/(. ** 1..80)/$0\n/;
    say "Decoded string: ", from_morse $encoded;
}
sub to_morse ($input is copy) {
    $input ~~ s:i:g/<-[A..Z0..9\s.,?'!;]>//; # remove non Morse characters
    my @morse_words;
    for $input.uc.split(/\s/) -> $word {
        push @morse_words, join '000', map { %*bin-morse{$_} }, $word.comb;
    }
    return join '0000000', @morse_words;
} 
sub from_morse ($input) {
    my %rev_bin-morse = reverse %*bin-morse.kv;
    my @words;
    for split /0 ** 7/, $input -> $word {
         push @words, join '', map {%rev_bin-morse{$_}}, split /000/, $word;
    }
    return join " ", @words;
}

When given no argument, the script uses the default string and displays the following:

$ perl6  morse.p6
Input string: The quick brown fox jumps over the lazy dog
Binary encoded Morse string:
11100010101010001000000011101110101110001010111000101000111010111010001110101110
00000011101010100010111010001110111011100010111011100011101000000010101110100011
10111011100011101010111000000010111011101110001010111000111011100010111011101000
10101000000011101110111000101010111000100010111010000000111000101010100010000000
10111010100010111000111011101010001110101110111000000011101010001110111011100011
1011101
Decoded string: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG

And given a string passed as an argument to the script, we get the same output as in Perl 5:

$ ./perl6  morse.p6  'Ask not what your country can do for you. Ask what you can do for your country.'
Input string: Ask not what your country can do for you. Ask what you can do for your country.
Binary encoded Morse string:
10111000101010001110101110000000111010001110111011100011100000001011101110001010
10100010111000111000000011101011101110001110111011100010101110001011101000000011
(... Lines omitted for brevity ...)
11101110111000101011100010111010000000111010111010001110111011100010101110001110
10001110001011101000111010111011100010111010111010111
Decoded string: ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU. ASK WHAT YOU CAN DO FOR YOUR COUNTRY.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, December, 1. And, please, also spread the word about the Perl Weekly Challenge if you can.

0 comments

Perl Weekly Challenge 34: Array and Hash Slices and Dispatch Tables

By laurent_r on November 11, 2019 6:54 PM

These are some answers to the Week 34 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days (November 17, 2019). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

This week, both tasks were contributed by Dave Cross.

Task # 1: Array and Hash Slices

Write a program that demonstrates using hash slices and/or array slices.

Slices are a way to access several values of an array or of a hash in one statement, by using multiple subscripts or keys.

Array and Hash Slices in Perl 5

If you have an @array containing for example some successive integers, you can obtain several values from it with the following syntax: @array[3, 7, 2] or even @array[2..7].

If you try to do the same with a %hash and use %hash{'a', 'c'} you’ll get key/value pairs, which may or may not be what you want. If you want only the values, you need to change the sigil like so: @hash{'a', 'c'}. Array and hash slices may also be used as l-values, i.e. on the left-hand side of an assignment, to populate a new array or a new hash.

This short program illustrates all this:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;
use Data::Dumper;

my @array = (0..10);
my $count = 0;
my %hash  = map {$_ => ++$count} 'a'..'j';

say "Array slice :  @array[3..7]";
say "Hash slice 1: ", join ' ', %hash{'b', 'd', 'c'};
say "Hash slice 2: ", join ' ', %hash{'b'..'d'};
say "Hash slice 3: ", join ' ', @hash{'b'..'d'};
say "Hash slice 4: ", join ' ', @hash{qw/c b c d/};

# Array slice a l-value
my @new_array = (1, 2);
@new_array[2, 3] = @array[6, 7];
say "New array: ";
say Dumper \@new_array;    

# Hash slice as l-value:
my @keys = qw/c d e/;
my %new_hash = %hash{@keys}; # Perl 5.20 and above
say "New hash: ";
say Dumper \%new_hash;
my %new_hash2;
@new_hash2{@keys} = @hash{@keys};
say "New hash2: ";
say Dumper \%new_hash2;

This displays the following output:

$ perl hash_slices.pl
Array slice :  3 4 5 6 7
Hash slice 1: b 2 d 4 c 3
Hash slice 2: b 2 c 3 d 4
Hash slice 3: 2 3 4
Hash slice 4: 3 2 3 4
New array:
$VAR1 = [
          1,
          2,
          6,
          7
        ];

New hash:
$VAR1 = {
          'd' => 4,
          'c' => 3,
          'e' => 5
        };

New hash2:
$VAR1 = {
          'e' => 5,
          'c' => 3,
          'd' => 4
        };

Array and Hash Slices in Raku (formerly known as Perl 6)

Like in Perl 5, if you have an @array containing for example some successive integers, you can obtain several values from it with the following syntax: @array[3, 7, 2] or even @array[2..7].

And you can do the same with a hash to obtain a bunch of values. Array and hash slices may also be used as l-values, i.e. on the left-hand side of an assignment, to populate a new array or a new hash.

use v6;

my @array = 0..10;
my $count = 0;
my %hash  = map {$_ => ++$count}, 'a'..'j';

say "Array slice :  @array[3..7]";
say "Hash slice 1: ", join ' ', %hash{'b', 'd', 'c'};
say "Hash slice 2: ", join ' ', %hash{'b'..'d'};
say "Hash slice 3: ", join ' ', %hash<b c d>;

# Array slice a l-value
my @new-array = (1, 2);
@new-array[2, 3] = @array[6, 7];
say "New array: ", @new-array;
# Hash slice as l-value:
my @keys = qw/c d e/;
my %new-hash;
%new-hash{@keys} = %hash{@keys};
say "New hash: ", %new-hash;

This program produces the following output:

$ perl6 hash_slices.p6
Array slice :  3 4 5 6 7
Hash slice 1: 2 4 3
Hash slice 2: 2 3 4
Hash slice 3: 2 3 4
New array: [1 2 6 7]
New hash: {c => 3, d => 4, e => 5}

Task # 2: Dispatch Tables

Write a program that demonstrates a dispatch table.

A dispatch table is a table or more commonly hash of subroutine references.

For this task, we won’t simply demonstrate the syntax, but will try to do something (moderately) useful with it.

Suppose we have a text file and want to feed each word from the file into 26 files (one per letter of the alphabet) depending on the first letter of the word. This could be done with a monstrous if ... elsif ... else (or, in Raku, given ... when) construct, or we could use a dispatch table, in this case a hash containing for each letter a code reference printing the word into the proper file. As we will see, this produces much shorter and simpler code. We will even use a dynamic dispatch table, i.e. only create the hash entries (and files) that are needed with the input file.

Dispatch Tables in Perl 5

We first write a function_builder subroutine that acts as a function factory. It receives a letter as a parameter, creates a file name for that letter, opens the corresponding file in write mode, and it returns an anonymous subroutine (actually a closure) that writes its argument to the file handle. This anonymous subroutine will then be stored into the dispatch table.

In the main loop, the program reads the lines of the input file, fold them to lower case, splits the lines into words, and finds the first character of each such word. To avoid problems with special characters, we only keep words starting with a letter. If the dispatch table has no entry yet for this letter, the program calls function_builder subroutine to open the proper file and stores the code reference returned by that subroutine in the dispatch table. Finally, the program calls the code reference stored in the dispatch table for word’s first letter.

Note that Perl automatically closes files upon exiting.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;

my %dispatch;

sub function_builder {
    my $letter = shift;
    my $file_name = "${letter}_letter.txt";
    open my $FH, ">", $file_name or die "Could not open $file_name $!";
    return sub { say $FH shift }
}

while (<>) {
    chomp;
    for my $word (split / /, lc $_) {
        my $letter = substr $word, 0, 1;
        next if $letter !~ /^[a-z]/; 
        $dispatch{$letter} = function_builder($letter) unless defined $dispatch{$letter};
        $dispatch{$letter}->($word);
    }
}

Running the program and passing it the hash_slices.pl file (the script of task # 1 of this week) produced the following files in the default directory:

-rw-r--r--  1 Laurent Aucun      25 11 nov.  17:48  m_letter.txt
-rw-r--r--  1 Laurent Aucun      20 11 nov.  17:48  j_letter.txt
-rw-r--r--  1 Laurent Aucun      42 11 nov.  17:48  d_letter.txt
-rw-r--r--  1 Laurent Aucun       2 11 nov.  17:48  c_letter.txt
-rw-r--r--  1 Laurent Aucun      28 11 nov.  17:48  a_letter.txt
-rw-r--r--  1 Laurent Aucun      16 11 nov.  17:48  u_letter.txt
-rw-r--r--  1 Laurent Aucun       5 11 nov.  17:48  p_letter.txt
-rw-r--r--  1 Laurent Aucun       4 11 nov.  17:48  e_letter.txt
-rw-r--r--  1 Laurent Aucun      94 11 nov.  17:48  s_letter.txt
-rw-r--r--  1 Laurent Aucun      17 11 nov.  17:48  l_letter.txt
-rw-r--r--  1 Laurent Aucun      18 11 nov.  17:48  h_letter.txt
-rw-r--r--  1 Laurent Aucun      10 11 nov.  17:48  w_letter.txt
-rw-r--r--  1 Laurent Aucun       3 11 nov.  17:48  o_letter.txt
-rw-r--r--  1 Laurent Aucun      13 11 nov.  17:48  f_letter.txt
-rw-r--r--  1 Laurent Aucun       8 11 nov.  17:48  v_letter.txt
-rw-r--r--  1 Laurent Aucun       8 11 nov.  17:48  q_letter.txt
-rw-r--r--  1 Laurent Aucun       2 11 nov.  17:48  b_letter.txt

This is the file generated for letter “a”:

$ cat a_letter.txt
array
a
array:
as
available

Dispatch Tables in Raku

We do more or less the same thing as in P5: we first write a function_builder subroutine that acts as a function factory. It receives a letter as a parameter, creates a file name for that letter, opens the corresponding file in write mode, and it returns an anonymous code block (actually a closure) that writes its argument to the file handle. This anonymous code block will be stored into the dispatch table.

In the MAIN subroutine , the program reads the words of the input file, fold them to lower case, and finds the first character of each such word. To avoid problems with special characters, we only keep words starting with a letter. If the dispatch table has no entry yet for this letter, the program calls function_builder subroutine to open the proper file and stores the code reference returned by that subroutine in the dispatch table. Finally, the program calls the code reference stored in the dispatch table for word’s first letter.

use v6;

sub function_builder (Str $letter) {
    my $file_name = "letter_$letter.txt";
    my $fh = open "./$file_name", :w;
    return { $fh.say($^a) }
}

multi sub MAIN (Str $file where *.IO.f) {
    my %dispatch;
    for $file.IO.words.map({.lc}) -> $word {
        my $letter = substr $word, 0, 1;
        next if $letter !~~ /^<[a..z]>/; 
        %dispatch{$letter} = function_builder $letter unless defined %dispatch{$letter};
        %dispatch{$letter}($word);
    }
}

Running the program and passing it the hash_slices.p6 file (the script of task # 1 of this week) produced the following files in the default directory:

-rwxr-xr-x  1 Laurent Aucun       5 11 nov.  18:26  letter_u.txt
-rwxr-xr-x  1 Laurent Aucun       5 11 nov.  18:26  letter_v.txt
-rwxr-xr-x  1 Laurent Aucun       3 11 nov.  18:26  letter_c.txt
-rwxr-xr-x  1 Laurent Aucun       6 11 nov.  18:26  letter_q.txt
-rwxr-xr-x  1 Laurent Aucun       5 11 nov.  18:26  letter_e.txt
-rwxr-xr-x  1 Laurent Aucun      72 11 nov.  18:26  letter_s.txt
-rwxr-xr-x  1 Laurent Aucun      29 11 nov.  18:26  letter_m.txt
-rwxr-xr-x  1 Laurent Aucun      18 11 nov.  18:26  letter_j.txt
-rwxr-xr-x  1 Laurent Aucun      19 11 nov.  18:26  letter_l.txt
-rwxr-xr-x  1 Laurent Aucun      13 11 nov.  18:26  letter_h.txt
-rwxr-xr-x  1 Laurent Aucun       8 11 nov.  18:26  letter_d.txt
-rwxr-xr-x  1 Laurent Aucun      22 11 nov.  18:26  letter_a.txt

This is the file generated for letter “a”:

$ cat letter_a.txt
array
a
array:
as

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, November, 24. And, please, also spread the word about the Perl Weekly Challenge if you can.

0 comments

Perl Weekly Challenge 33: Count letters and Multiplication Tables

By laurent_r on November 6, 2019 10:12 PM

These are some answers to the Week 33 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days from now (November 10,, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Count Letters (A..Z)

Create a script that accepts one or more files specified on the command-line and count the number of times letters appeared in the files.

So with the following input file sample.txt:

The quick brown fox jumps over the lazy dog.

the script would display something like:

a: 1
b: 1
c: 1
d: 1
e: 3
f: 1
g: 1
h: 2
i: 1
j: 1
k: 1
l: 1
m: 1
n: 1
o: 4
p: 1
q: 1
r: 2
s: 1
t: 2
u: 2
v: 1
w: 1
x: 1
y: 1
z: 1

This is not specified explicitly, but from the example, we gather that what is desired here is a case-insensitive letter count (in the example, both "T" and "t" count as "t"). So we will apply the lc (lower case) built-in function to the input.

Letter Histogram in Perl 5

We will start with a Perl 5 one-liner with one intersection.pl file as input:

$ perl -nE 'for my $l (split //, lc) { $h{$l}++}; END{say "$_: ", $h{$_}//0 for ("a".."z");}' intersection.pl
a: 96
b: 46
c: 25
d: 22
e: 72
f: 19
g: 20
h: 4
i: 77
j: 0
k: 0
l: 21
m: 16
n: 59
o: 32
p: 12
q: 1
r: 52
s: 77
t: 49
u: 9
v: 19
w: 3
x: 15
y: 31
z: 0

The -n command line option will loop over the lines of the input file. The program uses a %h hash to store the letter count. Note that, at the end, the hash will contain counts for characters other than the 'a'..'z' range, but we don't really care, since we will print out only the letters of that range. Note that some letters (j, k, and z) weren't seen in the process, but the $h{$_}//0 syntax ensures that the value 0 will be printed for letters where the hash isn't defined. Also note that the (split //, lc) syntax makes it possible to fold each line to lowercase (lc defaults on $_ when there are no arguments), and split works on the lower-cased line, so that $l will take in turn each character of the input line, folded to lowercase when needed.

This is now the output for the same one-liner with several input files (the same intersection.pl file plus several histo* files):

$ perl -nE 'for my $l (split //, lc) { $h{$l}++}; END{say "$_: ", $h{$_}//0 for ("a".."z");}' intersection.pl histo*
a: 199
b: 154
c: 123
d: 111
e: 271
f: 99
g: 37
h: 49
i: 170
(... some output lines omitted for brevity ...)
u: 42
v: 26
w: 22
x: 20
y: 68
z: 9

If you prefer a real full-fledged Perl script, this may look like this:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my %histo;

while (<>) {
    chomp;
    my $line = lc;
    for my $letter (split //, $line) {
        $histo{$letter}++ if $letter =~ /[a-z]/;
    }
}
for my $key ("a".."z") {
    say "$key: ", $histo{$key} // 0;
}

This script outputs the same result as the one-liner above when given the same input (be it only one or several files).

Note that, here, the program filters the input characters and stores only letters in the %histo hash. This is just to make this filtering more explicit (and self-documenting), but this is not necessary for the program to work the same way.

Letter Histogram in Raku (formerly known as Perl 6)

Perl 6 has been renamed "Raku" in October 2019, so we will use that name from now on.

In Raku, we can use a Bag, named $histo, rather than a hash to easily implement an histogram. With just a little bit of work, we're able to populate the bag in just one statement, without any explicit loop. Also, if a letter does not exist in the $histo bag, the bag will report 0, so that we don't need any special code to avoid an undefined warning for such an edge case. All this makes the code much more concise than the Perl 5 counterpart.

use v6;

sub MAIN (*@files) {
    my $histo = (map {.IO.comb».lc}, @files).Bag;
    say "$_ : ", $histo{$_} for 'a'..'z';
}

Used with one input file, the program displays the following:

$ perl6 histo_let.p6 intersection.pl
a : 96
b : 46
c : 25
d : 22
e : 72
f : 19
g : 20
h : 4
i : 77
j : 0
k : 0
[... Lines omitted for brevity ...]
y : 31
z : 0

And it works similarly with several input files:

$ ./perl6 histo_let.p6 intersection.pl histo*
a : 199
b : 154
c : 123
d : 111
e : 271
f : 99
g : 37
h : 49
i : 170
j : 4
k : 11
[... Lines omitted for brevity ...]
y : 68
z : 9

Challenge # 2: Formatted Multiplication Table

Write a script to print 11x11 multiplication table, only the top half triangle.

x|   1   2   3   4   5   6   7   8   9  10  11
---+--------------------------------------------
1|   1   2   3   4   5   6   7   8   9  10  11
2|       4   6   8  10  12  14  16  18  20  22
3|           9  12  15  18  21  24  27  30  33
4|              16  20  24  28  32  36  40  44
5|                  25  30  35  40  45  50  55
6|                      36  42  48  54  60  66
7|                          49  56  63  70  77
8|                              64  72  80  88
9|                                  81  90  99
10|                                     100 110
11|                                         121

Formatted Multiplication Table in Perl 5

It might make sense to write separate subroutines for headers and for lines of the multiplication table, but this is so simple that I tend to consider this overkill. I will rather print the header and then print the table lines. Perl has a format feature that I have used so rarely that I hardly remember how to use it. Besides, as far as I can say, it would not be very practical for such output. Even though I haven't written any large amount of C code for more than 15 years, I still like quite a lot the possibilities offered by the standard C printf function. For a nice formatted output, the simplest will be to use when needed the built-in Perl printf subroutine, which is almost the same as its C counterpart.

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

sub print_table {
    my $max = shift;
    # Print header
    printf "%2s |", "x";
    printf "%4d", $_ for 1..$max;
    say "\n---|", "-" x (4 * $max);
    # Print table lines
    for my $i (1..$max) {
        printf "%2d |%s", $i, ' ' x (4 * ($i - 1));
        for my $j ($i..$max) {
            printf "%4d", $i * $j;
        }
        say "";
    }
}
print_table shift//11;

Running this program with the default 11 value produces the following output:

$ perl mult-table.pl
 x |   1   2   3   4   5   6   7   8   9  10  11
---|--------------------------------------------
 1 |   1   2   3   4   5   6   7   8   9  10  11
 2 |       4   6   8  10  12  14  16  18  20  22
 3 |           9  12  15  18  21  24  27  30  33
 4 |              16  20  24  28  32  36  40  44
 5 |                  25  30  35  40  45  50  55
 6 |                      36  42  48  54  60  66
 7 |                          49  56  63  70  77
 8 |                              64  72  80  88
 9 |                                  81  90  99
10 |                                     100 110
11 |                                         121

You can also pass a larger value:

$ perl mult-table.pl 15
 x |   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
---|------------------------------------------------------------
 1 |   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 2 |       4   6   8  10  12  14  16  18  20  22  24  26  28  30
 3 |           9  12  15  18  21  24  27  30  33  36  39  42  45
 4 |              16  20  24  28  32  36  40  44  48  52  56  60
 5 |                  25  30  35  40  45  50  55  60  65  70  75
 6 |                      36  42  48  54  60  66  72  78  84  90
 7 |                          49  56  63  70  77  84  91  98 105
 8 |                              64  72  80  88  96 104 112 120
 9 |                                  81  90  99 108 117 126 135
10 |                                     100 110 120 130 140 150
11 |                                         121 132 143 154 165
12 |                                             144 156 168 180
13 |                                                 169 182 195
14 |                                                     196 210
15 |                                                         225

Note that this is not exactly the output shown in the task description, but this is deliberately so, I think this looks slightly better.

Formatted Multiplication Table in Raku (Perl 6)

Just as in Perl 5, the simplest is to use the printf subroutine when needed (I actually wrote the Raku version before the P5 one, but that's not important). The Raku program is very similar to the Perl 5 program:

use v6;
sub MAIN (UInt $max = 11) {
    print-table($max);
}
sub print-table ($max) {
    # Print header
    printf "%2s |", "x";
    printf "%4d", $_ for 1..$max;
    say "\n---|", "-" x 4 * ($max);
    # Print table lines
    for 1..$max -> $i {
        printf "%2d |%s", $i, ' ' x 4 * ($i - 1);
        for $i..$max -> $j {
            printf "%4d", $i * $j;
        }
        say "";
    }
}

This script prints out the following:

$ perl6 mult-table.p6
 x |   1   2   3   4   5   6   7   8   9  10  11
---|--------------------------------------------
 1 |   1   2   3   4   5   6   7   8   9  10  11
 2 |       4   6   8  10  12  14  16  18  20  22
 3 |           9  12  15  18  21  24  27  30  33
 4 |              16  20  24  28  32  36  40  44
 5 |                  25  30  35  40  45  50  55
 6 |                      36  42  48  54  60  66
 7 |                          49  56  63  70  77
 8 |                              64  72  80  88
 9 |                                  81  90  99
10 |                                     100 110
11 |                                         121

As for the P5 implementation, this is not exactly the output shown in the task description, but, as said before, I think this looks slightly better.

Just in case you want to know, this works equally well when passing a parameter other than 11. For example with an argument of 15, this prints the same as the P5 version.

Of course, the nice formatting starts to break when passing a parameter higher than 31, but the initial requirement was just an 11*11 multiplication table. It would not be difficult to change the script to make it work with larger values (we could even dynamically adapt the formatting strings to the maximal output number), but nobody needs commonly a larger multiplication table. Besides, it would print very badly with the width limit imposed by this site.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, November, 17. And, please, also spread the word about the Perl Weekly Challenge if you can.

0 comments

« October 2019 | Main Index | Archives | December 2019 »

About laurent_r

I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.

More info »

laurent_r

November 2019 Archives

Perl Weekly Challenge 35: Binary Encoded Morse Code

The tasks

Task # 1: Encode Text into Binary Encoded Morse Code

Task # 2: Decode from Binary Encoded Morse Code

Comments applying to both tasks (and both Perl 5 and Raku)

Binary Encoded Morse Code in Perl 5

Encoding and Decoding Binary Morse Code

Extensions to the Morse Code

Binary Encoded Morse Code in Raku (formerly known as Perl 6)

Wrapping up

Perl Weekly Challenge 34: Array and Hash Slices and Dispatch Tables

Task # 1: Array and Hash Slices

Array and Hash Slices in Perl 5

Array and Hash Slices in Raku (formerly known as Perl 6)

Task # 2: Dispatch Tables

Dispatch Tables in Perl 5

Dispatch Tables in Raku

Wrapping up

Perl Weekly Challenge 33: Count letters and Multiplication Tables

Challenge # 1: Count Letters (A..Z)

Letter Histogram in Perl 5

Letter Histogram in Raku (formerly known as Perl 6)

Challenge # 2: Formatted Multiplication Table

Formatted Multiplication Table in Perl 5

Formatted Multiplication Table in Raku (Perl 6)

Wrapping up

About laurent_r

Search this blog