Perl Weekly Challenge 35: Binary Encoded Morse Code

These are some answers to the Week 35 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days (November 24, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

This week, both tasks were contributed by Paul Johnson.

I usually first do task 1 in Perl 5 and in Raku (formerly known as Perl 6), or sometimes the other way around, and then task 2 in both languages. This week, however, the two tasks are so closely related that it makes sense to do both tasks together in one language, and then the two tasks in the other language.

The tasks

Task # 1: Encode Text into Binary Encoded Morse Code

Write a program to encode text into binary encoded morse code.

Pay attention to any changes which might need to be made to the text to make it valid morse code.

Morse code consists of dots, dashes and gaps. It can be encoded in binary in the following fashion: dot: 1 dash: 111 intra-character gap: 0 character gap: 000 word gap: 0000000

An intra-character gap is inserted between the dots and dashes in a character.

Task # 2: Decode from Binary Encoded Morse Code

Write a program to decode binary morse code.

Consider how it might be possible to recover from badly formed morse code.

1. by splitting the morse code on gaps

2. without looking further than one digit ahead

Comments applying to both tasks (and both Perl 5 and Raku)

The first thing we need is a table of Morse codes. This is a typical international Morse code table:

International_Morse_Code.PNG

I originally had thought of putting it in the __DATA__ section of the programs, but that would have meant duplicating it in each program. An attractive alternative that I would probably have chosen if I were doing the challenge in only one language would have been to create a module exporting the Morse encoding data structure (probably a hash). But since I'm doing the challenge in two different languages, that would have meant writing two modules. I preferred to store the Morse code in a separate morse.dat CSV file that could be used for both tasks and both languages. The file I originally prepared was this:

$ cat morse.dat
0:_____
1:.____
2:..___
3:...__
4:...._
5:.....
6:_....
7:__...
8:___..
9:____.
A:._
B:_...
C:_._.
D:_..
E:.
F:.._.
G:__.
H:....
I:..
J:.___
K:_._
L:._..
M:__
N:_.
O:___
P:.__.
Q:__._
R:._.
S:...
T:_
U:.._
V:..._
W:.__
X:_.._
Y:_.__
Z:__..

My next comment is that there is no distinction between upper and lower case letters in Morse code. So, since our morse.dat file is using upper case letters, we will fold all input data to uppercase. This has the obvious consequence that an encoding-decoding roundtrip will (at best) produce an uppercase result of the input data.

Then, task # 1 says to pay attention to any changes which might need to be made to the text to make it valid morse code. I'm not entirely sure of what this means, but I guess that means we have to make sure that our input data contains only uppercase alphabetic characters, digits, and spaces (quite obviously, Samuel Morse and Alfred Vail, the inventors of the Morse code, were not Unicode-aware, but that goes much further than that). So I decided to remove from the input any character not translatable into Morse, such as punctuation characters, etc., basically anything not belonging to the Perl5 [A-Z0-9 ] character class. As we will see, I later extended the Morse alphabet to some punctuation symbols.

Finally, task # 2 says we should try to recover from from badly formed Morse code. I'm sorry, but that's much too vague. I think that my solutions will be able to more or less recover from most small mistakes (at the expense of probably a typo in the output and/or a runtime warning), but there is just no way we will be able to make sense from random bytes. So, where do we draw the line? With our implementation, encoding errors will produce typos and warning, but the result should still be readable if the number of encoding errors is not too high.

Binary Encoded Morse Code in Perl 5

The first thing we need is a way to convert dash and dots into binary encoding. I'll use this hash:

my %bin_chars = ( '.' => 1, '_' => 111);

Note that I considered adding something like ' ' => '0000000' to also convert spaces, but that seemed to be a bad idea because that would have meant that a space between two words would have been turned into '0000000000000', and that does not appear to be the rationale of binary-encoded Morse. Because of that, I settled to split the input into words, to convert each individual word, and then to reassemble them with the string '0000000' between them.

Next, we need to read the morse.dat file and build a conversion table from the input alphabet A-Z0-9 (we will process spaces separately) into binary encoded Morse code.

my $morse_codes_file = "morse.dat";
open my $IN_MORSE, "<", $morse_codes_file or die "Cannot open $morse_codes_file $!";
my %bin_morse = 
    map { chomp; 
          my ($key, $val) = split ":", $_ ;
          my $bin_val = join '0', map $bin_chars{$_}, split //, $val;
          $key => $bin_val;
        } <$IN_MORSE>;

In the map code block, we're reading the morse.dat file line by line, split the lines into $key (the alphabet letter) and $val (the dot and dash Morse code), and then convert $val into $binval binary encoded Morse values by adding the "0" separator between the dots and dashes, and we store all this into a %bin_morse hash. After this, this hash contains an encoding table like this:

0 => 1110111011101110111
1 => 10111011101110111
2 => 101011101110111
3 => 1010101110111
4 => 10101010111
5 => 101010101
6 => 11101010101
7 => 1110111010101
8 => 111011101110101
9 => 11101110111011101
'A' => 10111
'B' => 111010101
'C' => 11101011101
'D' => 1110101
... Lines omitted for brevity ...
'X' => 11101010111
'Y' => 1110101110111
'Z' => 11101110101

So, we no longer care about dots and dashes, we now have a way to convert directly input letters into binary-encoded Morse values representing these letters.

Encoding and Decoding Binary Morse Code

We will use a to_morse subroutine to convert plain text into binary Morse, and a from_morse to do the conversion the other way around.

The to_morse subroutine will perform the following tasks: - Fold the input string to uppercase, since Morse is case-insensitive; - Remove from the input string any character not translatable into Morse (i.e. keep only alphanumerical characters plus spaces); - Split the input string into words (i.e. split on spaces); - Split each word into individual characters, translate each character into binary encoded Morse code, and reassemble the letters into words, with a '000' separator between the characters; - And finally join the words into a new string, with a '0000000' separator between the words.

The from_morse subroutine will need a lookup table to convert Morse encoded "letters" back into letters and digits of our alphabet. Given that the keys and values of the %bin_morse hash are unique, we can just create a new %rev_bin_morse hash by reverting %bin_morse. Then we need to: - Split the input string into words (splitting on '0000000'); - Split each word into "letters" (splitting on '000'), use the reverse look-up table to find the letter in our alphabet, and join the letters to reconstruct the original words; - And finally join the words into a string, with a space between the words.

The input string can be passed as a parameter to the string or, failing that, will use a default string (whose only merit is that, while being relatively short, it contains all letters of the alphabet, so that it constitutes a relatively good test case).

Although this is not strictly necessary, we also use a format80c subroutine which returns the encoded string with new lines inserted so that it can be nicely printed over 80 columns (but that subroutine is only for enabling pretty printing, it doesn't change the value of its argument).

Since we're using both to_morse and from_morse subroutines, we can test a round trip: first encode the input string, and then decode the result, and check visually that the final result corresponds to the input (subject to non-alphabetical characters which may have been removed and to upper-case folding).

This is the full Perl 5 script:

use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

my %bin_chars = ( '.' => 1, '_' => 111);
my $morse_codes_file = "morse.dat";
open my $IN_MORSE, "<", $morse_codes_file or die "Cannot open $morse_codes_file $!";
my %bin_morse = 
    map { chomp; 
          my ($key, $val) = split ":", $_ ;
          my $bin_val = join '0', map $bin_chars{$_}, split //, $val;
          $key => $bin_val;
        } <$IN_MORSE>;

my $input = shift // "The quick brown fox jumps over the lazy dog";

sub to_morse {
    my $input = uc shift;      # Morse doesn't have cases
    $input =~ s/[^A-Z0-9 ]//g; # remove non Morse characters
    my @morse_words;
    for my $word (split / /, $input) {
        push @morse_words, join '000', map { $bin_morse{$_} } split //, $word;
    }
    return join '0000000', @morse_words;
} 
sub from_morse {
    my $input = shift;
    my %rev_bin_morse = reverse %bin_morse;
    my @words;
    for my $word (split /0{7}/, $input) {
        push @words, join '', map $rev_bin_morse{$_}, split /000/, $word;
    }
    return join " ", @words;
}
sub format80c {
    shift =~ s/(.{80})/$1\n/gr;
}

say "Input string: $input";    
my $encoded = to_morse $input;
say "Binary encoded Morse string:\n", format80c $encoded;
say "Decoded string: ", from_morse $encoded;

When given no argument, the script uses the default string and displays the following:

Input string: The quick brown fox jumps over the lazy dog
Binary encoded Morse string:
11100010101010001000000011101110101110001010111000101000111010111010001110101110
00000011101010100010111010001110111011100010111011100011101000000010101110100011
10111011100011101010111000000010111011101110001010111000111011100010111011101000
10101000000011101110111000101010111000100010111010000000111000101010100010000000
10111010100010111000111011101010001110101110111000000011101010001110111011100011
1011101
Decoded string: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG

Extensions to the Morse Code

The table we've shown at the beginning is the universally acknowledged international Morse code. A number of extensions covering punctuations symbols or even accented letters have been suggested and are used in various parts of the world. The most common punctuation symbols are widely accepted. We will add six such symbols to our morse.dat file:

.:._._._
,:__..__
?:..__..
':.____.
!:_._.__
;:_._._.

Of course, if we add those six punctuation symbols to the conversion table, we also don't want to remove them from the input string, so that the code statement to remove non Morse letters will be changed to this:

$input =~ s/[^A-Z0-9 .,?'!;]//g; # remove non Morse characters

With these two changes, and with a string passed as an argument to the script, we get the following output:

$ perl morse.pl "Ask not what your country can do for you. Ask what you can do for your country."
Input string: Ask not what your country can do for you. Ask what you can do for your country.
Binary encoded Morse string:
10111000101010001110101110000000111010001110111011100011100000001011101110001010
10100010111000111000000011101011101110001110111011100010101110001011101000000011
10101110100011101110111000101011100011101000111000101110100011101011101110000000
11101011101000101110001110100000001110101000111011101110000000101011101000111011
10111000101110100000001110101110111000111011101110001010111000101110101110101110
00000010111000101010001110101110000000101110111000101010100010111000111000000011
10101110111000111011101110001010111000000011101011101000101110001110100000001110
10100011101110111000000010101110100011101110111000101110100000001110101110111000
11101110111000101011100010111010000000111010111010001110111011100010101110001110
10001110001011101000111010111011100010111010111010111
Decoded string: ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU. ASK WHAT YOU CAN DO FOR YOUR COUNTRY.

As you can see, the dot is now properly encoded and decoded.

Binary Encoded Morse Code in Raku (formerly known as Perl 6)

Our Raku implementation will follow the same logic as the Perl 5 version. The only changes reflect syntactic differences between the two languages. Therefore, I won't repeat the explanations supplied above, please refer to them if you need. We thus go straight to the Raku implementation:

use v6;

sub MAIN ($input = "The quick brown fox jumps over the lazy dog") {
    my %bin_chars = '.' => 1, '_' => 111;
    my %*bin-morse = "morse.dat".IO.lines.map({
        my ($key, $val) = split ":", $_ ;
        my $bin_val = $val.comb.map({%bin_chars{$_}}).join('0');
        $key => $bin_val;
    });
    say "Input string: $input";    
    my $encoded = to_morse $input;
    say "Binary encoded Morse string:";
    my $encoded-copy = $encoded;
    .Str.say for $encoded-copy ~~ s:g/(. ** 1..80)/$0\n/;
    say "Decoded string: ", from_morse $encoded;
}
sub to_morse ($input is copy) {
    $input ~~ s:i:g/<-[A..Z0..9\s.,?'!;]>//; # remove non Morse characters
    my @morse_words;
    for $input.uc.split(/\s/) -> $word {
        push @morse_words, join '000', map { %*bin-morse{$_} }, $word.comb;
    }
    return join '0000000', @morse_words;
} 
sub from_morse ($input) {
    my %rev_bin-morse = reverse %*bin-morse.kv;
    my @words;
    for split /0 ** 7/, $input -> $word {
         push @words, join '', map {%rev_bin-morse{$_}}, split /000/, $word;
    }
    return join " ", @words;
}

When given no argument, the script uses the default string and displays the following:

$ perl6  morse.p6
Input string: The quick brown fox jumps over the lazy dog
Binary encoded Morse string:
11100010101010001000000011101110101110001010111000101000111010111010001110101110
00000011101010100010111010001110111011100010111011100011101000000010101110100011
10111011100011101010111000000010111011101110001010111000111011100010111011101000
10101000000011101110111000101010111000100010111010000000111000101010100010000000
10111010100010111000111011101010001110101110111000000011101010001110111011100011
1011101
Decoded string: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG

And given a string passed as an argument to the script, we get the same output as in Perl 5:

$ ./perl6  morse.p6  'Ask not what your country can do for you. Ask what you can do for your country.'
Input string: Ask not what your country can do for you. Ask what you can do for your country.
Binary encoded Morse string:
10111000101010001110101110000000111010001110111011100011100000001011101110001010
10100010111000111000000011101011101110001110111011100010101110001011101000000011
(... Lines omitted for brevity ...)
11101110111000101011100010111010000000111010111010001110111011100010101110001110
10001110001011101000111010111011100010111010111010111
Decoded string: ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU. ASK WHAT YOU CAN DO FOR YOUR COUNTRY.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, December, 1. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.