## Vigenère vs Vigenère

The second task of the 15th Perl Weekly Challenge was to implement
an encoder and a decoder for the Vigenère Cipher. But that’s a
little more complicated than it seems, because the cipher that's named
after Blaise de Vigenère wasn’t actually invented by him, and the cipher
that Vigenère actually invented isn’t named after him.

So should we implement the Vigenère Cipher...or Vigenère’s cipher?
Why not both!

The Vigenère Cipher was devised by Giovan Battista Bellaso in 1553,
and then misattributed to Vigenère about three hundred years later.
It uses a tabula rēcta to translate from a message text to a ciphertext,
and back again.

Given an user-provided key (e.g. `"BELLASO"`), we encipher a message
(e.g. `"VIGENEREDIDNOTINVENTTHIS"`) by matching up respective letters of the
key and message, then using them as two indices to look up the corresponding
cipher character in the appropriate column and row of the tabula rēcta.
And if the key is shorter than the message, we just recycle the key
as many times as necessary.

For example:

` Key: B E L L A S O B E L L A S O B E L L A S O B E L L A...`
` Text: V I G E N E R E D I D N O T I N V E N T T H I S`
` Table: ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓`
` Cipher: W M R P N W F F H T O N G H J R G P N L H I M D`

In other words, each column of the tabula rēcta is a separate Caesar Cipher
(or a ROT-N transcription) with each successive letter of the key selecting
which substitution to perform on the corresponding letter in the message.

This approach was widely considered “indéchiffrable” back in the 1500s,
though Charles Babbage broke an example of it within two weeks of its
“rediscovery” in 1854, and Friedrich Kasiski published a reliable general
attack for it less than a decade later. Nevertheless, the Vigenère
Cipher
provides reasonable security, provided the chosen key is
long enough and unusual enough to prevent simple dictionary attacks.
Ironically, as we’ll see later, Vigenère’s actual cipher elegantly
solves both these problems with the Vigenère Cipher.

In order to implement the Vigenère Cipher, we first need to build a
tabula rēcta. In Perl 6, that’s just:

` constant @alphabet = ['A'...'Z'];`

` constant `@𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎
` = @alphabet, *.rotate ... *.head eq @alphabet.tail;`

The table is just a sequence of arrays, the first of which is the original
alphabet, with the rest being constructed by taking the previous array
and cyclically rotating it by one character (`*.rotate`). So `['A'...'Z']`
is followed by `['B'...'Z', 'A']`, which is followed in turn by
`['C'...'Z', 'A'...'B']`. Rinse and repeat, until we eventually
generate a rotation in which the first letter (`*.head`) is the last letter
of the original alphabet (`@alphabet.tail`). In other words,
we generate every rotation until we reach `['Z', 'A'...'Y']`.

Note that, we can change the alphabet we’re using simply by replacing the
contents of `@alphabet`. For example, if we had wanted to be able to encode a
message like `"Vigenère did NOT invent this!"` with a key such as
`"I❤️Bellaso"`, then we’d reconfigure our alphabet like so:

` # Valid message characters are all printable Latin-1 + lurve...`
` constant @alphabet = [' '...'ÿ', '❤️'];`

Perl 6 is a modern language with built-in support for the full Unicode
character set in both data and source code, so it’s perfectly okay to use
a string with emoji, like `"I❤️Bellaso"`, or to declare
a variable like @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎, whose name contains those weird
Unicode mathematical italic characters and ēvēn ā cōmbīnīng mārk.

Using italicized Unicode variable names might seem a crazy thing to do
but it also means that we're no longer constrained by Anglophone Linguistic
Imperialism. Instead of being forced to name a critical variable \$cipher-text,
we could name it naturally, in the language we typically speak:
\$texte-chiffré or \$النص-المشفر or \$加密文字
or \$Κρυπτογραφημένο-κείμενο or even \$ĉifrita-teksto.
Perl 6 aims to empower 100% of the world’s programmers.

We could use our @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎 array directly to look up the cipher character
for a given text/key character pair, but it’s a little ungainly to do that.
We’d need to convert each character back to its integer ASCII code
(via the built-in `.ord` method) and then normalize it down into the
0..25 range of table indices, like so:

` my `\$key-index` = `\$key-char`.ord - 'A'.ord;`
` my `\$text-index` = `\$text-char`.ord - 'A'.ord;`

` my `\$cipher-char` = `@𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎`[`\$key-index`][`\$text-index`];`

And that code would become significantly more complicated if we later
switched to a non-continuous alphabet like the one including `'❤️'`.

It would be much simpler if the 2D look-up table was a 2D dictionary
(in Perl 6 jargon: a hash), and could be directly indexed by the actual
characters from our alphabet, instead of by integers starting at zero.

And that’s easy to arrange:

` constant %encoder`
` = `@𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎`.map:`
` { `@^cipher`.head => hash @alphabet Z=> `@^cipher` }`

Here, for each cipher row (@^cipher) of the tabula rēcta, we build a hash entry
whose key is the first letter of the rotated alphabet (@^cipher`.head`),
and whose value is a nested hash that maps each letter of the alphabet to the
corresponding letter in the cipher row (`@alphabet Z=> `@^cipher).

The `Z=>` operator zips together corresponding elements of its two list
arguments, passing each two to the `=>` pair constructor, so the `map`
operation produces hash entries like:

` # Key Text->Cipher, etc.`

` 'M' => %('A'=>'M', 'B'=>'N', 'C'=>'O' ... 'Y'=>'K', 'Z'=>'L')`

So now when we want to look up the appropriate cipher character,
the top-level hash index selects the key column, and the second-level
hash index selects the plaintext character’s row. Like so:

` my `\$ciphertext-char` = %encoder{`\$key-char`}{`\$plaintext-char`};`

Better still, it’s just as easy to use this hash-based approach to
build a 2D decoding dictionary, simply by reversing the
nested mapping from alphabet characters to cipher characters:

` constant %decoder`
` = `@𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎`.map: `
` { `@^cipher`.head => hash `@^cipher` Z=> @alphabet`` }`

After which we can decode an enciphered character like so:

` my `\$plaintext-char` = %encoder{`\$key-char`}{`\$ciphertext-char`};`

Once we have those two 2D dictionaries, the Vigenère Cipher is easy
to implement:

` sub `Vigenère` (Str :\$text!, Str :\$key!, Bool :\$decode --> Str) {`

` # Extract lists of table indices from the two strings...`
` my @textchars = \$text.comb;`
` my @keychars = |\$key.comb xx ∞;`

` # Chose our trancoder...`
` my %recoder = \$decode ?? %decoder !! %encoder;`

` # Pair up indices, transcode, and concatenate the results...`
` (@keychars Z=> @textchars)`
` .map({ %recoder{.key}{.value} `// 0.chr` })`
` .join;`
` }`

The subroutine takes two named string arguments: a text (either
plaintext or ciphered) and the encryption/decryption key. The
exclamation mark after the two parameter names tells the Perl 6 compiler
that those two arguments are required. There is also an optional named
boolean argument (with no trailing exclamation mark) that indicates
whether we want to decode. The long arrow at the end of the signature
(`--> Str`) specifies that the subroutine returns a string.

Within the subroutine, we first split both the text string and the key into
lists of individual characters (`\$text.comb` and `\$key.comb`). We also
need to repeat the key-character sequence so that the key is at least as
long as the text string. We could work out precisely how many repetitions
were needed, but it’s much easier just to make the key repeat infinitely
often (`|\$key.comb xx ∞`). The vertical bar at the beginning
is needed to flatten the list of key characters. Otherwise, instead of
a list of repeated key-characters, we’d end up with a list of repeated
lists-of-key-characters.

We then select which of the two coding dictionaries we need,
depending on whether or not we’re decoding. In Perl 6, the ternary
selection operator is `test ?? `true-value` !! `false-value,
instead of `test ? `true-value` : `false-value.

Once we have everything set up, the actual encoding or decoding
is trivial:

` (@keychars Z=> @textchars)`
` .map({ %recoder{.key}{.value} `// 0.chr` })`
` .join;`

We first match up each character from the text with the next
successive character from the repeated key, once again using
the zip-into-a-list-of-pairs operator (`Z=>`) between
the two lists: `(@keychars Z=> @textchars)`

We then map that list of pairs to a list of encrypted/decrypted
characters, by looking up each key character (`.key`) and each
corresponding text character (`.value`) in the lookup table. If there
is no available conversion for the combination of characters, we just
use a null character (`0.chr`) as a placeholder. This ensures that any
characters outside the specified alphabet are cleanly mapped out
of the encrypted string, without compromising its future decryption.

Once all the characters have been converted, the list is simply
concatenated (`.join`) to produce the final string.

And that’s the Vigenère Cipher.

So what about Vigenère’s cipher?

The encryption technique discussed above has some serious weaknesses
as a cipher. In 1863, Friedrich Kasiski observed that repeating the
key string in order to generate a long enough key means that a single
encrypted message may sometimes reuse the same columns of the
tabula rēcta to encode the same sequence of plaintext
characters reappearing later in the original message.

So, if we intercept an encrypted string like:

` LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV`

...then it’s relatively easy to spot the two places where repeated
words in the plaintext message were apparently encrypted using
two separate copies of the same key characters:

` LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV`

` LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV`

The first apparent word repetition happens 18 characters apart, so if
the two words were encoded using the same key characters, the key must
repeat after 18 characters. Which means that the key length must either
be 18 or some integral factor of 18 (i.e. 9, 6, 3, 2, or 1). The second
repetition occurs 24 characters apart, so the key length must
simultaneously be some factor of 24 (i.e. 12, 8, 6, 4, 3, 2, or 1).
Since it’s the same key in both cases, the key length must therefore
be a common factor of both 18 and 24: either 6, 3, 2, or 1.

Given those possibilities, the key is most likely 6 characters long,
since the other lengths are too short, psychologically speaking.
In which case, we can probably decode the message simply by trying
a dictionary attack using all 17706 English six-letter words.

To do that we first need to know what the most likely key lengths are:

` sub large-factors (\$n) { set (4..\$n).grep(\$n %% *) }`

` my \$gap`
` = any keys [∩] map {large-factors(.<gap>.chars)},`
` \$encoded ~~ m:ex/\$<gap>=[ \$<seq>=[. ** 3..*] .+] \$<seq>/;`

Here we exhaustively match (`m:ex`) the encoded text against a regex that
locates any sequence of three or more consecutive characters
(`\$<seq>=[. ** 3..* ]`), then a gap (`.+`), then the previous character
sequence again (`\$<seq>`). We then determine the length of each gap
(`.<gap>.chars`), build a set the significant factors of each length
(`map {large-factors(...)}`), and take the intersection (`[∩]`) of those
sets to find the common factors. Finally, we convert the resulting set
of common factors to a single value, using an `any` junction.
Subsequently we'll be able to match word lengths against that junction
to select potential key words of the appropriate length(s).

Once we know how long the decryption key is, we can load up an
appropriate dictionary and filter it for suitable candidate keys:

` my `@candidate-keys
` = '/usr/share/dict/words'.IO.lines.grep(*.chars==\$gap)».uc;`

` my `@common-words
` = '/usr/share/dict/common'.IO.lines.grep(*.chars > 2)».uc;`

We keep only those words of the appropriate length(s): `.grep(*.chars==\$gap)`
We also load a second, smaller lexicon of common words, which we’ll use
to detect plausible decryptions.

Then we decode the ciphertext with every possible key of suitable length,
and print out any decryptions that seem to include a plausible number
of English words:

` # Request text to be deciphered...`
` my \$encoded = prompt "Ciphertext: ";`

` # Try each potential key...`
` for `@candidate-keys` -> \$key {`

` # Decrypt with that key...`
` my \$candidate = `Vigenère`( :text(\$encoded), :\$key, :decode);`

` # Count any real words found in the decryption...`
` my \$found = 0;`
` for `@common-words` -> \$word {`

` # Report any plausible decryptions found...`
` if \$candidate.contains(\$word) && ++\$found > 3 {`
` say "\$key --> \$candidate";`
` last;`
` }`
` }`
` }`

In just under 20 seconds that produces:

` Ciphertext: LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV`

` HYDROA --> ENDVYZPTDRUVTVGEHBDTSVYZPTDRULEGALUVTVGEH`
` HYDROA --> ENDVYZPTDRUVTVGEHBDTSVYZPTDRULEGALUVTVGEH`
` HOGGIE --> EXAGEVPDACARTFDPNXDDPGEVPDACAHEQXWARTFDPN`
` SECRET --> THEVIGENERECIPHERISNTVIGENERESTABLECIPHER`
` STRICH --> TSPEKSEYPAGOIASNTUSYEEKSEYPAGETLMUGOIASNT`
` WAGGIE --> PLAGEVARACARETDPNXORPGEVARACAHPEXWARETDPN`
` WEDGIE --> PHDGEVANDCAREPGPNXONSGEVANDCAHPAAWAREPGPN`

The fundamental problem here (as with so many cipher techniques)
is that the key was simply too short. We could insist that the user
provide a key that’s at least as long as the actual message:

` sub `Vigenère` (`
` Str :\$text!,`
` Str :\$key! where { \$key.chars >= \$text.chars },`
` Bool :\$decode`
` --> Str)`
` {...}`

However, in 1586 Blaise de Vigenère found a much easier solution:
take whatever key they offer and, instead than extending it with
extra copies of itself, extend the key with the “random” characters
of the message.

That is, instead of:

` Key: SECRETSECRETSECRETSECRETSECRETSECRETSECRET...`
` Text: THEVIGENERECIPHERISNTVIGENERESTABLECIPHER`
` Table: ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓`
` Cipher: LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV`

...we use:

` Key: SECRETTHEVIGENERECIPHERISNTVIGENERESTABLECIPHER`
` Text: THEVIGENERECIPHERISNTVIGENERESTABLECIPHER`
` Table: ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓`
` Cipher: LLGMMZXUIMMIMCLVVKACAZZOWAXMMYXNFCIUBPIPV`

This produces no significant repetitions in the ciphertext because
the repeated words in the plaintext are no longer being enciphered
with repetitions of a single short key. This approach is known as
an autokey cipher, because the full key is generated for us
automatically from the message itself.

Suppose we wanted to extend our previous Vigenère subroutine to offer
autokey encryption as well. We might just add another parameter to
the signature and then select the appropriate behaviour depending on
which arguments are actually provided:

` sub `Vigenère` (`
` Str :\$text!,`
` Str :\$key,`
` Str :\$autokey,`
` Bool :\$decode`
` --> Str`
` ) {`
` if \$autokey {...}`
` elsif \$key {...}`
` else { fail 'Must specify :key or :autokey' }`
` }`

But Perl 6 supports multiple dispatch, so there’s a much cleaner way
to add another mutually exclusive behaviour to the Vigenère subroutine.
We simple change the existing code from a `sub` to a `multi`:

` multi `Vigenère` (Str :\$text!, Str :\$key!, Bool :\$decode --> Str)`
` {`
` # Body exactly as before`
` }`

Then we add a second multiply dispatched variant, with a
different set of parameters and a different body:

` multi `Vigenère` (Str :\$text!, Str :\$autokey! --> Str) {`
` `Vigenère`( :\$text, :key(\$autokey ~ \$text) );`
` }`

This version requires an `:autokey` parameter, instead of a `:key` parameter.
It then immediately redispatches to the original (Bellaso) variant, passing the text
straight through unchanged (`:\$text`), but constructing a new (non-auto) key
by concatenating the supplied autokey with the text: `:key(\$autokey ~ \$text)`

The interpreter always knows which variant of Vigenère to call at any point,
because it examines the actual arguments passed into each call, and eliminates
from consideration any variant whose parameter list would not accept those
specific arguments. Which means, if we pass a `:key` argument, the interpreter
will always call the original (Bellaso) variant, whereas if we pass an `:autokey`
argument the interpreter will always call our new (Vigenère) variant.

Note that this new autokey variant of the Vigenère subroutine doesn’t allow
a `:decode` argument. That’s because, whereas encoding a text using
an autokey is just a minor variation on the Bellaso’s algorithm,
decoding it again is a little more complicated.

To decode text that has been autokey encrypted we obviously need
the original encryption key. But in an autokey system, most of that key
is provided by the original unencrypted text. So order to get back the
unencrypted text we must provide the decoder with...the unencrypted text.
Hmmmmm.

Except, of course, that the initial few characters of the autokey aren’t
part of the original plaintext; they’re just the characters of the
regular key (e.g. `"BELLASO"`, `"SECRET"`, etc.) that was specified.

Given this partial key, we can at least decipher that many initial
characters of the encrypted text. That gives us the first few characters
of the original message, so we can immediately use those characters
as the next part of the key, enabling us to decrypt a little more of the
encrypted text, thereby giving us a few more message characters,
which gives us some additional key characters, with which we can
extract several more plaintext characters, which...et cetera, et cetera,
until the entire text is decoded.

In Perl 6, that looks like this:

` multi `Vigenère` (Str :\$text!, Str :\$autokey!, :\$decode! --> Str)`
` {`
` # Break out text that was enciphered by the original key...`
` my (\$cipherkey, \$ciphertext)`
` = \$text.split: /<?at: \$autokey.chars>/;`

` # Decipher that key-enciphered initial text...`
` my @autokey`
` = `Vigenère`( :text(\$cipherkey), :key(\$autokey), :decode )`
` .comb;`

` # Progressively decipher the rest...`
` [~] flat gather for \$ciphertext.comb {`
` FIRST take @autokey;`
` @autokey.push:`
` take %decoder{@autokey.shift}{\$^cipherchar} `// 0.chr;
` }`
` }`

This third variant of the Vigenère multisub takes three required arguments:
the text, the original autokey, and the `:decode` flag. So this variant
will only be called when decoding an autokey encryption.

We first need to break the text into two strings: the initial N characters
that were encrypted using the original key (`\$cipherkey`), and the
remainder of the text which was encrypted using the unencrypted
message itself (`\$ciphertext`). To do that, we call the text string’s
`.split` method, passing it a regex. The text will be split wherever
the regex matches, so we use an `<?at: position>` assertion
within the regex, to force it to match only at the length of the original key:
`/<?at: \$autokey.chars>/`

Once we have the first N characters in `\$cipherkey`, we can simply
decrypt them with the original key (in `\$autokey`) by calling the original
(Bellaso) decoding variant, which we invoke by passing the original key
as a `:key` argument, instead of as `:autokey`. Like so:

` `Vigenère`( :text(\$cipherkey), :key(\$autokey), :decode )`

We then use the `.comb` method to convert that decrypted plaintext
into a list of characters, which we store in `@autokey`.

Now that we have decrypted those first N characters of plaintext,
we can start iterating through the list of characters from the remaining
encrypted text (`for \$ciphertext.comb {...}`), using the next
known autokey character (`@autokey.shift`) to decrypt the next
message character (`%decoder{@autokey.shift}{\$^cipherchar}`).
That newly decoded character is then saved for use as a subsequent
autokey character (`@autokey.push:`).

As we’re decrypting, we also need to capture every decoded character,
so we can eventually concatenate and return them as the decrypted text.
The easiest way to do that is with a `gather` block, which automagically
remembers anything we `take` while inside it.

In this case, we first have to gather all of the initial characters that were
decrypted with the original key and then stored in `@autokey`. So we tell
the `for` loop to `take` them, but only during its very first iteration:

` FIRST take @autokey;`

Then every time we decrypt another character within the loop, we have to
`take` it too, just before we push it onto the `@autokey` array:

` @autokey.push:`
` take %decoder{@autokey.shift}{\$^cipherchar} `// 0.chr;

Once the decrypting `for` loop terminates, the `gather` will return a list
of everything we have passed to `take`. We concatenate
those characters (`[~] flat`) into a single string...and we’re done.

So now, when we want the Vigenère Cipher, we write:

` \$ciphertext`
` = `Vigenère`( :text(\$plaintext), :key(`\$secret`) );`

` # and later...`

` \$plaintext`
` = `Vigenère`( :text(\$ciphertext), :key(`\$secret`), :decode );`

And when we want Vigenère’s actual cipher, we write:

` \$ciphertext`
` = `Vigenère`( :text(\$plaintext), :autokey(`\$secret`) );`

` # and later...`

` \$plaintext`
` = `Vigenère`( :text(\$ciphertext), :autokey(`\$secret`), :decode );`

If you accept modern theories of linguistic relativity, then the language
in which we think can shape the way in which we think. And perhaps
even limit the ways in which we can think.

Programming languages are the tools with which programmers think
about solutions. So your choice of programming language may shape
how you think of solutions, and may even constrain how it’s possible
for you think about them.

When you’re thinking about solutions in a world as linguistically
ambiguous and as computationally complex as Vigenère’s (or ours!),
you need a language that’s sufficiently linguistically sophisticated, and
computationally expressive, to enable you think clearly and effectively.

And that’s why I program in Perl 6.

Damian

## 1 Comment

Thank you very much, Damian, for sharing your views/solutions.