Perl Weekly Challenge # 16: Bitcoin Addresses

In this post dated July 8, I provided solutions to the first challenge of Week 16 of the Perl Weekly Challenge organized by Mohammad S. Anwar. At the time, I admitted that I was having trouble understanding the requirement for the second challenge. After quite a bit of googling, I can finally provide some answers to the second challenge of Week 16.

I should acknowledge that I spent far more time on this than I would wish to admit. Far more time than I ever did on any of the previous challenges.

Validation of Bitcoin Addresses

Write a script to validate a given bitcoin address. Most Bitcoin addresses are 34 characters. They consist of random digits and uppercase and lowercase letters, with the exception that the uppercase letter “O”, uppercase letter “I”, lowercase letter “l”, and the number “0” are never used to prevent visual ambiguity. A bitcoin address encodes 25 bytes. The last four bytes are a checksum check. They are the first four bytes of a double SHA-256 digest of the previous 21 bytes. For more information, please refer [wiki]'https://en.bitcoin.it/wiki/Address) page. Here are some valid bitcoin addresses:

1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2
3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy

I did not know anything about Bitcoins (and still don't), so I just tried to apply the rules described in the challenge. But it seemed that they were not complete, or maybe I missed something important. Anyway, I wasn't able at the time to even understand the requirement.

I tried to investigate the matter further.

Some parts of the code examples given below are partly copied or derived from other code samples found on the Internet. I readily admit that I did not design them entirely from scratch.

Base-58 encoding

The first thing to understand is that these bitcoin addresses are encoded in base-58. You need to decode them before you can split the bytes between the value itself and the checksum and apply a double SHA256 digest.

The characters used in this encoding are (in this specific order) :

123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

i.e. alphanumerical characters except "O", "I", "l" and "0" (see the Wikipedia page on Base 58.

In Base 10 (using the 10 digits between 0 and 9), the first thing we need to do is to understand that a number such as 42 means:

4 * (10 ** 1) + 2 * (10 ** 0)

Similarly, in base 58, the "1BvB" string that starts our first valid address uses the following translation table:

1   =>  0
B   =>  10
v   =>  53
B   =>  10

Transforming such a base-58 string into a number goes like this:

0 * (58 ** 3) + 10 * (58 ** 2) + 53 * (58 ** 1) + 10 * (58 ** 0)

Note: the parentheses are not necessary (in Perl), but I suppose they make the thing clearer.

So, the first thing we need to do is to transform the input base-58 address into a binary object (an unsigned integer).

This will be a fairly large number (typically 58 or 59 digits). This is a number too large to be managed correctly in core Perl 5. So we will start with Perl 6 (which can handle arbitrary large integers), and see later how we can adapt the script in P5.

Caveat: I'm really new to bitcoins. The examples I provide below seem to work correctly, but, please, don't use the code examples below for any actual validation of bitcoin addresses. Use established libraries instead. There are probably edge cases where my code doesn't work properly. Also, a proper validation package should check the length of the input string, but I did not implement length validation because the information I found about that was somewhat contradictory.

Bitcoin Address Validation in Perl 6

This is my attempt at validating a bitcoin address in Perl 6:

use v6;
use Digest::SHA;
use Test;
plan 4;

my @base58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz".comb;
    # https://en.wikipedia.org/wiki/Base58

my %base58 = map { @base58[$_] => $_}, 0 .. 57;
sub b58toraw (Str $address) {
    my UInt $num = 0;
    my $i = 0;
    for $address.comb.reverse -> $letter {
        $num += %base58{$letter} * 58**$i++;
    }
    my @bytes = $num.base(16).fmt("%050s").comb(2) ;
    my $buff = Buf.new(map {.fmt("%02s").parse-base(16)}, @bytes);
    return $buff;
}

sub check($address) {
    return False if $address ~~ /<[O0lI]>/;
    my $raw = b58toraw $address;
    my $four-last-bytes = $raw.subbuf(21, 4);
    my $double-digest = sha256( sha256 $raw.subbuf(0, 21));
    return  True if $double-digest.subbuf(0, 4) eq $four-last-bytes;
    False; 
}
ok  check("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"), "Correct Address 1";
nok check("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVw2"), "Incorrect Address 1";
ok  check("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), "Correct Address 2";
nok check("3k98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), "Incorrect Address 2";

which duly prints:

$ perl6 bitcoin.p6
1..4
ok 1 - Correct Address 1
ok 2 - Incorrect Address 1
ok 3 - Correct Address 2
ok 4 - Incorrect Address 2

In the event that you're interested, the value of the large $num integer for The first correct address is:

2936256236368367529205845895214681571805604556586060445291

and it is composed of the following bytes:

[00 77 BF F2 0C 60 E5 22 DF AA 33 50 C3 9B 03 0A 5D 00 4E 83 9A F4 15 76 6B]

Bitcoin Address Validation in Perl 5

In the course of my tests, I ended up with at least four versions of the subroutine to decode the base-58 input string.

One of them, which is loosely inspired from a CPAN module but does not work, used the Math::BigInt module to do the initial computing:

# CAUTION: does not work properly
sub decode_base58 {
    my $in = shift;
    my $decoded = Math::BigInt->new(0);
    my $multi   = Math::BigInt->new(1);
    my $base    = @base58;

    while (length $in > 0) {
        my $digit = chop $in;
        $decoded->badd($multi->copy->bmul($base58{$digit}));
        $multi->bmul($base);
    }
    return $decoded->to_base(256);
}

The code above simply does not compile because, for some reason, the version of Math::BigInt on my system does not appear to know the to_base (or, for that matter, the to_bytes) method:

"to_base" is not exported by the Math::BigInt module

I have no idea why these methods are unknown on my systems (I tried on both Linux and Windows), but I did not investigate much further and decided to write manually my own version of the calculation. To do this, we need to use an array for storing the decoded bytes. Whenever an item in the array becomes larger than a byte (more than 255), we normalize it to a byte (with the modulo operator) and carry over the excess to the next item.

#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw/sha256/;
use Test::More tests => 4;

my @base58 = split //, 
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    # https://en.wikipedia.org/wiki/Base58
my %base58 = map { $base58[$_] , $_} 0 .. 57;

sub base58_to_bytes {
    use integer;
    my $input = shift;
    my @out;
    for my $letter ( split //, $input ) {
        $_ *= 58 for @out; # new letter, multiply previous values by the base
        $out[0] += $base58{$letter};
        for my $index ( 0 .. $#out ) {
            my $val = $out[$index];
            if ($val > 255) {
                $out[$index] = $val % 256;  # normalize current byte 
                $out[$index + 1] += $val / 256; # carry over to next
            }
        }
    }
    $out[$_] //= 0 for 0 .. 24;  # padding empty slots
    return reverse @out;
}
sub check_address {
    my $address = shift;
    die "Forbidden character" if $address =~ /[O0lI]/;
    my @byte = base58_to_bytes $address;
    return 0 unless 
        (pack 'C*', @byte[21..24]) eq 
        substr sha256(sha256 pack 'C*', @byte[0..20]), 0, 4;
    return 1;
}

is check_address("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"), 1, "Correct Address 1";
is check_address("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVw2"), 0, "Incorrect Address 1";
is check_address("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), 1, "Correct Address 2";
is check_address("3J99t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), 0, "Incorrect Address 2";

The tests pass correctly:

$ perl  bitcoin2.pl
1..4
ok 1 - Correct Address 1
ok 2 - Incorrect Address 1
ok 3 - Correct Address 2
ok 4 - Incorrect Address 2

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, July 21. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.