A little thing to love about Perl 6 (and COBOL)
By now you've heard the announcement that the Perl 6 team is cautiously hopeful that Perl 6.0.0 will be released this year. There are three things they need to finish:
- The Great List Refactor (which should improve performance)
- Native Shaped Arrays (tell Perl 6 that you only have 10 elements and you'll get an exception if you add more)
- Normal Form Grapheme, or NFG (which solves some issues with combining characters in Unicode, such as the little-known "tapeworm operator")
With those, Perl 6 will truly be born. Of course, it will need better documentation, tooling support, better modules, and so on.
From what I can see, Perl 6 actually has a chance to take off, and there's one delightful little feature I want to talk about.
But long before I get to Perl 6, I want to explain why I like COBOL. Er, no, not really, but COBOL has something you've probably never heard of: packed decimals. This is a storage format for numbers that mainframes often handle natively. When mainframes first came to businesses, they were heavily used in accounting (well, they still are), and accounting numbers are base 10, not base 2 (computers work in base 2 internally, but we write and read numbers in base 10). We don't use packed decimals. We use floating point numbers, and that's a problem, but I'll get to that in a bit.
In COBOL, you might declare a signed number named TOTAL as having 4 digits before the decimal point and 2 digits after, like this:
01 TOTAL PIC S9(4)V9(2) COMP-3.
Skipping the details of what that means, the number is stored as a packed decimal. Each digit takes one nybble (4 bits), so two digits fit in one byte, with one extra nybble reserved for the sign; the six digits and the sign in the declaration above fit into four bytes. If you need to add a list of packed decimal numbers, the mainframe internally uses an "Add Packed" instruction. This is not floating point math, and it's base 10, not base 2. This means that in COBOL, if you compute .1 + .2 - .3, the answer you get back is zero.
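To make the storage format concrete, here is a minimal Python sketch of IBM-style packed decimal encoding. It assumes the common convention of one digit per nybble with the sign in the last nybble (0xC for positive, 0xD for negative). The helper names `pack_decimal` and `unpack_decimal` are hypothetical, and Python's `decimal` module stands in for base 10 arithmetic: it is decimal math, but it is not the mainframe's Add Packed instruction.

```python
from decimal import Decimal


def pack_decimal(value: Decimal, int_digits: int, frac_digits: int) -> bytes:
    """Hypothetical helper: encode value roughly like PIC S9(n)V9(m) COMP-3.

    One decimal digit per nybble, sign in the last nybble (0xC = +, 0xD = -).
    Extra fractional digits are simply truncated in this sketch.
    """
    scaled = int(value.scaleb(frac_digits))  # remove the implied decimal point
    width = int_digits + frac_digits
    digits = str(abs(scaled)).rjust(width, "0")
    if len(digits) > width:
        raise OverflowError(f"{value} does not fit in {width} digits")
    nybbles = [int(d) for d in digits] + [0xD if scaled < 0 else 0xC]
    if len(nybbles) % 2:
        nybbles.insert(0, 0)  # pad on the left so the nybbles fill whole bytes
    return bytes(hi << 4 | lo for hi, lo in zip(nybbles[::2], nybbles[1::2]))


def unpack_decimal(data: bytes, frac_digits: int) -> Decimal:
    """Hypothetical helper: read packed nybbles back as a base 10 Decimal."""
    nybbles = [n for byte in data for n in (byte >> 4, byte & 0xF)]
    *digits, sign = nybbles
    magnitude = Decimal(int("".join(map(str, digits)))).scaleb(-frac_digits)
    return -magnitude if sign == 0xD else magnitude


packed = pack_decimal(Decimal("12.34"), 4, 2)
print(packed.hex())                                   # 0001234c
print(unpack_decimal(packed, 2))                      # 12.34
print(Decimal(".1") + Decimal(".2") - Decimal(".3"))  # 0.0
```

Because every digit stays a base 10 digit, .1, .2 and .3 are stored exactly, and the subtraction really does come out to zero.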
What is that in Perl 5?
$ perl -E 'say .1 + .2 - .3'
5.55111512312578e-17
# 0.0000000000000000555111512312578
That's a very small number, but it's not zero. Maybe you think that's OK, but it's not. Zero is special and you can't screw it up. Here's "one" divided by "zero":
$ perl -E 'say 1/(.1 + .2 - .3)'
1.8014398509482e+16
# 18,014,398,509,482,000
# or roughly 18 quadrillion
18 quadrillion is an exceptional number, but it's not an exception. If you multiply the mass of the sun by this "zero", you have 110 trillion kilograms left over, or roughly the mass of Mount Everest.
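You can check these numbers outside Perl. This Python sketch assumes Python floats are IEEE 754 double precision numbers, as they are on mainstream platforms. The leftover "zero" turns out to be exactly 2**-54, so its reciprocal is exactly 2**54. The solar mass constant below (about 1.989e30 kg) is an approximate textbook value assumed for illustration.

```python
# Python floats are IEEE 754 doubles on mainstream platforms (assumed here).
leftover = 0.1 + 0.2 - 0.3

print(leftover)                # 5.551115123125783e-17
print(leftover == 2.0 ** -54)  # True: the error is exactly one power of two
print(1 / leftover)            # 1.8014398509481984e+16, which is exactly 2**54

SUN_MASS_KG = 1.989e30  # approximate value, assumed for illustration
print(f"{SUN_MASS_KG * leftover:.3g} kg")  # 1.1e+14 kg, about 110 trillion kg
```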
So why isn't .1 + .2 - .3 zero? Because floating point.
In floating point math, numbers are base 2, not base 10. The mantissa (the part after the binary point) is a series of ones and zeroes representing reciprocal powers of two: 1/2, 1/4, 1/8, and so on. The term "floating point" comes from the fact that the number is stored as a string of ones and zeroes plus an exponent saying where the binary point goes, so the point can "float".
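To see the "string of ones and zeroes plus a floating point" idea on real hardware, this Python sketch pulls apart an IEEE 754 double (the format Perl 5 typically uses for its floating point numbers) into its sign bit, exponent and 52 stored fraction bits. The helper name `decompose` is made up for this example, and it only handles normal numbers (not zero, subnormals, infinities or NaN).

```python
import struct


def decompose(x: float) -> tuple[int, int, str]:
    """Hypothetical helper: split a normal IEEE 754 double into its parts.

    Returns (sign bit, unbiased exponent, 52 fraction bits). The stored bits
    follow an implicit leading 1, so x == (-1)**sign * 1.fraction * 2**exponent.
    """
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # the same 64 bits as an integer
    sign = raw >> 63
    exponent = ((raw >> 52) & 0x7FF) - 1023  # 11-bit exponent stored with a bias of 1023
    fraction = format(raw & ((1 << 52) - 1), "052b")
    return sign, exponent, fraction


print(decompose(0.625))  # 0.625 is 1.01 (binary) * 2**-1: fraction bits "01" then zeroes
print(decompose(0.1))    # 0.1 is a repeating 1001 pattern, cut off and rounded at 52 bits
```

The exponent is what lets the same 52 bits describe both tiny and huge numbers; the fraction bits are where the reciprocal powers of two live.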
So the number 0.625 can be represented exactly in binary as 0.101, which is 1*1/2 + 0*1/4 + 1*1/8 (note that those reciprocal powers of two are multiplied by the corresponding binary digits).
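Python's `fractions.Fraction` can show the exact value a float really stores, which makes the difference between 0.625 and 0.1 easy to see. This is only a Python illustration of the same binary format, not anything Perl-specific.

```python
from fractions import Fraction

# Fraction(float) converts the binary value a float actually stores, exactly.
print(Fraction(0.625))                     # 5/8, i.e. 1/2 + 1/8: stored exactly
print(Fraction(0.1))                       # 3602879701896397/36028797018963968
print(Fraction(0.1) == Fraction(1, 10))    # False: close to 1/10, but not equal
print(Fraction(0.1).denominator == 2**55)  # True: the denominator is a power of two
```

Any value whose denominator is not a power of two, like 1/10, can only be approximated this way.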
Most numbers, however, can only be approximated with floating point numbers. For simplicity, let's pretend we only have 8 bits. The number .1 is stored as 00011001. Reading those bits as reciprocal powers of two gives 1/16 + 1/32 + 1/256, or 0.09765625. The numbers .2 and .3 don't fare any better:
.2:
Fractions: 1/8 + 1/16 + 1/128 + 1/256
Bits:      00110011
Result:    0.19921875
.3:
Fractions: 1/4 + 1/32 + 1/64
Bits:      01001100
Result:    0.296875
With 32 bits, .1 looks like this:
Fractions: 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + 1/65536 + 1/131072 + 1/1048576 + 1/2097152 + 1/16777216 + 1/33554432 + 1/268435456 + 1/536870912 + 1/4294967296
Bits:      00011001100110011001100110011001
Result:    0.0999999998603016
So it's close, but no cigar. (If you're curious, you can calculate those yourself using a small program I provide in chapter 3 of my Beginning Perl book.)
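For a quick check without the book, here is a small Python sketch in the same spirit (not the program from the book). It truncates a decimal fraction to a fixed number of binary digits, the way the examples above do, and reports the bits, the reciprocal powers of two and the resulting value. The function name `truncate_to_bits` is hypothetical, and real floating point hardware rounds rather than truncates and also stores an exponent, so this only models the simplified examples.

```python
from fractions import Fraction


def truncate_to_bits(value: str, bits: int) -> tuple[str, list[str], Fraction]:
    """Hypothetical helper: keep only the first `bits` binary digits after the point.

    Uses exact fractions, so the sketch itself adds no floating point error.
    Assumes 0 <= value < 1, like the examples in the text.
    """
    exact = Fraction(value)  # "0.1" -> exactly 1/10
    kept = exact.numerator * 2**bits // exact.denominator  # floor == truncation here
    bit_string = format(kept, f"0{bits}b")
    terms = [f"1/{2 ** (i + 1)}" for i, bit in enumerate(bit_string) if bit == "1"]
    return bit_string, terms, Fraction(kept, 2**bits)


for number in ("0.1", "0.2", "0.3"):
    bit_string, terms, result = truncate_to_bits(number, 8)
    print(number, bit_string, " + ".join(terms), float(result))

bit_string, _, result = truncate_to_bits("0.1", 32)
print(bit_string, float(result))  # the 32 bits of .1 and a value just below 0.1
```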
Floating point math is why many accounting systems internally store amounts as integers (for example, a count of cents) instead of floating point numbers.
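A minimal sketch of the integer approach, assuming amounts are kept as whole cents (the usual choice for a currency with two decimal places) and only turned into a decimal string for display. The `format_cents` name is hypothetical.

```python
def format_cents(cents: int) -> str:
    """Hypothetical helper: render an integer number of cents as dollars."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}{dollars}.{remainder:02d}"


# 10 cents + 20 cents - 30 cents, all in exact integer arithmetic.
balance = 10 + 20 - 30
print(balance == 0)           # True
print(format_cents(balance))  # 0.00
print(format_cents(-1234))    # -12.34
```

The trade-off is that every amount must be converted at the edges (input and display), and anything finer than a cent, such as interest rates, needs its own scaling rule.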
Since it's so easy to get floating point math wrong, and developers constantly struggle with it, one of COBOL's advantages is its use of packed decimals, which represent numbers internally in base 10: running billions of dollars' worth of calculations gives you the correct answer.
Note that these trivial math errors are properties of floating point numbers, not Perl:
$ ruby -e 'puts 0.1 + 0.2 - 0.3'
5.551115123125783e-17
$ python3 -c 'print(.1 + .2 - .3)'
5.551115123125783e-17
$ echo "puts [expr .1+.2-.3]"|tclsh
5.551115123125783e-17
Note that the size of these errors depends on the precision of the floating point format in use, so if you get your code running on one machine, someone on a machine with a different floating point precision might very well get different output if you're not careful!
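To illustrate that the result depends on the floating point format rather than on the language, this Python sketch emulates IEEE 754 single precision (32-bit) arithmetic by rounding the inputs and every intermediate result through `struct`'s `"f"` format, then compares it with ordinary double precision. The helper `f32` is hypothetical; for simple additions and subtractions like these, rounding the double result back to single precision gives the single precision answer, but this is not a general emulator.

```python
import struct


def f32(x: float) -> float:
    """Hypothetical helper: round a double to the nearest IEEE 754 single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Double precision (64-bit), what Python floats normally use.
double_result = 0.1 + 0.2 - 0.3

# Single precision (32-bit): round the inputs and every intermediate result.
single_result = f32(f32(f32(0.1) + f32(0.2)) - f32(0.3))

print(double_result)  # 5.551115123125783e-17
print(single_result)  # 0.0: at this precision the rounding errors happen to cancel
```

Neither format is "right"; the same expression simply lands on different approximations, which is exactly why comparing floating point output across platforms needs care.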
Getting back to Perl 6:
$ perl6 -e 'say .1 + .2 - .3'
0
$ perl6 -e 'say 1/(.1 + .2 - .3)'
# Divide by zero in method Numeric at ...
What? How is it doing that? Well, it's not using packed decimals. Instead, Perl 6 internally uses rational numbers, each with a numerator and a denominator. Let's drop into the REPL. In Perl 6, everything is an object, and we can inspect it. In the session below, the .WHAT method tells you what type something is. (Rat is short for Rational, but Rational is actually a role that allows other types to be rational numbers: Rat does Rational.) The .nude method on the Rational role returns a two-element list of the numerator and denominator:
$ perl6
> say .3.WHAT
(Rat)
> say .3.numerator
3
> say .3.denominator
10
> say .3.nude.perl
(3, 10)
What the above shows you is that Perl 6 uses rational numbers and basically does math with fractions, using integer math on the numerators and denominators. This means that the word size of your CPU doesn't really matter.
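The core idea fits in a few lines. This is a minimal Python sketch of a rational type, not how Perl 6 itself implements Rat: every value is a pair of integers, decimal literals are parsed into an exact numerator and denominator, and arithmetic only ever multiplies, adds and reduces integers. The class name `Rat` and its `from_decimal` constructor are made up for this example.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class Rat:
    """Hypothetical rational type: integer numerator over a positive integer denominator."""

    numerator: int
    denominator: int

    def __post_init__(self) -> None:
        if self.denominator == 0:
            raise ZeroDivisionError("Divide by zero")
        # Normalise to lowest terms with a positive denominator.
        sign = -1 if self.denominator < 0 else 1
        common = gcd(self.numerator, self.denominator)
        object.__setattr__(self, "numerator", sign * self.numerator // common)
        object.__setattr__(self, "denominator", abs(self.denominator) // common)

    @classmethod
    def from_decimal(cls, text: str) -> "Rat":
        """Read a plain decimal literal such as '.1' or '-3.25' exactly."""
        whole, _, frac = text.partition(".")
        return cls(int((whole or "0") + frac), 10 ** len(frac))

    def __add__(self, other: "Rat") -> "Rat":
        return Rat(
            self.numerator * other.denominator + other.numerator * self.denominator,
            self.denominator * other.denominator,
        )

    def __neg__(self) -> "Rat":
        return Rat(-self.numerator, self.denominator)

    def __sub__(self, other: "Rat") -> "Rat":
        return self + -other

    def __truediv__(self, other: "Rat") -> "Rat":
        # Dividing by a zero Rat yields a zero denominator, which __post_init__ rejects.
        return Rat(self.numerator * other.denominator, self.denominator * other.numerator)

    def nude(self) -> tuple[int, int]:
        """Numerator and denominator as a pair, mirroring Perl 6's .nude."""
        return self.numerator, self.denominator


zero = Rat.from_decimal(".1") + Rat.from_decimal(".2") - Rat.from_decimal(".3")
print(zero.nude())  # (0, 1)
try:
    Rat(1, 1) / zero
except ZeroDivisionError as err:
    print(err)  # Divide by zero
```

Python's standard `fractions.Fraction` is the production-quality version of the same idea; the sketch only shows that exact zero and a real divide-by-zero error fall out naturally once numbers are integer pairs.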
Here's another interesting example:
> say 3.1415927.nude.perl
(31415927, 10000000)
Pi, as you know, is an irrational number. Irrational numbers cannot be expressed as a ratio of two integers (the digits after the decimal point go on forever without repeating), but instead of letting the vagaries of floating point math choose your imprecision, you get to choose your imprecision.
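Python's `fractions.Fraction` offers a comparable choice (this is Python, not Perl 6): you can keep exactly the digits you typed, or ask for the closest fraction whose denominator stays under a limit you pick.

```python
import math
from fractions import Fraction

typed = Fraction("3.1415927")  # exactly the digits you wrote
print(typed)                   # 31415927/10000000

# The closest fraction to the double-precision pi with a denominator of at most 1000.
approx = Fraction(math.pi).limit_denominator(1000)
print(approx)                  # 355/113
print(float(approx))           # within about 3e-7 of pi
```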
There are many, many other useful features of Perl 6, but this is a fantastic one. Given past comparisons of COBOL and Perl 5, I find it delightfully ironic that one of the few strengths of COBOL is matched in Perl 6.
Update: Yeah, a few nits fixed. Thanks for the free proofreading, folks :)
Trivia note: bytes are only 8 bits by modern convention. Historically they were not. See page 78 of Planning a Computer System, published in 1962:
The natural length of bytes varies. Decimal digits are most economically represented in a 4-bit code. The commonly used 6-bit alphanumeric codes are sufficient when decimal digits, a single-case alphabet, and a few special characters are to be represented. If this list is extended to a two-case alphabet and many more special characters, a 7- or 8-bit code becomes desirable (see Chap. 6). A 3-bit octal code or a 5-bit alphabetic code is occasionally useful. There would be little use for bytes larger than 8 bits. Even with the common 12-bit code for punched cards, the first processing step is translation to a more compact code by table look-up, and during this process each column is treated as a 12-bit binary field. There would be no direct processing of longer fields in the 12-bit code.