A little thing to love about Perl 6 (and COBOL)

By now you've heard the announcement that the Perl 6 team is cautiously hopeful that Perl 6.0.0 will be released this year. There are three things they need to finish:

  • The Great List Refactor (which should improve performance)
  • Native Shaped Arrays (tell Perl 6 that you only have 10 elements and you'll get an exception if you add more)
  • Normal Form Grapheme, or NFG (solves some issues with combining characters in Unicode, such as the little-known "tapeworm operator" "\x{1F4A9}\x{0327}")

With those, Perl 6 will truly be born. Of course, it will need better documentation, tooling support, better modules, and so on.

From what I can see, Perl 6 actually has a chance to take off and there's one delightful little feature that I want to talk about.

But long before I get to Perl 6, I want to explain why I like COBOL. Er, no, not really, but COBOL has something you've probably never heard of: packed decimals. This is a storage format for numbers that mainframes often handle natively. When mainframes first came to businesses, they were heavily used in accounting (well, they still are), and accounting numbers are base 10, not base 2 (computers work in base 2 internally, but we usually display numbers in base 10). We don't use packed decimals. We use floating point numbers, and that's a problem, but I'll get to that in a bit.

In COBOL, you might declare a signed number named TOTAL as having 4 digits before the decimal point and 2 digits after, like this:

01 TOTAL PIC S9(4)V9(2) COMP-3.

Without going into all the details, COMP-3 means the number is stored as a packed decimal. Each digit is one nybble (4 bits), so two digits fit in one byte, and a final nybble holds the sign. TOTAL's six digits plus its sign fit in four bytes; the decimal point (the V) is only implied and takes no storage at all. If you needed to add a list of packed decimal numbers, the mainframe would use an "Add Packed" instruction. This was not floating point math. And it was base 10, not base 2. This means that in COBOL, if you compute .1 + .2 - .3, the answer you get back is 0 (zero).
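To make the nybble layout concrete, here's a minimal Python sketch of how digits could be packed the way COMP-3 stores them. The function names pack_decimal and unpack_decimal are hypothetical, and the code assumes the common IBM sign convention (hex C for positive, D for negative). A real mainframe does this in hardware, and real COBOL compilers handle details, such as unsigned fields, that this sketch ignores.

```python
def pack_decimal(digits: str, negative: bool = False) -> bytes:
    """Encode decimal digits as IBM-style packed decimal (COMP-3).

    Each digit takes one nybble and the last nybble holds the sign
    (0xC for positive, 0xD for negative). A leading zero nybble is
    added when needed so the result fills whole bytes.
    """
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected one or more decimal digits")
    nybbles = [int(ch) for ch in digits] + [0xD if negative else 0xC]
    if len(nybbles) % 2:
        nybbles.insert(0, 0)
    # Pair up nybbles: high half of each byte first, then the low half.
    return bytes((high << 4) | low for high, low in zip(nybbles[0::2], nybbles[1::2]))


def unpack_decimal(packed: bytes) -> int:
    """Decode packed decimal into an integer; placing the implied V is up to the caller."""
    nybbles = []
    for byte in packed:
        nybbles += [byte >> 4, byte & 0x0F]
    *digits, sign = nybbles
    value = int("".join(str(d) for d in digits))
    return -value if sign == 0xD else value


# TOTAL = 1234.56 with PIC S9(4)V9(2): the V is implied, so only 123456 is stored.
packed = pack_decimal("123456")
print(packed.hex())                                            # 0123456c
print(unpack_decimal(packed))                                  # 123456
print(pack_decimal("123456", negative=True).hex())             # 0123456d
```

Note that 1234.56 is stored as the digits 123456: the V lives only in the PIC clause, so the program, not the data, knows where the decimal point goes.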

What is that in Perl 5?

$ perl -E 'say .1 + .2 - .3'
5.55111512312578e-17

# 0.0000000000000000555111512312578

That's a very small number, but it's not zero. Maybe you think that's OK, but it's not. Zero is special and you can't screw it up. Here's "one" divided by "zero":

$ perl -E 'say 1/(.1 + .2 - .3)'
1.8014398509482e+16

# 18,014,398,509,481,984 (exactly 2**54)
# or roughly 18 quadrillion

18 quadrillion is an exceptional number, but it's not an exception. If you multiply the mass of the sun by this "zero" you have 110 trillion kilograms left over, or roughly the mass of Mount Everest.
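If you want to see exactly what that "zero" is, here's a quick check in Python, which on typical platforms uses the same IEEE 754 double precision floats as Perl 5 (so Python is only standing in for Perl here). The leftover turns out to be exactly 2**-54, one step in the last binary place of 0.3, which is why dividing by it gives exactly 2**54.

```python
from fractions import Fraction

leftover = 0.1 + 0.2 - 0.3
print(leftover)                # 5.551115123125783e-17
print(leftover == 2.0 ** -54)  # True: one unit in the last place of 0.3
print(Fraction(leftover))      # 1/18014398509481984, the exact stored value
print(1 / leftover)            # 1.8014398509481984e+16, i.e. 2**54
```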

So why isn't .1 + .2 - .3 zero? Because floating points.

In floating point math, numbers are base 2, not base 10. The fractional part (the digits after the binary point) is a series of ones and zeroes representing reciprocal powers of two: 1/2, 1/4, 1/8, and so on. (The term "floating point" comes from how the number is stored: a string of binary digits, called the mantissa, plus an exponent that says where the point "floats" to.)

So the number 0.625 can be represented exactly in binary as .101. That's 1*1/2 + 0*1/4 + 1*1/8 (note those powers of two are multiplied by the corresponding binary digits).
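Here's a small Python sketch of the same idea (Python is used only for illustration; the float format, not the language, is what matters). Summing the bits of .101 as reciprocal powers of two gives exactly 5/8, while the standard library's math.frexp and float.hex show the mantissa and exponent that a real double actually stores.

```python
import math
from fractions import Fraction

# .101 in binary is 1*1/2 + 0*1/4 + 1*1/8
bits = "101"
value = sum(Fraction(int(bit), 2 ** position) for position, bit in enumerate(bits, start=1))
print(value, float(value))  # 5/8 0.625

# frexp splits a float into mantissa * 2**exponent, with the mantissa in [0.5, 1)
print(math.frexp(0.625))    # (0.625, 0)
print(math.frexp(0.1))      # (0.8, -3)

# float.hex shows the stored bits: 0.625 ends cleanly, 0.1 is a repeating pattern cut off and rounded
print((0.625).hex())        # 0x1.4000000000000p-1
print((0.1).hex())          # 0x1.999999999999ap-4
```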

Most numbers, however, can only be approximated with floating point numbers. For simplicity, let's use just 8 bits and chop off (truncate) anything that doesn't fit. The number .1 is stored as 00011001, which is 1/16 + 1/32 + 1/256, or 0.09765625. The number .2:

Fractions: 1/8 + 1/16 + 1/128 + 1/256
Bits: 00110011
Result: 0.19921875

And .3 is:

Fractions: 1/4 + 1/32 + 1/64
Bits: 01001100
Result: 0.296875

With 32 bits, .1 is:

Fractions: 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + 1/65536 
     + 1/131072 + 1/1048576 + 1/2097152 + 1/16777216 + 1/33554432
     + 1/268435456 + 1/536870912 + 1/4294967296
Bits: 00011001100110011001100110011001
Result: 0.0999999998603016

So it's close, but no cigar. (If you're curious, you can calculate these yourself using a small program I provide in chapter 3 of my book, Beginning Perl.)
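The program from the book isn't reproduced here, so as a hypothetical stand-in, here's a minimal Python sketch that computes these bit patterns with exact fractions. Like the examples above, it assumes a value between 0 and 1 and truncates to a fixed number of bits; real IEEE 754 hardware rounds to the nearest value and also stores an exponent, so its results differ slightly from these.

```python
from fractions import Fraction


def binary_fraction(value: str, bits: int) -> tuple[str, Fraction]:
    """Truncate a decimal fraction between 0 and 1, such as ".1", to `bits` binary digits.

    Returns the bit string and the exact value those bits represent.
    Digits past `bits` are simply dropped (no rounding).
    """
    remainder = Fraction(value)  # exact: Fraction(".1") == 1/10
    approx = Fraction(0)
    digits = []
    for position in range(1, bits + 1):
        remainder *= 2  # shift one binary place to the left
        if remainder >= 1:
            digits.append("1")
            remainder -= 1
            approx += Fraction(1, 2 ** position)
        else:
            digits.append("0")
    return "".join(digits), approx


for text in (".1", ".2", ".3"):
    bit_string, approx = binary_fraction(text, 8)
    print(text, bit_string, float(approx))
# .1 00011001 0.09765625
# .2 00110011 0.19921875
# .3 01001100 0.296875

bit_string, approx = binary_fraction(".1", 32)
print(bit_string, approx)  # 00011001100110011001100110011001 429496729/4294967296
```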

Floating point math is why many accounting systems internally use integers (for example, counting cents instead of dollars) rather than floating point numbers.
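As a minimal sketch of the integer approach (the format_cents helper and the ledger values are hypothetical), storing money as whole cents turns every addition into exact integer arithmetic, and the decimal point only appears when you print:

```python
def format_cents(cents: int) -> str:
    """Render an integer number of cents as dollars, e.g. 123456 -> "$1234.56"."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}${dollars}.{remainder:02d}"


# Hypothetical ledger entries stored as whole cents: $0.10, $0.20, -$0.30
entries = [10, 20, -30]
total = sum(entries)         # plain integer addition, no rounding involved
print(total)                 # 0
print(format_cents(total))   # $0.00
print(format_cents(-5))      # -$0.05
```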

Floating point math is easy to get wrong, and developers constantly struggle with it. That's one of COBOL's advantages: because it uses packed decimals and represents numbers internally in base 10, running billions of dollars' worth of calculations gives you the correct answer.
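You don't need a mainframe to get base 10 arithmetic. As a rough analogue (software decimal arithmetic, not packed decimals and not COBOL), Python's standard decimal module implements IBM's General Decimal Arithmetic Specification. This sketch shows .1 + .2 - .3 coming out as zero, plus an explicit round to two places, similar in spirit to a PIC field with two digits after the V:

```python
from decimal import ROUND_HALF_UP, Decimal, DivisionByZero

# Build Decimals from strings: Decimal(0.1) would copy the float's binary error.
total = Decimal(".1") + Decimal(".2") - Decimal(".3")
print(total)       # 0.0
print(total == 0)  # True

# Round to cents explicitly, much as a field with two digits after the V would hold them.
print(Decimal("1234.565").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 1234.57

try:
    Decimal(1) / total
except DivisionByZero:
    print("division by zero, as it should be")
```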

Note that these trivial math errors are properties of floating point numbers, not Perl:

$ ruby -e 'puts 0.1 + 0.2 - 0.3'
5.551115123125783e-17
$ python3 -c 'print(.1 + .2 - .3)'
5.551115123125783e-17
$ echo "puts [expr .1+.2-.3]"|tclsh
5.551115123125783e-17

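Note that floating point results can even differ between platforms. Nearly everything today uses 64-bit IEEE 754 doubles, but some hardware (such as the x87 unit in older x86 processors) computes intermediate results with 80 bits of precision, so someone on a different machine might very well get different output if you're not careful!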

Getting back to Perl 6:

$ perl6 -e 'say .1 + .2 - .3'
0
$ perl6 -e 'say 1/(.1 + .2 - .3)'
# Divide by zero in method Numeric at ...

What? How is it doing that? Well, it's not using packed decimals. Instead, Perl 6 internally uses rational numbers, each with a numerator and a denominator. Let's drop into the REPL. In Perl 6, everything is an object and we can inspect it. In the session below, the .WHAT method tells you what type something is. (Rat is short for Rational, but the term Rational is a role allowing other things to be rational numbers. Rat does Rational.) The .nude method on the Rational role returns a two-element list of the numerator and denominator:

$ perl6
> say .3.WHAT
(Rat)
> say .3.numerator
3
> say .3.denominator
10
> say .3.nude.perl
(3, 10)

What the above shows you is that Perl 6 uses rational numbers and basically does math with fractions, using integer arithmetic on the numerators and denominators. This means that the word size of your CPU doesn't really matter.
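Python's standard fractions module offers a similar exact rational type. This sketch uses it as a stand-in for Perl 6's Rat; it is not the same implementation, and Python only builds exact fractions when you ask for them, whereas Perl 6 does it for decimal literals by default:

```python
from fractions import Fraction

# Built from decimal text, a Fraction is an exact numerator/denominator pair.
three_tenths = Fraction(".3")
print(three_tenths.numerator, three_tenths.denominator)  # 3 10

total = Fraction(".1") + Fraction(".2") - Fraction(".3")
print(total)  # 0

# Built from a float, it faithfully records the float's binary approximation.
print(Fraction(0.3))  # 5404319552844595/18014398509481984

try:
    print(1 / total)
except ZeroDivisionError:
    print("Divide by zero")
```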

Here's another interesting example:

> say 3.1415927.nude.perl
(31415927, 10000000)

Pi, as you know, is an irrational number. Irrational numbers cannot be expressed as a ratio of two integers (their digits after the decimal point go on forever without repeating), but instead of letting the vagaries of floating point math choose your imprecision, you get to choose your imprecision.
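Python's fractions module can illustrate the same choice. This is a sketch, not Perl 6: Fraction("3.1415927") mirrors the .nude output above, and the standard limit_denominator method picks the closest fraction whose denominator doesn't exceed a bound you choose:

```python
import math
from fractions import Fraction

# Like 3.1415927.nude in Perl 6: the decimal literal becomes an exact ratio.
print(Fraction("3.1415927"))                    # 31415927/10000000

# Or choose your imprecision: the closest fraction with a bounded denominator.
print(Fraction(math.pi).limit_denominator(10))    # 22/7
print(Fraction(math.pi).limit_denominator(1000))  # 355/113
```

A bound of 1000 gives the classic 355/113, which already agrees with pi to about seven significant digits.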

There are many, many other useful features of Perl 6, but this is a fantastic one. Given past comparisons of COBOL and Perl 5, I find it delightfully ironic that one of the few strengths of COBOL is matched in Perl 6.

You can learn more about Perl 6 here.

Update: Yeah, a few nits fixed. Thanks for the free proofreading, folks :)


Trivia note: bytes are only 8 bits by modern convention. Historically they were not. See page 78 of Planning a Computer System, published in 1962:

The natural length of bytes varies. Decimal digits are most economically represented in a 4-bit code. The commonly used 6-bit alphanumeric codes are sufficient when decimal digits, a single-case alphabet, and a few special characters are to be represented. If this list is extended to a two-case alphabet and many more special characters, a 7- or 8-bit code becomes desirable (see Chap. 6). A 3-bit octal code or a 5-bit alphabetic code is occasionally useful. There would be little use for bytes larger than 8 bits. Even with the common 12-bit code for punched cards, the first processing step is translation to a more compact code by table look-up, and during this process each column is treated as a 12-bit binary field. There would be no direct processing of longer fields in the 12-bit code.

12 Comments

Nitpick edit: "So why isn't .1 + .2 + .3 zero? Because floating points" The 2nd "+" should be a "-" The source code above that is correct, just looks like a copy-pasta bug :-)

I don't usually nitpick on grammar, but this sentence caused my brain to come to a screeching halt:

But long before I get to Perl 6, I want to explain why I like is COBOL.

Another IBM language of similar vintage is RPG (Report Program Generator) and it also uses packed decimals.

The control flow is: - Run a set of zero or more statements once at the beginning - Run a set of statements once per input record - Run a set of zero or more statements once at the end

I was always struck by the similarity to: BEGIN { ... } while (<>) { ... } END{ ... }

Excellent article, I haven't heard much about Perl 6 but this is the best thing so far. Small correction, there's a typo the second time you say ".1 + .2 - .3" ("so why isn't..."). It says ".1 + .2 + .3".

Nice article, good to know about the anatomy of rationals in Perl 6. A little nitpick though: the precision of floating point numbers does not depend on whether it is computed on a 32-bit platform or a 64-bit one. It is typically handled by a 64-bit IEEE 754 floating point implementation, which is available on both 64-bit and 32-bit systems. There can be some differences between FP hardware vendors (PPC vs Intel), but that is related to a difference in internal FP hardware precision (64 vs 80 bits), not the native architecture register size.

Minor typo:

So why isn't .1 + .2 + .3 zero? Because floating points.

I learned Perl, COBOL and mainframes at a University of Technology. I know how compilers work. Most of the article was useless for me, but I found it nice, because others can learn something new. But when I saw this:

"Note that these trivial math errors are properties of floating point numbers, not Perl"

I facepalmed. If Perl uses floating point for all fractions, then it's Perl's problem! In .NET or Java you can decide if you need floats or decimals. The point of the article is: Perl introduced decimals (and probably calculations in any base as well, but the article focuses on base 10; then again, they're fractions, not decimals).

I was going to say something similar, but with the caveat that it depends on language, compiler, etc.

My perl5 -V says among other things doublesize=8 longdblsize=16 and nvsize=8. Meaning I'm using 64-bit floating point numbers, but my compiler (and perl's configuration) knows about 128-bit floating point numbers.

I've honestly not heard many people clamor for 128-bit floats the way they did for 64-bit integers. I suppose there are probably some fields where the absurdly large or absurdly small numbers allowed by the re-doubling of the precision are useful. But I've not lately found myself calculating interstellar trajectories, or the mass of leptons.

Scientists constantly fight against floating point errors. Papers have been retracted from top journals because of it.

So the number 0.625 can be presented exactly in binary as 101.

I would alter that to be '.101' to indicate that it's fractional, or to the right of the decimal point.

A lot of work has been done on decimal floating point, both software and hardware. Not surprisingly this has been implemented on recent mainframes and presumably used by recent COBOL releases.

See http://en.wikipedia.org/wiki/Mike_Cowlishaw especially ref Decimal Arithmetic

And also http://speleotrove.com/decimal/

One small spelling error: quadrillion, not quadrillian.

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/