Fighting a 30-year-old software bug

Er, an earlier version of this said the software bug was 20 years old. It's 30! Thanks, Sidhekin

Today, in reading some data, I processed 36,916 potential dates. Two of those 36,916 failed to validate. I wasn't terribly concerned as these dates come from client-supplied data and this data is often, um, surprising. However, when I inspected the raw data, it turns out that those dates were January 1st, 2011 and January 1st, 2007. I had a bug in software I wrote about a month ago, but it turns out that this bug is actually 30 years old.

For anyone who doesn't really understand the software ecosystem, this may sound mystifying, but it makes sense. Because of a decision taken a long time ago to make another company money, my $client lost money in paying me to fix a bug that one company accidentally introduced and another company deliberately introduced. But to explain it I need to talk about a third company that introduced a feature that eventually became a bug, and a few other historical tidbits that nonetheless contributed to the obscure bug I fixed today.

In the good ol' days, Apple computers would sometimes spontaneously reset their date to January 1st, 1904. The reason for this is fairly simple. Back then, Apple computers used battery-powered "system clocks" to keep track of the date and time. What happened when the battery ran out? Apple computers tracked their dates as the number of seconds since the epoch. In this sense, an epoch is merely a reference date from which we start counting and for Macintosh computers, that epoch was January 1st, 1904 and when the system clock battery died, that was your new date. But why did that really happen?

Back then, Apple used 32 bits (ones and zeros) to store the number of seconds from their start date. One bit can hold one of two values, 0 or 1. Two bits can hold one of four values, 00, 01, 10, 11. Three bits can hold one of eight values, 000, 001, 010, 011, 100, 101, 110, 111, and so on. How much can 32 bits hold? 32 bits can hold one of 232, or 4,294,967,296, values. For Apple dates that was approximately 136 years, which is why older Macs couldn't handle dates after 2040 and if your system clock battery died, your date would reset to 0 seconds after the epoch and you'd have to keep manually resetting the date every time you turned on your computer (or until you bought a new battery for your system clock).

However, the Apple solution of storing dates as the number of seconds after the epoch means we couldn't handle dates before the epoch and that had far-reaching implications, as we'll see. This was a feature, not a bug, that Apple introduced. It meant, amongst other things, that the Macintosh operating system was generally immune to the Y2K bug (though many Mac apps weren't because they would introduce their own date system to work around the Mac limitations).

Moving along, we have Lotus 1-2-3, IBM's "killer app" that helped to launch the PC revolution, though it was VisiCalc on the Apple that really launched the personal computer. It's fair to say that if 1-2-3 hadn't come along, PCs would likely have not taken off the way they had and computer technology would have turned out considerably differently. However, Lotus 1-2-3 incorrectly reported 1900 as a leap year. When Microsoft released Multiplan, their first spreadsheet program, it didn't have much market penetration. So when they conceived of Excel, they decided to not only copy 1-2-3's row/column naming scheme, they made it bug-for-bug compatible, including deliberately treating 1900 as a leap year, a problem that remains to this day. So for 1-2-3, this was a bug, but for Excel, it was a feature to guarantee that everyone who used 1-2-3 could import their spreadsheets into Excel with no differences in the data, even if the data were wrong.

Eventually Microsoft decided to release a version of Excel for Apple's Macintosh computers, but they had a problem. As mentioned, Macintosh didn't recognize dates prior to January 1st, 1904. However, Excel used January 1st, 1900 as its epoch. So Excel was modified to recognize what the epoch was and internally stored dates relative to these respective epochs. This Microsoft support article explains the problem fairly clearly. And that leads to my bug.

My current $client receive spreadsheets from many customers. Those spreadsheets may have been produced on Windows, but they may have been produced on a Mac. As a result, the "epoch" date for the spreadsheets might be January 1st, 1900 or January 1st, 1904. How do you know which one? Well, the Excel file format exposes this information, but the parser I am using doesn't and it expects you to know whether you have a 1900 or 1904-based spreadsheet. I suppose I should have spent a lot of time trying to figure out how to read the binary format of Excel and sent a patch to the maintainer of the parser, but I have many other things to do for $client and so I quickly wrote a heuristic to determine whether or not a given spreadsheet was 1900 or 1904. It was pretty simple.

In Excel, you may have a date of July 5, 1998, but it might be formatted as "07-05-98" (the useless US system), "Jul 5, 98", "July 5, 1998", "5-Jul-98" or any of a number of other useless formats (ironically, the one format my version of Excel doesn't offer is the standard ISO 8601 format). Internally, however, the unformatted value is either "35981", for the 1900 date system, or "34519", for the 1904 system (these numbers represent the number of days after the epoch). So what I do is use a fairly robust date parser to extract the year from the formatted date, and then an Excel date parser to extract the year from the unformatted value. If they're four years apart, I know I'm using the 1904 date system.

So why don't I simply use the formatted date? Because July 5, 1998 might be formatted as "July, 98", losing me the day of the month. We get our spreadsheets from so many companies and they create them in so many different ways that they expect us (meaning me, in this case) to figure it out. After all, Excel gets it right, I should, too!

That's when 39082 kicked me in the tail. Remember how Lotus 1-2-3 considered 1900 a leap year and how that was faithfully copied to Excel? Because it adds an extra day to 1900, many date calculation functions relying on this can easily be off by a day. That means that 39082 might be January 1st, 2011 (on Macs), or it might be December 31st, 2006 (on Windows). If my "year parser" extracts 2011 from the formatted value, well, that's great. But since the Excel parser doesn't know whether it's a 1900 or 1904 date system, it defaults to the common 1900 date system, returns 2006 as the year, my software sees that the years are five years apart assumes an error, logs it, and returns the unformatted value.

To work around this, I now have the following (pseudo-code):

difference = formatted_year - parsed_year
if ( 0 == difference )
    assume 1900 date system
if ( 4 == difference )
    assume 1904 date system
if ( 5 == difference and parsed month is December and parsed day is 31 )
    assume 1904 date system # yeah, I had 1900 originally. Thanks for spotting that

And now all 36,916 dates parse correctly.

Note: for extra fun, if you have a Mac and you're running Excel, try entering a date before 1904 and then formatting it into a different date format. You can enter the date, but you can't format it because Excel will accept it, but will treat it as a text field. Meanwhile, for Microsoft Excel, all days of the week prior to March 1st, 1900 are wrong, due to a bug in software released in January of 1983.

Update: It was pointed out that Spreadsheet::ParseExcel does parse Excel's 1904 flag. Unfortunately, I'm using Spreadsheet::ParseExcel::Stream, which does not. Even on beefy boxes, we've run out of memory with the using the standard parser so we've had to fall back on the streaming parser. So far every attempt I've made to work around this has hit yet another bug.

Update 2: turns out Microsoft released Excel for Macintosh first!

Update 3: And according to an anecdote from Joel Spolsky, the Lotus 1-2-3 "bug" may have been a deliberate attempt to simplify the Lotus software. I had previously read hints that Lotus did this deliberately, but since I couldn't say for sure, I left it out.

21 Comments

I like the yoda code:
if ( 0 == difference )

We stopped using the Spreadsheet::ParseExcel family of modules for reading Excel files because of similar obscure bugs. We had much better luck with using Excel directly via Win32::OLE.

As much as I would like to have a native Perl module for the task, using Excel itself is much more stable. Adopting a famous Perl quote, I'd go as far as to say "Only Excel can parse Excel" :)

Or you could, you know, use a modern compiler.

Shouldn't your pseudo code read:

if ( 5 == difference and parsed month is December and parsed day is 31 )
    assume 1904 date system

Very satisfying explanation, Ovid. I have seen people do this a lot and always wondered why. I've fallen victim to assignment instead of comparison in the past, I think I'll adopt this style now.

Yeah it's true, but it's bad technique. It makes code harder to read, and any good compiler should be able to warn you when you do if (difference = 4)

Loving the way that in the headline of this article all about incorrectly parsing dates, you calculated 1983 as 20 years ago. :P

why url is "fighting-20-year-old-software-bugs.html"...

Right, he called it yoda code because the name for the technique you used

Wouldn't any decent compiler throw a warning these days? An assignment rather than an equality in a conditional is pretty easily staticly checked.

"warning C4706: assignment within conditional expression"

Thanks modern compiler.

So if this whole issue can be fixed by the parser checking a flag in the file, wouldn't a more sane solution be to open a bug report, patch the lib, and hopefully get the maintainers to merge the patch into master?

I feel your pain brother! One of my first jobs was doing data conversions for Veterinarian software, in the late 90's early 00's

Our data came from running the program, capturing printed reports as text OR analyzing the softwares file format and actually reading it directly! nothing was ever the same.

One of my biggest wins was on an Alpha Micro system the software was using this strange floating point number system in 3 bytes, I could get it to parse at all! I realized that it vaguely looked like the format of those 6 byte pascal reals. I shoved the bytes into the "proper" place and bingo! It came back with $3.50.

No trolling or whatever! When that episode of South Park came out I about die, $3.50 was my lochness monster too!

Wouldn't Yoda say "if (0 difference ==) ..." ? j/k

Really enjoyed the article -- time and calendar stuff is remarkably slippery and a source of endless fun & games ...

How about converting the .xls files to .xlsx or XML and then parse them as XML?

At Sorcim/Computer Associates we created the SuperCalc spreadsheet modeled 1st on VisiCalc then on Lotus 123. I'm pretty sure we copied the 123 date bug since we wanted to run 123 macros...

Yes!! You have PROVEN that no one REALLY knows when July 5, 1998 will occur. I hope you have reserved your place on the pleasure saucers from Planet X, because the world ends tomorrow and you may die!

Did you manage to fix this bug or its still alive?

Wow, great article, Didn't know that PERL is so old :)

Leave a comment

About Ovid

user-pic Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/