PWC 060: Task #1, Excel Column & Task #2, Find Numbers
Convert back and forth between base 10 and base 26, no problem.
"Column A" equals 1? Problem.
Okay, as problems go, that is a small one. Let I, an old Fortran programmer who loves perl, be your guide between offset indexing and ordinal indexing. First of all, we should complete our assessment of the lay of the land:
"Column 1" equals 1? Problem.
Yes, both column numbering systems start at "one", not "zero", but the mechanical conversion between number bases requires modular arithmetic, which is very much related to offset (0-based) indexing.
Note the quotes? They are meant to emphasize the concept of "one-ness" and "zero-ness" so that I don't have to deal with lexicographic tautologists telling me that "A is A, 1 is 1." Work with me here: The core of difficulty in this task is that we are translating between two number systems that don't have a zero and zero is a big thing.
The general idea is to convert to "zero"-based counting, divide up the number in powers of N (here, either 10 or 26) and then convert back to "one"-based counting.
Lest it seems like the script proceeded without a hitch, I will reveal now that half of my time was spent in off-by-one hell on this. Only in retrospect did this narrative appear.
In all that follows, I use a translation table (hash) to convert between letters and their zero-based numerical order. If you are not as lazy as I am, you could go old school and do your calculations with
ord(), such as $zero_based_char_index = ord($char) - ord('A'). If you do go down that route, I would like to scare you with an old story, told by FORTRAN programmers when sitting around a campfire ... "EBCDIC."
From "1" to "A"
This tears down a number by storing the remainder, modulus 26, then replacing the number with the integer result of dividing itself by 26, then iterating. I would liked to have made this a one-liner, but couldn't think of a way to avoid using a temporary array to hold the remainders. The "- 1" terms you see are an indication that we are going from 1-based to 0-based counting in order to get the modulo and division operators to work out.
From "A" to "1"
A one-liner! Plus I get to use
reduce, which is a bonus. This is the opposite of the first subroutine in that it builds up instead of tears down. Mathematically, it reminds me of the nested multiplications of the FFT. The places where "1" is added means we are going from 0-based counting to 1-based counting, although we are not purely in the realm of 1-based counting until the 1 is added to the reduce at the very outer expression.
Input is on the command line and it just takes an arbitrary mix of columns in numeric or alphabetic (Excel) format. Zeros are stripped off of the front of numbers, to discourage those jokers who want to screw things up with octal, and all characters are upper-cased.
Output is exactly as in the Task description. Each single input is accounted for an its type is noted, "Number," or "Column Name," and then the converted result is printed as "Output:"
The crystalline perfection of using recursion and smart logic to avoid spending oodles of time creating bogus results that are thrown away was blemished by edge cases and I found myself dawdling. What do I do when I dawdle? I document. (Well, that's the plan, along with exercising and getting enough sleep every day.)
So this is a very polished script compared to what I've posted in the past. I won't talk much about whatever is in the POD and the INTERPRETATION section details how I ironed out the wrinkles in the spec. I used Getopt::Long and Pod::Usage because I like them as a baseline behavior for interface and documentation, not perfect, but good. The only drawback is that writing up a POD feels a lot like work. Ugh.
As to what's going on, I've already tipped my hand: Recursion. Thanks to the $Y constraint, I have to chop up the cases I have to handle, which makes for busy looking code.
Basically, we are given a bunch of "building block" number strings and have to produce all possible concatenations of these building blocks that achieve and exact length and turn out to be numerically less than a given number.
In the statement of the task, the upper bound constraint, $Y, is the bound for a strict inequality. In what follows, when I talk about the constraint or upper bound, I am talking about the maximum allowed value due to the constraint. Since $Y must be an integer, then this maximum allowed value is $Y - 1.
The recursion idea is to select a building block and assume that it is at the "head" of the current string. So now we just have to get the "rest of the current string"--aha, recurse! We have a few things to handle, though:
- The digit zero appears in the example as an allowable building block. This causes more than one gyration in the code, but the gyration we're concerned with here is that zero cannot be used as the ultimate lead character. In other words, I not only have a terminating case for the recursion, but I also have an initiating case for the recursion. One way to handle this is to just do the first pass exterior to the recursive function and then hand the rest off to the function. When I started down this road, it was like the guts of the entire function spilled out and now I had two copies of almost identical code in the script. Yuck! So I went with plan B: Use a flag in the argument list. (yuck.)
- If the building block you want to use matches the upper bound constraint exactly, then you have to tell the subsequent recursive call that it has to make sure it obeys the rest of the constraint. For instance, if your upper bound constraint is "1234" and your building block is "12" you have to pass along the upper bound constraint of "34" to the respective recursive function call.
- If the building block is "under" the constraint, then it tells the recursive call to "give me everything you've got" in terms of the smaller string to be returned. For instance, if your upper bound constraint is "1234" and your building block is "11" you pass along the constraint of "99" to the respective recursive function call.
- If zero were not allowed, then when the limit gets to zero in the recursion, we would be done. But if we have zero in our list of building blocks and if the user sets $Y to 1, then we have to return the value of 0. Ergo, we aren't all done when we have our limit set to zero, we are mostly done. So in addition to passing the limit in the recursion call, we also pass the length of the required result to the recursion. When the requested length is zero then we are done. Perhaps it would be possible to think of a way to avoid passing the requested length, since, once the option validation is done, the limit can also be set to imply the requested length, but I grew tired of this ... and documentation.
Describing code is worse than reading it. Fortunately for you, I'm pretty much out of steam.
Again, all input is on the command line. There is an --X option and a --Y option, both are as described in the task. The rest of the command line arguments are the number string building blocks.
A monster, comma-delimited list of all possible results, just like the example. They are sorted numerically for ease of checking the results.