PWC 058: Task #1, Compare Version & Task #2, Ordered Lineup
I usually am bored with string twisting (as opposed to bit twiddling) because there is a certain level of grunt work, even with such classics as ELIZA and FESTOON and I can easily fall down the rabbit hole of trying to get pluralization perfect. So when the exciting topic of version numbers came up, I would have passed, but we perfectionists have to finish one to get to two....
Plus there was one thing that I liked, the problem was set up as a comparison operator, like
cmp, suitable for use in a sort routine. Whomever you are out there, with a need to sort thousands of version strings via perl, this script is for you ;-)
Input is on the command line. The versions will be compared in pairs. Creating a version sort routine of command line arguments was considered, but I decided to emulate the problem statement example output instead.
When mimicking the functionality of
cmp one of the first things to consider is using the functionality of
cmp directly. Using the numerical
<=> comparison operator is one of my favorite things because I like non-alphabetic typography, I am a perl coder after all, but it bogs down into examining a lot of separate subversion positions with various edge cases. Using the lexicographic comparison operator,
cmp, allows a single comparison of the entire version string against another as long as a few details are handled:
- The version and subversion fields will sort numerically if they are the same length; therefore the version and subversion numbers must be left padded by zeros in their fields.
- The production and alpha versions as denoted by the delimiters '.' vs '_' will have to compare correctly. This is done by turning the delimiters into fields themselves, with '.' being given value 1 and '_' being given value 0.
- Comparing alphabetically yet keeping all the characters within the ten Hindu-Arabic numerals avoids worrying about what character encoding is being used. If your favorite encoding doesn't sort 0, 1, ..., 9 the standard way, you aren't really into this anyway, I'm assuming.
All the magic is in the
vercmp routine, with the first action taken being to validate the input. The first validation, that no extraneous characters are in the version strings is about the only one that is a must. The other validations flag a missing number at the front of the first delimiter, in between two adjacent delimiters, or at the end of the last delimiter as an error. This is a matter of taste. You could just assume that any "missing number" should just be considered equivalent to zero and I wouldn't argue with you, but being lazy, I just reject as many boundary cases as I plausibly can. You get an indefinite number of subversion numbers, if that makes you feel better.
Once the data is validated, everything quickly rolls downhill as I indicated:
- Change all delimiters in the original string: '.' becomes '.1.' and '_' becomes '.0.'
- Split the version strings on '.' and store each in its respective array.
- For each respective version/subversion field, zero pad the smaller string on the left to match the larger.
- For version strings that have fewer subversion fields than the version being compared against, supply the appropriate string of zeros to match the length of the field in the other version.
Then join the respective version arrays back together and use
cmp to produce the correct result.
Emulate the output format of the problem statement. Not much going on here other than minor accounting to keep all the comparisons lined up and looking nice.
- How do you eat an elephant?
- One bite at a time.
When faced with a list of heights, 2, 6, 4, 5, 1, 3, say, and a respective list of the number of "taller" heights appearing prior to a given height, 1, 0, 2, 0, 1, 2, say, where do you begin? With the shortest height, of course. (I said "of course" to distract you from asking how long it took me to think of this.)
In what follows, keep in mind that, since you are working with a list of unique heights, there will always be a smallest unique height in whatever subset of this list of heights you choose.
So you take a first bite out of this list of heights by removing the smallest one, what now? Well, that smallest height has a number of taller heights ahead of it. In the above example, the smallest height is 1 and the number of taller heights ahead of it in the output is 1. But every other height is taller than the smallest height. This means that there can only be one height in front of this smallest height, and this means that the smallest height in the list, 1, has to be in position 2 of the output list.
At this point, we are done, although I didn't realize it at first. We have reduced our input list by one and have determined exactly where that smallest input height goes in the output. In this problem, no height is concerned with heights that are smaller than it, only those that are taller. This means that the smallest height and the space it occupies is effectively ignored by the other heights in determining where they should go in the output list.
- For better understanding of how this works, the value for "taller" that is attached to each height should be understood as "The number of open slots in the output list ahead of where this height will be placed when it is its turn to be the smallest unique remaining element in the set of heights to be placed."
- A consequence of this reasoning is that the tallest height must always have an associated "taller" value of zero for there to be a valid ordering.
- Since the smallest element is always exactly determined in the output ordering and since every element in the list eventually becomes the smallest element, that implies that, for any list of heights and their respective taller values, there is at most one solution.
There are three command line options:
- Use the recursive method instead of the procedural one.
- List of heights to order
- Corresponding list of entries that state how many taller elements are ahead of the respective height entry
My first solution was procedural. I had variables to track the number of slots in front of the shortest element and I had variables to track the indices and I had variables to track the number of "taller heights" than my current height. Frankly, I just guessed at +1 or -1 for almost every variable definition but in the end I got it to work.
As I leaned back in my chair to enjoy task completion, I saw it immediately: I could do this task recursively. So I did it. The idea is to follow up on what I said above about "reducing the input list" which is rallying cry for a recursive approach if I've ever heard one. After getting this to work, I turned around and removed all the tracking variables from the procedural solution to make it look as slick as the recursion.
Great, I now had TWO solutions the code for each was spare and minimal and I was feeling twice as smart ... until I reread the requirements: I had to report an error if there was no solution. I tried putting in bogus heights and errors were reported: Errors in the code. That didn't look good.
Normally, with one solution I just keep throwing code in until everything works, but I had two solutions and didn't want to abandon one to focus on the other. I was too close to my work. I had to elevate the game further:
- Both the procedural solution and the recursive solution were wrapped up as functions that returned a function.
- Both the procedural solution and the recursive solution were modified (made uglier!) to allow impossible situations to expand the output list so that no run-time error or warning was thrown due to an impossible situation triggering the strict or warning pragmas.
- A wrapper function was created that examined the output of the ordered lineup process and looked for an undef element. The existence of an undef element in the output meant that the ordered lineup process couldn't keep within the required output list length. When this happened the wrapper script would return an empty array instead.
- Because the procedural process could potentially require a huge amount of space if given a huge "taller" value, another wrapper (dare I say, decorator?) function was created to return an empty array if an impossible "taller" value was found.
- And heck, if I was going to start wrapping functions within functions, I might as well make a wrapper function to handle error messaging and standard output.
If the solution exists, the array is just sent to standard out, space delimited. If you want to make things tidier, pipe the output to something like
xargs -n 16
And that is how I wound up with an overly polished few lines that achieves everything with wrapper functions. I promise, I started out with the best of intentions to just knock out the answer and get back to binge watching Netflix and Amazon.