A follow-up on three-value logic in Perl

So far, the response to my three-value logic in Perl post has been great. The feedback on Reddit, Perlmonks, here, and in my RT queue has led to:

  • Making sorting a bit more useful
  • Contemplating removing stringification
  • Considerably strengthening the documentation (including making it explicitly clear that the unknown logic is akin to SQL's NULL)

And then there have been some interesting rebuttals.

This is the snippet that I used to show that undef is broken while unknown offers a semantically correct guarantee:

foreach my $employee (@employees) {
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}
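To make the failure concrete, here is a minimal sketch that assumes a hypothetical Employee class whose salary() accessor returns undef when no salary is recorded; increase_salary() is likewise a hypothetical stand-in that only records who got a raise. Under use warnings, the undef salary numifies to zero, passes the < $threshold test with nothing more than an "uninitialized" warning, and the employee gets the raise.

```perl
use strict;
use warnings;

# Hypothetical minimal employee class: salary() returns undef when unset.
package Employee {
    sub new    { my ( $class, %args ) = @_; return bless \%args, $class }
    sub name   { $_[0]{name} }
    sub salary { $_[0]{salary} }
}

package main;

my @raised;    # names of employees who received a raise

# Hypothetical stand-in: records the raise instead of changing any data.
sub increase_salary {
    my ( $employee, $amount ) = @_;
    push @raised, $employee->name;
}

my $threshold = 30_000;
my @employees = (
    Employee->new( name => 'alice', salary => 50_000 ),
    Employee->new( name => 'bob',   salary => 20_000 ),
    Employee->new( name => 'carol', salary => undef ),    # salary not recorded
);

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

foreach my $employee (@employees) {
    # undef numifies to 0, so carol's missing salary counts as "below threshold".
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}

print "raised: @raised\n";    # raised: bob carol
print scalar(@warnings), " warning(s)\n";
```

The only signal that anything went wrong is a single warning, which is easy to miss in a long-running job.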

One person seemed to think my example was wrong and offered what they referred to as a "sane" version:

foreach my $employee (@employees) {
    if ($employee->is_salaried) {
        if ($employee->salary and $employee->salary < $threshold) {
            increase_salary($employee,3000);
        }
    }
}

See the problem there? The is_salaried predicate determines whether the person has a salary, and the $employee->salary test determines whether that salary has been set. This again shows the fragility of definedness: the developer forgot to check for it in the second if statement and tested truthiness instead. It looks fine at first. An undefined salary should obviously be skipped, but what about a salary of zero? That's less clear, so the code above is ambiguous. (The problem is starker with prices, where there's a real difference between an item without a price and an item that is free.)
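The ambiguity shows up as soon as both kinds of guard are run over the three interesting cases. This sketch uses plain scalars rather than employee objects; truthy_guard() is the truthiness test from the "sane" version, and defined_guard() is one possible alternative (a hypothetical helper, not the only right answer). The two agree on a missing salary and on a low salary, and disagree only on zero, which is exactly the case that needs a business decision.

```perl
use strict;
use warnings;

my $threshold = 30_000;

# The "sane" guard: any false salary (undef *or* 0) is skipped.
sub truthy_guard {
    my ($salary) = @_;
    return $salary && $salary < $threshold ? 1 : 0;
}

# A definedness guard: only a missing salary is skipped; 0 is a real value.
sub defined_guard {
    my ($salary) = @_;
    return defined($salary) && $salary < $threshold ? 1 : 0;
}

my %result;
for my $case ( [ 'missing', undef ], [ 'zero', 0 ], [ 'low', 20_000 ] ) {
    my ( $label, $salary ) = @$case;
    $result{$label} = [ truthy_guard($salary), defined_guard($salary) ];
    printf "%-7s truthy=%d defined=%d\n", $label, @{ $result{$label} };
}
```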

Another person posted this as a counter-example to my code:

if ( $salary < $threshold ) {
    increase_salary( $employee, 3_000);
}
else {
    decrease_salary( $employee, 3_000);
}

If $salary is undef, we take the first branch (undef numifies to zero, which is below any positive threshold), but if it's unknown, we take the second branch. Aha! Gotcha! That's a bug!

Except that there's a huge difference between the two. With an undef value, you'd probably end up calling something like this:

$employee->salary( $employee->salary + $difference );

What happens if the salary is undefined? When you manipulate an undefined value, it is coerced to zero, an empty string, or a false value, depending on context. In the example above, we would be adding $difference to zero and probably corrupting our data, with nothing more than a "Use of uninitialized value" warning if warnings are enabled. In fact, if this were the code:

$salary += $difference;

you wouldn't even get a warning about the potential data corruption, and that's documented behavior! Surprise!
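Both claims are easy to check with core Perl alone. An explicit addition involving undef warns under use warnings, while += on an undefined variable does not (perlsyn lists ++, --, +=, -= and .= on undefined variables as exempt from this warning). Either way, $difference silently becomes the entire salary. The variable names below are illustrative only.

```perl
use strict;
use warnings;

# Collect warnings so we can count them per operation.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $difference = 3_000;

# Explicit addition: undef is treated as 0, and a warning is emitted.
my $salary;
my $new_salary          = $salary + $difference;
my $warnings_after_plus = @warnings;

# Compound assignment: the same silent coercion, and no warning at all.
my $other_salary;
$other_salary += $difference;
my $warnings_after_plus_eq = @warnings - $warnings_after_plus;

print "salary + difference:  $new_salary ($warnings_after_plus warning)\n";
print "salary += difference: $other_salary ($warnings_after_plus_eq warnings)\n";
```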

What happens if the salary is unknown? You'd ultimately be calling the same thing:

$employee->salary( $employee->salary + $difference );

Except that, as documented, this will confess(): you get a useful error message and a stack trace. Unknown values, unlike undef values, are specifically designed for comparison, not manipulation. If you forget to check whether a value is unknown, the code will confess() rather than let you corrupt the data.
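The behavior described above can be approximated with Perl's overload pragma. The class below (Sketch::Unknown) is a deliberately simplified, hypothetical stand-in, not the actual Unknown::Values implementation: every numeric comparison simply returns false, and addition or subtraction calls Carp::confess. It reproduces both halves of the argument: the salary counter-example takes the else branch, and the attempted update dies with a stack trace instead of writing a bad value.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for an "unknown" value.
package Sketch::Unknown {
    use Carp ();
    use overload
        # Numeric comparisons involving this value are never true.
        '<'    => sub { 0 },
        '<='   => sub { 0 },
        '>'    => sub { 0 },
        '>='   => sub { 0 },
        '=='   => sub { 0 },
        '!='   => sub { 0 },
        'bool' => sub { 0 },
        # Arithmetic is refused loudly, with a full stack trace.
        '+' => sub { Carp::confess('Attempt to add to an unknown value') },
        '-' => sub { Carp::confess('Attempt to subtract from an unknown value') },
        # No autogenerated fallbacks: any other operator also dies.
        fallback => 0;

    sub new { my $class = shift; return bless {}, $class }
}

package main;

my $salary    = Sketch::Unknown->new;
my $threshold = 30_000;

# The counter-example: with an unknown salary, the else branch is taken.
my $branch = $salary < $threshold ? 'increase' : 'decrease';

# The update itself dies instead of silently computing 0 + 3_000.
my $ok    = eval { my $new = $salary + 3_000; 1 };
my $error = $@;

print "branch: $branch\n";
print "update died: ", ( $ok ? 'no' : 'yes' ), "\n";
```

Setting fallback => 0 is a design choice for the sketch: it stops Perl from quietly deriving arithmetic from the boolean conversion.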

The only argument I heard that has some merit was: what if you take the wrong branch? With an unknown value, you might silently take a different branch in your code, and that branch may do something undesirable that doesn't touch the unknown value. I have to admit that this is a problem, but I counter that it's a problem with undef values, too. The difference is that with an undef value, you can enable fatal warnings. Perhaps I should look at offering something similar for unknown? I'm unsure.
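For reference, this is what fatal warnings buy you with undef in core Perl: use warnings FATAL => 'uninitialized' turns the uninitialized-value warning into an exception that eval can trap. It does nothing for operators that are exempt from the warning in the first place, such as the += shown earlier.

```perl
use strict;
use warnings;
use warnings FATAL => 'uninitialized';

my $threshold = 30_000;
my $salary;    # missing salary

# With the warning made fatal, the comparison dies instead of treating undef as 0.
my $compared      = eval { my $below = $salary < $threshold; 1 };
my $compare_error = $@;

# Compound assignment is exempt from the warning, so nothing is raised here.
my $updated = eval { $salary += 3_000; 1 };

print "comparison died: ", ( $compared ? 'no' : 'yes' ), "\n";
print "+= died: ", ( $updated ? 'no' : 'yes' ), " (salary is now $salary)\n";
```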

One thing I may regret is overloading stringification to print [unknown]. I now think that may be a mistake (imagine embedding it in JSON). I'm inclined to remove it and allow only boolean comparisons.

Three-value logic, like any logic, is not perfect. I do think it maps better to the real world and solves more problems than it causes, but a few replies on various forums hardly make a compelling case. So far, this has been an interesting experiment.

Try replacing undef with unknown in your favorite projects and see what happens. It might be interesting. As always, you can download the latest version from GitHub.

It's also been pointed out that salary() should always be defined and that increase_salary() should also perform sanity checks. The latter is definitely correct, but unknown offers a useful fallback if developers forget. The former (salary is always defined) is desirable, but we can't always guarantee that we have this information, because:

  • It's not applicable (volunteers don't have salaries)
  • It's not known (new employee whose salary is not yet implemented)
  • It's not available (the 'salary service' is down)
  • It's restricted (you don't have permissions)
  • It was fetched via an outer join (ooh! A common source of undef values!)
  • Something else I haven't thought of

Do you really want to kill that six-hour batch job after five hours because you're missing one salary? The smaller the system, the easier it is to handle cases like that, but in extremely large-scale systems it's often better to have the code handle missing data intelligently. That way, your code doesn't die, and you can come back later and correct the missing data (if that's appropriate). Don't automatically assume that your "data must always be defined" belief is correct. Words like "all" or "never" tend to be a code smell in reasoning.
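One way to get that behavior is sketched below, using plain hashes instead of employee objects: check each record, set aside any with a missing salary for later review, and keep going instead of dying halfway through the job. Whether a missing value should mean "skip", "use a default", or "abort" is a business decision; this sketch simply skips.

```perl
use strict;
use warnings;

my $threshold = 30_000;
my @employees = (
    { id => 1, salary => 20_000 },
    { id => 2, salary => undef },     # missing: not yet entered, restricted, ...
    { id => 3, salary => 25_000 },
);

my ( @raised, @needs_review );

for my $employee (@employees) {
    # Don't die on missing data; record it so someone can fix it later.
    unless ( defined $employee->{salary} ) {
        push @needs_review, $employee->{id};
        next;
    }
    if ( $employee->{salary} < $threshold ) {
        $employee->{salary} += 3_000;    # illustrative raise step
        push @raised, $employee->{id};
    }
}

print "raised: @raised; needs review: @needs_review\n";
```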

8 Comments

JSON.pm doesn't look at stringification overloading.

By default, JSON.pm croaks if it is asked to serialize a blessed object. If you use the convert_blessed option, it will call the TO_JSON method to serialize blessed objects. unknown should probably provide a TO_JSON method for this purpose, perhaps returning undef.
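That suggestion can be sketched with JSON::PP, the pure-Perl encoder that ships with Perl and follows the same convert_blessed/TO_JSON protocol as JSON.pm. The class name is hypothetical; returning undef from TO_JSON makes the value serialize as null, as the comment suggests.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical unknown-value class with a TO_JSON hook.
package Sketch::Unknown {
    sub new { my $class = shift; return bless {}, $class }

    # Called by the encoder when convert_blessed is enabled.
    sub TO_JSON { return undef }
}

package main;

my $unknown = Sketch::Unknown->new;

# Default encoder: blessed objects are rejected with an error.
my $default_ok = eval { JSON::PP->new->encode( [$unknown] ); 1 };

# With convert_blessed, TO_JSON is called and undef becomes null.
my $json = JSON::PP->new->convert_blessed->encode( [$unknown] );

print "default encoder croaked: ", ( $default_ok ? 'no' : 'yes' ), "\n";
print "convert_blessed: $json\n";    # convert_blessed: [null]
```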

I apologize that this post will sound pessimistic. There just seem to be many "modern perl" solutions presented in the past few years that are really only waterbed solutions: as you push the problem down over here, it just pops back up over there.

It's not applicable (volunteers don't have salaries)
You shouldn't be increasing volunteers' salaries. This should be excluded elsewhere via something like AND type != 'volunteer' or $self->is_salaried.

It's not known (new employee whose salary is not yet implemented)
You shouldn't be increasing their salaries; this should be excluded elsewhere via something along the lines of AND salary IS NOT NULL.

It's not available (the 'salary service' is down)
That is exceptional; you shouldn't be trying to increase salaries at all. In this case you're lucky, because you can't: the salary service is down.

It's restricted (you don't have permissions)
You shouldn't be increasing their salary then (and trying to do so should fail).

It was fetched via an outer join (ooh! A common source of undef values!)
Then don't do that. You control the code; don't put yourself in bad data situations.

Something else I haven't thought of
Unknown may or may not behave better than undef in this situation. Assuming that unknown will handle the new, unforeseen situation better than undef will bring you remorse later.
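In Perl terms, the "exclude it elsewhere" approach amounts to filtering the list before the loop, much like the SQL conditions quoted above. A minimal sketch with plain hashes; the type field is an assumption standing in for is_salaried.

```perl
use strict;
use warnings;

my $threshold = 30_000;
my @employees = (
    { id => 1, type => 'staff',     salary => 20_000 },
    { id => 2, type => 'volunteer', salary => undef },
    { id => 3, type => 'staff',     salary => undef },     # not yet known
    { id => 4, type => 'staff',     salary => 40_000 },
);

# Equivalent of: WHERE type != 'volunteer' AND salary IS NOT NULL
my @eligible = grep { $_->{type} ne 'volunteer' && defined $_->{salary} } @employees;

# Only now is the threshold comparison safe from undef coercion.
my @raised = map { $_->{id} } grep { $_->{salary} < $threshold } @eligible;

print "eligible: ", join( ',', map { $_->{id} } @eligible ), "; raised: @raised\n";
```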

This was my reply to your original PerlMonks post:
Sometimes undef is null, sometimes undef isn't. Adding "unknown" doesn't constrain these situations any better than using undef does. In any given case, you either know how to handle undef or you don't. If you know how to handle it, you handle it. Sometimes the right thing to do could very well be to add 3000 (maybe that is what initial salaries are, though that's doubtful). Most of the time, you write your code to handle exception states.

The addition of unknown as a value type is fine (I guess). But it is really only an additional exception state. You could guess that unknown will behave more correctly than undef, but I would wager that if you did, most of the time you would just be guessing. Really, if your value is unknown and you want your system to be well defined, then you have to handle unknown values and not blindly attempt to process them. This is no different from how you safely handle undef values.

In reality, the new "unknown" is just another flavor of undef and carries all of the same problems and benefits that undef does.

Incidentally, the behavior you outlined for unknown looks very much like Perl 6 junctions, and it carries a little of their benefit but all of their baggage. Junctions in a very short scope are easy enough to follow, but at a distance there is grave magic.

Maybe I'm not thinking this through carefully enough, but, since undef coerces to zero, couldn't the original problem be solved with:

if ( $employee->salary > 0 &&
     $employee->salary < $threshold ) {
    increase_salary( $employee, 3_000 );
}

Also, is unknown analogous to NaN (not a number)? Unlike unknown, NaN compares false to everything, including itself.
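For comparison, NaN's behavior is easy to check in Perl on platforms with IEEE floating point. This sketch produces a NaN as infinity minus infinity (9**9**9 overflows to infinity, a common Perl idiom). Equality and ordering comparisons are all false, != is true, and, per perlop, <=> returns undef when an operand is NaN.

```perl
use strict;
use warnings;

my $inf = 9**9**9;        # overflows to +Inf
my $nan = $inf - $inf;    # Inf - Inf is NaN

# 1 means the comparison is true, 0 means it is false.
my %is = (
    'nan == nan'        => ( $nan == $nan ? 1 : 0 ),
    'nan != nan'        => ( $nan != $nan ? 1 : 0 ),
    'nan < 1'           => ( $nan < 1     ? 1 : 0 ),
    'nan > 1'           => ( $nan > 1     ? 1 : 0 ),
    'nan <=> 1 defined' => ( defined( $nan <=> 1 ) ? 1 : 0 ),
);

printf "%-18s %d\n", $_, $is{$_} for sort keys %is;
```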

The problem with ditching the stringification is that it could make debugging problematic.

I'm interested to see where this goes.


About Ovid

Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.