A follow up on three-value logic in Perl
So far the initial response to my three-value logic in Perl post has been great. That response, on Reddit, Perlmonks, here and in my RT queue, has led to a few changes:
- Made sorting a bit more useful
- Contemplating removing stringification
- Plenty of strengthening of the documentation (including making it explicitly clear that the unknown logic is akin to SQL's NULL)
And then there have been some interesting rebuttals.
This is the snippet that I used to show that undef is broken while unknown offers a semantically correct guarantee:
    foreach my $employee (@employees) {
        if ( $employee->salary < $threshold ) {
            increase_salary( $employee, 3_000 );
        }
    }
One person seemed to think my example was wrong and offered what they referred to as a "sane" version:
    foreach my $employee (@employees) {
        if ($employee->is_salaried) {
            if ($employee->salary and $employee->salary < $threshold) {
                increase_salary($employee,3000);
            }
        }
    }
See the problem there? The is_salaried predicate determines whether the person has a salary, and the $employee->salary test determines whether that salary is set. This again shows the fragility of definedness: the dev forgot to check for it in the second if statement and tested for truth instead. It seems like it would be OK, and an undefined salary would obviously be wrong, but what about a zero salary? That's less clear, and thus the above code is ambiguous (the problem is more stark if you're talking about prices and wondering about the difference between an item without a price and an item which is obviously free).
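To make the zero-versus-undefined ambiguity concrete, here is a minimal sketch. The helper names needs_raise and needs_raise_truthy and the threshold are hypothetical, made up for illustration; they are not part of any module discussed here.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when a salary is actually set and is
# below the threshold. A salary of 0 counts as "set".
sub needs_raise {
    my ( $salary, $threshold ) = @_;
    return 0 unless defined $salary;    # undef: no salary recorded
    return $salary < $threshold ? 1 : 0;
}

# The truthiness test from the "sane" version, for comparison.
sub needs_raise_truthy {
    my ( $salary, $threshold ) = @_;
    return ( $salary and $salary < $threshold ) ? 1 : 0;
}

my $threshold = 30_000;    # assumed value, for illustration only
for my $salary ( undef, 0, 25_000, 40_000 ) {
    my $label = defined $salary ? $salary : 'undef';
    printf "%-6s defined-check=%d truthy-check=%d\n",
        $label,
        needs_raise( $salary, $threshold ),
        needs_raise_truthy( $salary, $threshold );
}
```

The two helpers agree on every input except 0: the truthiness test lumps a zero salary in with an undefined one, which may or may not be what the business rule intends.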
Another person posted this as a counter-example to my code:
    if ( $salary < $threshold ) {
        increase_salary( $employee, 3_000);
    }
    else {
        decrease_salary( $employee, 3_000);
    }
If $salary is undef, we take the first branch, but if it's unknown, we take the second branch. Ah ha! Gotcha! That's a bug!
Except that there's a huge difference between the two. If you have an undef value, you probably call something like this:
    $employee->salary( $employee->salary + $difference );
What happens if the salary is undefined? When you try to manipulate an undefined value, it coerces into a zero, an empty string or a false value. For the example above, we would be adding $difference to zero and probably corrupting our data. In fact, if this was the code:
    $salary += $difference;
You wouldn't even get a warning about the potential data corruption and that's documented behavior! Surprise!
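A small sketch of that difference, assuming nothing beyond core Perl. It captures warnings with a __WARN__ handler so the two forms can be compared; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so we can count them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $difference = 3_000;

my $salary_a;                            # undef
$salary_a = $salary_a + $difference;     # warns about an uninitialized value
my $after_plus = scalar @warnings;

my $salary_b;                            # undef
$salary_b += $difference;                # silent: += is exempt from the warning
my $after_plus_equals = scalar @warnings;

print "salary_a=$salary_a salary_b=$salary_b\n";
print "warnings after '+': $after_plus, after '+=': $after_plus_equals\n";
```

Both variables end up holding the difference as if the salary had been zero, but only the explicit addition leaves a warning behind.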
What happens if salary is unknown? You'd probably be ultimately calling something like this:
    $employee->salary( $employee->salary + $difference );
Except, as documented, that will confess(). You get a useful error message and a stack trace. This is because unknown values, unlike undef values, are specifically designed for comparison, not manipulation. If you forget to check if a value is unknown, it will confess() rather than allowing you to corrupt the data.
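To show the general idea (comparisons are false, manipulation dies with a stack trace), here is a rough sketch built on core overload and Carp. The package My::Unknown is hypothetical and written only for this illustration; it is not the module's actual implementation or API, and the error message is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an "unknown" value, for illustration only.
package My::Unknown {
    use Carp 'confess';
    use overload
        # Comparisons against an unknown value are never true.
        '<'    => sub { 0 },
        '>'    => sub { 0 },
        '<='   => sub { 0 },
        '>='   => sub { 0 },
        '=='   => sub { 0 },
        '!='   => sub { 0 },
        'bool' => sub { 0 },
        # Manipulation is an error, reported with a stack trace.
        '+' => sub { confess 'Math cannot be performed on unknown values' },
        '-' => sub { confess 'Math cannot be performed on unknown values' };

    sub new { my $class = shift; return bless {}, $class }
}

my $salary    = My::Unknown->new;
my $threshold = 30_000;             # assumed value

# The comparison is simply false, so no raise is attempted.
my $below = ( $salary < $threshold ) ? 1 : 0;

# Arithmetic dies instead of quietly treating the value as zero.
my $error;
eval { my $new_salary = $salary + 3_000; 1 } or $error = $@;

print "below=$below\n";
print "died: ", ( $error ? 'yes' : 'no' ), "\n";
```

The point of the design is that forgetting a check costs you an exception at the site of the arithmetic, not a silently wrong number in the database.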
The only argument I heard which has some merit was: what if you take a bad branch? With an unknown value, you might silently take a different branch in your code and that branch may do something undesirable that doesn't impact the unknown value. I have to admit that this is a problem, but I counter that this is a problem even if you use undef values. The difference is that with an undef value, you can enable fatal warnings. Perhaps I should look at doing that for unknown? I'm unsure.
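For the undef side of that comparison, this is roughly what fatal warnings look like, using only the core warnings pragma. The threshold and variable names are assumptions for illustration.

```perl
use strict;
use warnings FATAL => 'uninitialized';

my $threshold = 30_000;    # assumed value
my $salary;                # undef

my $took_branch = 0;
my $ok = eval {
    # With the warning made fatal, this comparison dies
    # instead of treating undef as 0 and taking the branch.
    if ( $salary < $threshold ) {
        $took_branch = 1;
    }
    1;
};
my $error = $ok ? '' : $@;

print "took_branch=$took_branch\n";
print "error: $error";
```

Neither branch runs, which is the behavior the "bad branch" argument is asking for; the cost is that every use of an undefined value in that lexical scope becomes an exception.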
One thing I did that I may regret is that I overloaded stringification to print [unknown]. I now think that may be a mistake (imagine embedding it in JSON). I think I should remove that and only allow boolean comparisons.
Three-value logic, like any logic, is not perfect, but I do think it maps better to the real world and solves more problems than it causes. That said, a few replies on various forums are hardly a compelling case. So far this has been an interesting experiment.
Go to your favorite projects and see about replacing undef with unknown and see what happens. It might be interesting. As always, you can download the latest version on github.
It's also been pointed out that salary() should always be defined and that increase_salary() should also check for sanity. The latter is definitely correct, but unknown offers a useful fallback if devs forget. The former (salary is always defined) is desirable, but we can't always guarantee that we have this information because maybe:
- It's not applicable (volunteers don't have salaries)
- It's not known (new employee whose salary is not yet implemented)
- It's not available (the 'salary service' is down)
- It's restricted (you don't have permissions)
- It was fetched via an outer join (ooh! A common source of undef values!)
- Something else I haven't thought of
Do you really want to kill that six hour batch job after five hours because you're missing one salary? The smaller the system the easier it is to handle cases like that, but as you deal with extremely large-scale systems, it's often better to make them more intelligently handle the cases where you're missing data. That way, your code doesn't die and you can come back later and correct the missing data (if that's appropriate). Don't automatically assume that your "data must always be defined" belief is correct. Using words like "all" or "never" tends to be a code smell in reasoning.
JSON.pm doesn't look at stringy overloading.
If JSON.pm is asked to serialize a blessed object it will by default croak. If you use the convert_blessed option, then it will call the TO_JSON method to serialize blessed objects. unknown should probably provide a TO_JSON method for this purpose, perhaps returning undef.
Toby, while I sympathize with the idea, I don't want unknown values to do anything other than represent unknown data. If I add a special case for JSON, why not for YAML, or CSV, or XML, or ... ? :)
I apologize that this post will sound pessimistic. There just seem to be many "modern perl" solutions being presented in the past few years that are actually only waterbed solutions: that as you push on this problem over here - the problem just pops back up over there.
It's not applicable (volunteers don't have salaries)
You shouldn't be increasing volunteers' salaries. This should be excluded elsewhere via something like AND type != 'volunteer' or like $self->is_salaried.
It's not known (new employee whose salary is not yet implemented)
You shouldn't be increasing their salaries - this should be excluded elsewhere via something along the lines of AND salary IS NOT NULL.
It's not available (the 'salary service' is down)
That is exceptional, you shouldn't be trying to increase salaries at all - but in this case you're lucky because you can't - the salary service is down.
It's restricted (you don't have permissions)
You shouldn't be increasing their salary then (and trying to do so should fail).
It was fetched via an outer join (ooh! A common source of undef values!)
Then don't do that. You control the code - don't put yourself in bad data situations.
Something else I haven't thought of
Unknown may or may not behave better than undef in this situation. Assuming unknown will handle the new unthought of situation better than undef will bring you remorse later.
This was my reply to your original perlmonks:
Sometimes undef is null, sometimes undef isn't. Adding "unknown" doesn't constrain these situations any better than using undef does. In any given case you either know how to handle undef or you don't. If you know how to handle it, you handle it. Sometimes it very well could be the right thing to do is add 3000 (maybe that is what initial salaries are - though doubtful). Most of the time, you write your code to handle exception states.
The addition of unknown as a value type is fine (I guess). But it is only really an additional exception state. You could guess that unknown will behave more validly than undef, but I would wager that if you did that then most of the time you would just be guessing. Really, if your value is unknown, and you want your system to be well defined, then you have to handle unknown values and not blindly attempt to process them. This is no different than how you safely handle undef values.
In all reality the new "unknown" is really just another flavor of undef and carries all of the same problems and benefits that undef does.
Incidentally, the behavior you outlined for unknown appears to be very Perl6 junction-like and carries a little of the same benefit, but all of the same baggage that junctions do. Junctions in a very short scope are easy enough to follow, but at a distance there is grave magic.
The various points you raise are valid, but you've missed an extremely important use case:
Real-world code
In reality, what we should or should not be doing is often a matter of debate and sometimes systems grow in unexpected ways and have surprising interactions, or we're facing a tight deadline and need to rush things into production.
One thing I've done is added a fatal option which makes all interactions fatal so if we have a case where a dread "undef" (unknown) shows up and we fail to catch it, we'll get a fatal error if we do anything at all with it.
That should satisfy the majority of use cases (I might even make that the default, but I'm not sure).
Maybe I'm not thinking this through carefully enough, but, since undef coerces to zero, couldn't the original problem be solved with:
Also, is unknown analogous to NaN (not a number)? Unlike unknown, NaN compares false to everything, including itself.
J David Eisenberg: that looks tempting and it might be right for some code. However testing for $employee->salary being greater than zero skips a use case: what if an employee's salary is legitimately at zero and they should have the increase?
Or we could skip the "salary" idea altogether and talk about any situation whereby something might need adjustment and undef gets coerced to zero when, in fact, it might not actually mean that. If a distinction is needed between undef and zero and it's not made, code can easily make bad assumptions.
The problem with ditching the stringification is that it could make debugging problematic.
I'm interested to see where this goes.
Try the "fatal" option. Stringification is ditched, but it gives you a nice stack trace so you can (hopefully) figure out how you got there.