Three-value logic in Perl
Bad news. You've a brand-new CEO and he has a reputation for having a short temper. He knows about his reputation so he's decided to win over the employees by offering all "underpaid" employees a salary increase of $3,000 per year. You've been tasked to write the code. Fortunately, it's fairly straight-forward.
foreach my $employee (@employees) {
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}
Congratulations. You just got fired and have to find a new job. Here's what went wrong and a new way to make sure it doesn't happen again.
[Side note: if you really are looking for a job and want a job in Europe, drop me a private email]
You just gave all of your unpaid interns a $3,000 salary. You just gave all of your volunteers a $3,000 salary. You just gave all of your hourly workers a $3,000 salary on top of their hourly wage.
You didn't get fired for that. You got fired because the brand-new CEO didn't have his salary entered into the system yet, and when he was notified of a $3,000 salary, his short temper kicked in and he demanded that the incompetent programmer be fired.
In retrospect, it's obvious what happened: a bunch of $employee objects returned an undefined salary and that evaluated to zero for the numeric comparison. You probably got a bunch of warnings (assuming you enabled them) and you should have written your code like this:
foreach my $employee (@employees) {
    next unless defined $employee->salary;
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}
Of course, you hate calling that salary method twice:
foreach my $employee (@employees) {
    my $salary = $employee->salary;
    next unless defined $salary;
    if ( $salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}
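For reference, the original failure can be reproduced in isolation. The sketch below uses plain Perl with a hypothetical threshold and an undefined salary (no real employee objects are involved), and it captures the warning rather than printing it to STDERR:

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR, so we can inspect them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $threshold = 30_000;    # hypothetical threshold, for illustration only
my $salary;                # undefined, as for an intern or the new CEO

# In numeric context undef is treated as 0, so this comparison is true.
my $gets_raise = ( $salary < $threshold ) ? 1 : 0;

print "gets raise: $gets_raise\n";
print "warnings:   ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;    # the "uninitialized value" warning
```

The comparison succeeds and the only hint of trouble is the warning, which is easy to miss in a busy log.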
Or, um, here's a curiosity: maybe in this code the $threshold varies and is sometimes undefined. In this example you won't have a bug if $threshold is undefined like you would if $salary is undefined. Isn't that annoying? Sometimes undef will cause a bug and sometimes it won't. It will, however, cause a warning in both spots. So the programmer who didn't get fired rewrote it like this:
if ( defined $threshold ) {
    foreach my $employee (@employees) {
        my $salary = $employee->salary;
        next unless defined $salary;
        if ( $salary < $threshold ) {
            increase_salary( $employee, 3_000 );
        }
    }
}
How did your nice, simple foreach loop turn into a monstrosity like that? All of us have seen this sort of cruft before. The original code was a pure expression of the business logic involved, but you've had to wrap it up in a bunch of structural code to work around the limitations of the language. Wouldn't it be nice if you didn't have to do that?
Well, now you can. In the employee package, you use Unknown::Values and you set the default value of salary to unknown:
package Employee;
use Moose;
use Unknown::Values;
# ...
has 'salary' => (
    is      => 'rw',
    default => sub { unknown },
);
Basically, anywhere you would have used an undef value, you now use unknown instead. And how do you change your original foreach loop? You don't:
foreach my $employee (@employees) {
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}
That works unchanged because regardless of what the $threshold is, we can't know if the unknown value is less than it, so we return false. In fact, not only does < return false, so does == and > (and just about any other comparison operator that you can reasonably think of). Furthermore, you don't get warnings for comparing unknown values to other values because that's what unknown is designed to do! Getting the unknown salaries is easy:
use Unknown::Values;
my @unknown = grep { is_unknown( $_->salary ) } @employees;
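To make the mechanism concrete, here is a minimal sketch of how operator overloading can produce "every comparison is false, and math is fatal". The class name My::Unknown is hypothetical and this is not the module's actual implementation; in particular, it returns a plain 0 from comparisons, which is a simplifying assumption.

```perl
use strict;
use warnings;

package My::Unknown;    # hypothetical class, for illustration only

use overload
    '<'  => sub { 0 },    # every comparison involving an unknown is false
    '>'  => sub { 0 },
    '<=' => sub { 0 },
    '>=' => sub { 0 },
    '==' => sub { 0 },
    '!=' => sub { 0 },
    '""' => sub { '[unknown]' },
    '+'  => sub { die "Math cannot be performed on unknown values\n" };

sub new { return bless {}, shift }

package main;

my $unknown   = My::Unknown->new;
my $threshold = 30_000;                            # hypothetical threshold
my @salaries  = ( 20_000, $unknown, 45_000 );

# Only the known salary below the threshold is selected, and no warning is issued.
my @underpaid = grep { $_ < $threshold } @salaries;
print "underpaid: @underpaid\n";

# Arithmetic on the unknown value dies instead of silently using 0.
my $survived = eval { my $sum = $unknown + 3; 1 };
print "math on unknown died: $@" unless $survived;
```

Because the comparison itself answers false, the calling loop needs no defined check at all, which is the point of the approach.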
I've implemented this with Kleene's three-valued logic, which is a clean, straight-forward way of handling "true", "false" and "maybe/unknown". In this logic, the negation of unknown is unknown. Logical and is as follows:
true && unknown is unknown
false && unknown is false
unknown && unknown is unknown
You can reason your way through this: "true and false" is false, but "true and true" is true, so "true and unknown" cannot be known. However, "false and anything" is false whether the other side is true or false, so "false and unknown" must be false.
Logical or is as follows:
true || unknown is true
false || unknown is unknown
unknown || unknown is unknown
Again, you can reason your way through those in a similar way.
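The truth tables above can be written out as three small functions. This is a standalone sketch of Kleene's logic using the strings 'true', 'false' and 'unknown' as stand-ins; the function names k_not, k_and and k_or are made up for this illustration and are not part of Unknown::Values.

```perl
use strict;
use warnings;

# Each argument is one of the strings 'true', 'false' or 'unknown'.

sub k_not {
    my ($x) = @_;
    return 'unknown' if $x eq 'unknown';
    return $x eq 'true' ? 'false' : 'true';
}

sub k_and {
    my ( $x, $y ) = @_;
    return 'false'   if $x eq 'false'   || $y eq 'false';      # one false decides it
    return 'unknown' if $x eq 'unknown' || $y eq 'unknown';
    return 'true';
}

sub k_or {
    my ( $x, $y ) = @_;
    return 'true'    if $x eq 'true'    || $y eq 'true';       # one true decides it
    return 'unknown' if $x eq 'unknown' || $y eq 'unknown';
    return 'false';
}

# Print the rows that involve an unknown operand.
for my $x (qw(true false unknown)) {
    printf "%-7s && unknown is %s\n", $x, k_and( $x, 'unknown' );
    printf "%-7s || unknown is %s\n", $x, k_or( $x, 'unknown' );
}
```

The order of the checks carries the reasoning: a deciding value (false for and, true for or) wins before the unknown is even considered.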
Here's another example:
use Unknown::Values;
my $value = unknown;
my @array = ( 1, 2, 3, $value, 4, 5 );
my @less = grep { $_ < 4 } @array; # assigns (1,2,3)
my @greater = grep { $_ > 3 } @array; # assigns (4,5)
As you can see, neither @less nor @greater will ever contain an unknown value. Sorting is handled by not sorting unknown values:
my @sorted = sort { $a <=> $b } ( 4, 1, unknown, 5, unknown, unknown, 7 );
eq_or_diff \@sorted, [ 1, 4, unknown, 5, unknown, unknown, 7 ],
    'Sorting unknown values should leave their position in the list unchanged';
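The behaviour that test describes can also be produced explicitly, without depending on the order in which sort happens to call the comparison. The sketch below is an assumption-laden illustration, not how the module works: $UNKNOWN is a hypothetical marker standing in for an unknown value, and sort_known_in_place is a made-up helper that sorts only the known slots.

```perl
use strict;
use warnings;

# Hypothetical marker for an unknown value; NOT the Unknown::Values object.
my $UNKNOWN = \"unknown";

sub is_marker {
    my ($value) = @_;
    return ref $value && $value == $UNKNOWN;    # same reference means unknown
}

# Sort the known values among themselves, leaving unknown slots where they are.
sub sort_known_in_place {
    my @list   = @_;
    my @slots  = grep { !is_marker( $list[$_] ) } 0 .. $#list;
    my @sorted = sort { $a <=> $b } @list[@slots];
    @list[@slots] = @sorted;
    return @list;
}

my @result = sort_known_in_place( 4, 1, $UNKNOWN, 5, $UNKNOWN, $UNKNOWN, 7 );
print join( ', ', map { is_marker($_) ? 'unknown' : $_ } @result ), "\n";
# prints: 1, 4, unknown, 5, unknown, unknown, 7
```

Whether fixed positions are actually the most useful policy is a separate question; sorting unknowns to the front or the back would be an equally small change to the helper.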
With Unknown::Values, anything involving non-boolean behavior (except printing) will die with a stack trace:
% re.pl
$ use Unknown::Values;
$ my $value = unknown;
[unknown]
$ print $value + 3;
Runtime error: Math cannot be performed on unknown values at lib/Unknown/Values/Instance.pm line 43.
... rest of the stack trace
Stringification is overloaded to return [unknown], but bit manipulation, string concatenation, dereferencing or other operations which make no sense will immediately result in a fatal error. If you want to avoid that, simply check if you have an unknown value or set a defined default:
my $value = 0;
# later
$value += 3;
Or:
use Carp 'croak';    # croak comes from the core Carp module
my $value = unknown;
# later
if ( is_unknown $value ) {
    croak("Our value was never set");
}
$value += 3;
This has interesting consequences. The following looks like a bug, but it's not:
my $value = unknown;
$value ||= 2; # //= also fails
print $value + 3; # fatal error
This is because with an unknown value, you cannot know if it's undefined or false. Thus, the //= and ||= assignments will fail. You might think of this as a limitation instead. Maybe I'll see if I can work around it later.
Future Work?
Fasten your seat belts because you finished the easy part and the hard part is coming up.
Currently the following always returns false if the two values are unknown:
if ( $value1 eq $value2 ) {
    # ... never happens if both are unknown
}
Why isn't one unknown equal to another unknown? Because their values are both unknown and you don't know if they're the same. However, what if you do this?
my $value = unknown;
my $value2 = $value;
if ( $value2 == $value ) {
    ...
}
Logically, even though both of those values are unknown, they're the same unknown and thus the comparison should succeed.
One possible enhancement is to ensure that each call to unknown returns a sequentially different unknown, which would allow me to say that an unknown is equal to itself but not equal to other unknowns. (Sort of like Rumsfeld's "known unknowns" and "unknown unknowns".) This sounds strange, but it means this would work:
my $value1 = unknown;
my $value2 = $value1;
if ( $value1 == $value2 ) {
    # ... always true because it's an instance of a *single* unknown
}
But that gets confusing because we then have this:
if ( $value1 == unknown ) {
    # ... always false because unknown generates a new unknown
}
So an unknown sometimes equals unknowns and sometimes doesn't. It only matches an unknown if it's itself. On the surface this actually seems to be correct, except that we then have this:
if ( ( 6 != $value ) == ( 7 != $value ) ) {
    # ... always false
}
That has to be false because 6 != $value must return an unknown and 7 != $value should return a different unknown, and their cascaded unknown values cannot match. However, the following must be true:
if ( ( 6 != $value ) == ( 6 != $value ) ) {
    # ... always true!
}
Because 6 != $value should always return the same unknown. Here's why. We assume, for the sake of argument, that the unknown $value has a value, but we don't know it. Let's say that value is 4. The above reduces to this:
if ( ( 6 != 4 ) == ( 6 != 4 ) ) {
Since 6 != 4 is true, we get this:
if ( 1 == 1 ) {
Ah, but what if $value's hidden value was actually 6? Then we get this:
if ( ( 6 != 6 ) == ( 6 != 6 ) ) {
Since 6 != 6 is false, we get this:
if ( 0 == 0 ) {
In other words, there's a lot of interesting things we could do, but this would likely involve a fair amount of work breaking out the code for each and every operator and ensuring that it's handled correctly. I have an idea of how we could make this work. This would also let us get better error messages, but for now, I'm happy with this first pass and hoping that it will help eliminate a large class of common errors in Perl.
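As a rough illustration of the simplest part of that idea, an unknown that equals itself but no other unknown, here is a sketch that compares object identity. Everything in it is hypothetical (the My::IdentityUnknown class and the new_unknown helper are invented for this example), and it does not attempt the cascading case such as ( 6 != $value ) == ( 6 != $value ).

```perl
use strict;
use warnings;

package My::IdentityUnknown;    # hypothetical, not part of Unknown::Values

use Scalar::Util qw(blessed refaddr);
use overload
    '==' => sub {
        my ( $left, $right ) = @_;

        # An unknown never equals a known value.
        return 0 unless blessed($right) && $right->isa(__PACKAGE__);

        # Two unknowns are equal only if they are the very same object.
        return refaddr($left) == refaddr($right) ? 1 : 0;
    },
    '""' => sub { '[unknown]' };

sub new { return bless {}, shift }

package main;

# Each call produces a distinct unknown.
sub new_unknown { return My::IdentityUnknown->new }

my $value1 = new_unknown();
my $value2 = $value1;    # a copy of the same unknown

my $same  = ( $value1 == $value2 )       ? 1 : 0;    # same instance
my $fresh = ( $value1 == new_unknown() ) ? 1 : 0;    # different instance
my $known = ( $value1 == 42 )            ? 1 : 0;    # known value

print "same instance: $same, fresh unknown: $fresh, known value: $known\n";
```

Handling the cascaded comparisons would need each operator to return an unknown that remembers how it was derived, which is where the real work lies.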
More importantly, if you use unknown instead of undef, a lot of the structural code where you check for a defined value goes away and the resulting code is both clearer and more correct. Hooray for writing code that states the problem of the business rather than the problems of the programming language!
...I fear, though, that this concept may seem pretty complicated to grasp and, as a result, not be used.
Krasimir: perhaps I should point out that this is the same logic as the SQL NULL? Once you realize that, it's pretty simple (except that I don't believe the SQL NULL handles the cascading unknown comparison I have listed).
Well, whoever made the increase_salary() method respond for interns and volunteers should be fired as well. :)
Although this is beside your point about the unknown values, people don't use object oriented programming well anyway. Using Moose to let you define accessors doesn't mean you have a good design.
All the time it seemed to me like null (from other languages) or NULL from SQL. But I was not sure :). Why not call it Null::Values so it feels familiar (if the semantics are really close)? And the singleton nature made it really hard for me :).
Oops, sorry.. I meant "cascading nature" :)...
Krasimir, that's very tempting. However, I can't rename it to Null::Values, though I agree that it might make it easier to understand at first. NULL is a constant source of confusion in SQL. That's because NULLs in SQL tend to be used both for "nothing" and "unknown". The word itself implies "nothing" but the semantics are "unknown". It's a very poorly named bit of SQL. I don't want to be trapped in that mistake. Further, I'd like to have the opportunity to take this code further, but I'd feel more constrained if I called it NULL and then said "but it's really not".
I hope that makes sense. You've raised a good enough point that I should mention this in the docs.
'Sorting unknown values should leave their position in the list unchanged'
That looks very dangerous to me. It implies that sorting [5, unknown, 1, unknown, 7] would not sort anything. To be safe, it should sort the same as SQL NULL: either all at the front, or all at the back. Anything else is close to unusable.
bart, it's interesting that you mention that and I think you're right in terms of utility. We don't sort those values because, logically, we can't know which way those values should be sorted (though the known values in the list would sort). Further, I suspect that sorting to the front or back should be configurable as different people might have different needs. That starts raising the complexity.
I'll give this some thought. I'll probably look at sorting them to the back, but I'm unsure.
The example given by ovid does indeed leave the unknowns in their original positions, but that's more by dumb luck than any other reason. (Something to do with the order in which comparisons are made in Perl's default mergesort algorithm.)
But this is not always the case. A trivial example is (2, unknown, 1) which sorts as (1, 2, unknown).
But you can't overload sort - you don't get any choice how unknown is sorted.
The only way of controlling how unknown gets sorted is by controlling the comparison operations, but Unknown::Values is already committed to return false for all comparisons (that's the whole point of the module).
Ouch. It really does look like I'll have to dig into this. In Unknown::Values::Instance, the first two items in the compare list are the comparison operators. I think those can be broken out and hacked on, but it seems tricky.
I've just had a try to see if it's possible to detect whether a comparison is happening "inside" sort or not.
The best I could do is:
Unfortunately it is only possible to detect whether we're inside a sort that uses the sort BLOCK LIST syntax, not plain sort LIST. (And it also doesn't detect sorts where the block has been optimized away, like sort { $a <=> $b } LIST.)
The latest version on github now sorts unknown values to the end. I thought it would be more useful that way with this:
This is a neat little concept, and I'll have to play with it a little to see if I can put it to good use for clearer code. I especially appreciate the name, "unknown", as it pretty much matches the semantics I would expect to be attached to that name. "The variable is defined, but there may or may not be a value - it could be anything or nothing, we just don't know - so it doesn't match or compare to anything, including another variable set to unknown"
The NULL != NULL semantics in SQL make sense to me, except for the name, "null". From a general "meaning of words" standpoint, I think "unknown" would have been a much better name than NULL in SQL. Generally, when I read the word "null" I think of it as referring to something specific, representing the absence of a value. If that were the case, I would think it makes sense for two variables holding "null" to be considered equal.
Thinking about it now, I wonder if that interpretation of "null" would be better named "nil" or even "void", but I'm now going on a tangent...
I wonder if in addition to having "unknown", it would be useful to have a true "null" (or maybe some other name) representing the absence of any value, with the semantics...
Of course, IANALinguist, YMMV. Ovid++ :)
Hercynium, thanks for the kind words.
While adding a null or nil value would be interesting, I'd need the following:
And I'm glad you like the concept. You might find this blog post interesting.
anyone who puts interns and volunteers into the list of employees should be fired too! if they're employees they get minimum wage and can't be got rid of on a whim.
I'm not sure I agree that "NULL" is a terrible name for unknown data. After all, as Hercynium writes:
Which is exactly right. We have a database full of data. How did the data get there? Someone entered it. If there is an absence of data, it's because no one entered it. If no one entered it, it's because no one knew what it was, or they haven't gotten around to entering it yet--either way, it's unknown.
Furthermore, this is what "NULL" means (and has always meant) for databases, and it's what most people understand (if they understand at all) when they hear "NULL." There's no use complaining that the origin of the word doesn't suit the present usage; that's true for tons of words in English (and other languages as well).
I think I would personally find the module most useful if it did (or at least could) emulate the NULL of a SQL-standard RDBMS. That way I could do operations in my Perl which I'd know would work the same as similar operations in my DB. I think that would be valuable.
But I think I'm going to give this module a try anyway and see how it drives. :-)
NULL might also mean "Not Applicable" such as the "State" address field for Israel. (Hardly big enough to place the country's name...)
In fact, in my old Ordain Inc. days (a database machine start-up that never took off) we listed up to SEVEN totally different meanings that all had to share the poor SQL NULL concept! Forgot most of them.