Three-value logic in Perl

By Ovid on February 19, 2013 7:13 PM

Bad news. You have a brand-new CEO and he has a reputation for having a short temper. He knows about his reputation, so he's decided to win over the employees by offering all "underpaid" employees a salary increase of $3,000 per year. You've been tasked with writing the code. Fortunately, it's fairly straightforward.

foreach my $employee (@employees) {
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}

Congratulations. You just got fired and have to find a new job. Here's what went wrong and a new way to make sure it doesn't happen again.

[Side note: if you really are looking for a job and want a job in Europe, drop me a private email]

You just gave all of your unpaid interns a $3,000 salary. You just gave all of your volunteers a $3,000 salary. You just gave all of your hourly workers a $3,000 salary on top of their hourly wage.

You didn't get fired for that. You got fired because the brand-new CEO didn't have his salary entered into the system yet, and when he was notified of a $3,000 salary, his short temper kicked in and he demanded that the incompetent programmer be fired.

In retrospect, it's obvious what happened: a bunch of $employee objects returned an undefined salary and that evaluated to zero for the numeric comparison. You probably got a bunch of warnings (assuming you enabled them) and you should have written your code like this:

foreach my $employee (@employees) {
    next unless defined $employee->salary;
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}
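To see the failure in isolation, here is a minimal sketch using only core Perl. The variable names and the 50,000 threshold are made up for illustration; the point is just that an undefined salary compares as zero and produces a warning rather than an error:

```perl
use strict;
use warnings;

my $threshold = 50_000;    # hypothetical threshold
my $salary;                # undef, as for an employee with no salary on file

my @warnings;
my $is_underpaid = do {
    # Collect warnings instead of printing them so we can inspect them.
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $salary < $threshold;    # undef is treated as 0 here
};

print "counts as underpaid\n" if $is_underpaid;
print "warning: $_" for @warnings;
```

The comparison succeeds, and the only hint that something is wrong is an "uninitialized value" warning.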

Of course, you hate calling that salary method twice:

foreach my $employee (@employees) {
    my $salary = $employee->salary;
    next unless defined $salary;
    if ( $salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}

Or, um, here's a curiosity: maybe in this code the $threshold varies and is sometimes undefined. In this example you won't have a bug if $threshold is undefined like you would if $salary is undefined. Isn't that annoying? Sometimes undef will cause a bug and sometimes it won't. It will, however, cause a warning in both spots. So the programmer who didn't get fired rewrote it like this:

if ( defined $threshold ) {
    foreach my $employee (@employees) {
        my $salary = $employee->salary;
        next unless defined $salary;
        if ( $salary < $threshold ) {
            increase_salary( $employee, 3_000 );
        }
    }
}

How did your nice, simple foreach loop turn into a monstrosity like that? All of us have seen this sort of cruft before. The original code was a pure expression of the business logic involved, but you've had to wrap it up in a bunch of structural code to work around the limitations of the language. Wouldn't it be nice if you didn't have to do that?

Well, now you can. In the employee package, you use Unknown::Values and you set the default value of salary to unknown:

package Employee;
use Moose;
use Unknown::Values;
# ...
has 'salary' => ( 
    is      => 'rw',
    default => sub { unknown },
);

Basically, anywhere you would have used an undef value, you now use unknown instead. And how do you change your original foreach loop? You don't:

foreach my $employee (@employees) {
    if ( $employee->salary < $threshold ) {
        increase_salary( $employee, 3_000 );
    }
}

That works unchanged because regardless of what the $threshold is, we can't know if the unknown value is less than it, so we return false. In fact, not only does < return false, so do == and > (and just about any other comparison operator that you can reasonably think of). Furthermore, you don't get warnings for comparing unknown values to other values because that's what unknown is designed to do! Getting the unknown salaries is easy:

use Unknown::Values;
my @unknown = grep { is_unknown( $_->salary ) } @employees;
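If you're curious how a value like this can be built at all, here is a rough sketch using Perl's core overload pragma. It is not the module's actual implementation: the My::Unknown package and the is_unknown_value helper are hypothetical names invented for this example, and it only covers the numeric comparison operators.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical stand-in for an unknown value; NOT the real module's code.
package My::Unknown {
    use overload
        '<'  => sub { 0 },    # every numeric comparison is false
        '<=' => sub { 0 },
        '>'  => sub { 0 },
        '>=' => sub { 0 },
        '==' => sub { 0 },
        '!=' => sub { 0 },
        '""' => sub { '[unknown]' };    # printing is still allowed

    sub new { my $class = shift; return bless {}, $class }
}

# Hypothetical helper, loosely mirroring the is_unknown shown above.
sub is_unknown_value {
    my ($value) = @_;
    return Scalar::Util::blessed($value) && $value->isa('My::Unknown') ? 1 : 0;
}

my $threshold = 50_000;
my @salaries  = ( 30_000, My::Unknown->new, 80_000 );

my @underpaid = grep { $_ < $threshold } @salaries;
my @unknown   = grep { is_unknown_value($_) } @salaries;

print "underpaid: @underpaid\n";              # underpaid: 30000
print "unknown:   ", scalar(@unknown), "\n";  # unknown:   1
```

Because the comparison is handled by the object itself, no warning is raised and the loop body never has to ask whether the salary is defined.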

I've implemented this with Kleene's three-valued logic, which is a clean, straightforward way of handling "true", "false" and "maybe/unknown". In this logic, the negation of unknown is unknown. Logical and is as follows:

true    && unknown is unknown
false   && unknown is false
unknown && unknown is unknown

You can reason your way through this: "true and false" is false but "true and true" is true, so "true and unknown" cannot be known. However, "false and true" and "false and false" are both false, so "false and unknown" must be false.

Logical or is as follows:

true    || unknown is true
false   || unknown is unknown
unknown || unknown is unknown

Again, you can reason your way through those in a similar way.
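If it helps to see the tables as code, here is a small standalone sketch of Kleene's and, or and not. It uses the plain strings 'true', 'false' and 'unknown' as a made-up encoding and has nothing to do with how the module represents values internally:

```perl
use strict;
use warnings;

# Hypothetical encoding: the strings 'true', 'false' and 'unknown'.
sub kleene_not {
    my ($x) = @_;
    return 'unknown' if $x eq 'unknown';
    return $x eq 'true' ? 'false' : 'true';
}

sub kleene_and {
    my ( $x, $y ) = @_;
    return 'false'   if $x eq 'false'   || $y eq 'false';      # one false decides it
    return 'unknown' if $x eq 'unknown' || $y eq 'unknown';
    return 'true';
}

sub kleene_or {
    my ( $x, $y ) = @_;
    return 'true'    if $x eq 'true'    || $y eq 'true';       # one true decides it
    return 'unknown' if $x eq 'unknown' || $y eq 'unknown';
    return 'false';
}

for my $x (qw(true false unknown)) {
    printf "%-7s && unknown is %s\n", $x, kleene_and( $x, 'unknown' );
    printf "%-7s || unknown is %s\n", $x, kleene_or( $x, 'unknown' );
}
```

The early returns carry the whole idea: a single false settles an and, a single true settles an or, and anything else involving an unknown stays unknown.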

Here's another example:

use Unknown::Values;

my $value = unknown;
my @array = ( 1, 2, 3, $value, 4, 5 );
my @less    = grep { $_ < 4 } @array;   # assigns (1,2,3)
my @greater = grep { $_ > 3 } @array;   # assigns (4,5)

As you can see, neither @less nor @greater will ever contain an unknown value. Sorting is handled by not sorting unknown values:

my @sorted = sort { $a <=> $b } ( 4, 1, unknown, 5, unknown, unknown, 7 );
eq_or_diff \@sorted, [ 1, 4, unknown, 5, unknown, unknown, 7 ],
  'Sorting unknown values should leave their position in the list unchanged';

With Unknown::Values, anything involving non-boolean behavior (except printing) will die with a stack trace:

% re.pl
$ use Unknown::Values;
$ my $value = unknown;
[unknown]
$ print $value + 3;
Runtime error: Math cannot be performed on unknown values at lib/Unknown/Values/Instance.pm line 43.
... rest of the stack trace

Stringification is overloaded to return [unknown], but bit manipulation, string concatenation, dereferencing or other operations which make no sense will immediately result in a fatal error. If you want to avoid that, simply check if you have an unknown value or set a defined default:

my $value = 0;
# later
$value += 3;

Or:

my $value = unknown;
# later
if ( is_unknown $value ) {
    croak("Our value was never set");
}
$value += 3;

This has interesting consequences. The following looks like a bug, but it's not:

my $value = unknown;
$value ||= 2;   # //= also fails
print $value + 3; # fatal error

This is because with an unknown value, you cannot know if it's undefined or false. Thus, the //= and ||= assignments will fail. You might think of this as a limitation instead. Maybe I'll see if I can work around it later.

Future Work?

Fasten your seat belts because you finished the easy part and the hard part is coming up.

Currently, the following comparison always returns false if both values are unknown:

if ( $value1 eq $value2 ) {
    # ... never happens if both are unknown
}

Why isn't one unknown equal to another unknown? Because their values are both unknown and you don't know if they're the same. However, what if you do this?

my $value  = unknown;
my $value2 = $value;
if ( $value2 == $value ) {
    ...
}

Logically, even though both of those values are unknown, they're the same unknown and thus the comparison should succeed.

One possible enhancement is to ensure that each call to unknown returns a sequentially different unknown, which would allow me to say that an unknown is equal to itself but not equal to other unknowns. (Sort of like Rumsfeld's "known unknowns" and "unknown unknowns".) This sounds strange, but it means this would work:

my $value1 = unknown;
my $value2 = $value1;

if ( $value1 == $value2 ) {
    # ... always true because it's an instance of a *single* unknown
}

But that gets confusing because we then have this:

if ( $value1 == unknown ) {
    # ... always false because unknown generates a new unknown
}
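As a rough illustration of that idea (not something the module does today), identity-based equality could be sketched with core overload and Scalar::Util's refaddr. The My::IdentityUnknown package and the unknown_value function are hypothetical names used only here:

```perl
use strict;
use warnings;

# Hypothetical sketch of "an unknown only equals itself"; NOT the module's code.
package My::IdentityUnknown {
    use Scalar::Util qw(blessed refaddr);
    use overload '==' => sub {
        my ( $left, $right ) = @_;

        # Anything that isn't one of our unknowns can never be known to match.
        return 0 unless blessed($right) && $right->isa(__PACKAGE__);

        # Same underlying object means the same unknown.
        return refaddr($left) == refaddr($right) ? 1 : 0;
    };

    sub new { my $class = shift; return bless {}, $class }
}

# Every call hands back a brand-new, distinct unknown.
sub unknown_value { return My::IdentityUnknown->new }

my $value1 = unknown_value();
my $value2 = $value1;    # a copy of the same unknown

print "same unknown matches\n"     if $value1 == $value2;
print "fresh unknown never does\n" if !( $value1 == unknown_value() );
```

Copying the variable copies the reference, so both names point at the same object and compare equal, while each new call produces an object with a different address.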

So an unknown sometimes equals unknowns and sometimes doesn't. It only matches an unknown if it's itself. On the surface this actually seems to be correct, except that we then have this:

if ( ( 6 != $value ) == ( 7 != $value ) ) {
    # ... always false
}

That has to be false because 6 != $value must return an unknown and 7 != $value should return a different unknown and their cascaded unknown value cannot match. However, the following must be true:

if ( ( 6 != $value ) == ( 6 != $value ) ) {
    # ... always true!
}

Because 6 != $value should always return the same unknown. Here's why. We assume, for the sake of argument, that the unknown $value has a value, but we don't know it. Let's say that value is 4. The above reduces to this:

if ( ( 6 != 4 ) == ( 6 != 4 ) ) {

Since 6 != 4 is true, we get this:

if ( 1 == 1 ) {

Ah, but what if $value's hidden value was actually 6? Then we get this:

if ( ( 6 != 6 ) == ( 6 != 6 ) ) {

Since 6 != 6 is false, we get this:

if ( 0 == 0 ) {
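You can check that reasoning by brute force. This sketch simply tries a handful of made-up hidden values (0 through 10, chosen arbitrarily) with ordinary Perl numbers; it doesn't use the module at all:

```perl
use strict;
use warnings;

my @candidates = ( 0 .. 10 );    # hypothetical hidden values for $value

my $same_always_true = 1;
my %different;                   # tally for the 6-versus-7 comparison

for my $hidden (@candidates) {
    # Identical expressions agree no matter what the hidden value is.
    $same_always_true = 0
      unless ( 6 != $hidden ) == ( 6 != $hidden );

    # Different expressions agree for some hidden values and not for others.
    my $outcome = ( 6 != $hidden ) == ( 7 != $hidden ) ? 'equal' : 'not equal';
    $different{$outcome}++;
}

print "same expression always matches: ", ( $same_always_true ? "yes" : "no" ), "\n";
print "6 vs 7: $different{'equal'} equal, $different{'not equal'} not equal\n";
```

The identical expressions match for every candidate, while the 6-versus-7 comparison depends on the hidden value (it differs only when the value is 6 or 7), which is why its result can't be treated as known.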

In other words, there are a lot of interesting things we could do, but this would likely involve a fair amount of work breaking out the code for each and every operator and ensuring that it's handled correctly. I have an idea of how we could make this work. This would also let us get better error messages, but for now, I'm happy with this first pass and hoping that it will help eliminate a large class of common errors in Perl.

More importantly, if you use unknown instead of undef, a lot of the structural code where you check for a defined value goes away and the resulting code is both clearer and more correct. Hooray for writing code that states the problem of the business rather than the problems of the programming language!

Tagged as:

perl

18 Comments

Krasimir Berov | February 19, 2013 10:14 PM

...I fear, though, that this concept may seem pretty complicated to grasp and, as a result, not be used.

Ovid replied to comment from Krasimir Berov | February 19, 2013 10:19 PM

Krasimir: perhaps I should point out that this is the same logic as the SQL NULL? Once you realize that, it's pretty simple (except that I don't believe the SQL NULL handles the cascading unknown comparison I have listed).

brian d foy | February 19, 2013 10:30 PM

Well, whoever made the increase_salary() method respond for interns and volunteers should be fired as well. :)

Although this is beside your point about the known values, people don't use object oriented programming well anyway. Using Moose to let you define accessors doesn't mean you have a good design.

Krasimir Berov | February 19, 2013 10:38 PM

All the time it seemed to me like null (from other languages) or NULL from SQL, but I was not sure :). Why not call it Null::Values so it feels familiar (if the semantics are really close)? And the singleton nature made it really hard for me :).

Krasimir Berov | February 19, 2013 10:41 PM

Oops, sorry... I meant "cascading nature" :)...

Ovid replied to comment from Krasimir Berov | February 20, 2013 6:22 AM

Krasimir, that's very tempting. However, I can't rename it to Null::Values, though I agree that it might make it easier to understand at first. NULL is a constant source of confusion in SQL. That's because NULLs in SQL tend to be used both for "nothing" and "unknown". The word itself implies "nothing" but the semantics are "unknown". It's a very poorly named bit of SQL. I don't want to be trapped in that mistake. Further, I'd like to have the opportunity to take this code further, but I'd feel more constrained if I called it NULL and then said "but it's really not".

I hope that makes sense. You've raised a good enough point that I should mention this in the docs.

bart | February 20, 2013 12:22 PM

'Sorting unknown values should leave their position in the list unchanged'

That looks very dangerous to me. It implies that sorting [5, unknown, 1, unknown, 7] would not sort anything.

To be safe, it should sort the same as SQL NULL: either all at the front, or all at the back. Anything else is close to unusable.

Ovid replied to comment from bart | February 20, 2013 12:32 PM

bart, it's interesting that you mention that and I think you're right in terms of utility. We don't sort those values because, logically, we can't know which way those values should be sorted (though the known values in the list would sort). Further, I suspect that sorting to the front or back should be configurable as different people might have different needs. That starts raising the complexity.

I'll give this some thought. I'll probably look at sorting them to the back, but I'm unsure.

Toby Inkster replied to comment from bart | February 20, 2013 12:56 PM

The example given by Ovid does indeed leave the unknowns in their original positions, but that's more by dumb luck than any other reason. (Something to do with the order in which comparisons are made in Perl's default mergesort algorithm.)

But this is not always the case. A trivial example is (2, unknown, 1) which sorts as (1, 2, unknown).

Toby Inkster replied to comment from Ovid | February 20, 2013 12:59 PM

But you can't overload sort - you don't get any choice how unknown is sorted.

The only way of controlling how unknown gets sorted is by controlling the comparison operations, but Unknown::Values is already committed to return false for all comparisons (that's the whole point of the module).

Ovid replied to comment from Toby Inkster | February 20, 2013 1:30 PM

Ouch. It really does look like I'll have to dig into this. In Unknown::Values::Instance, the first two items in the compare list are the comparison operators. I think those can be broken out and hacked on, but it seems tricky.

Toby Inkster replied to comment from Ovid | February 20, 2013 1:55 PM

I've just had a try to see if it's possible to detect whether a comparison is happening "inside" sort or not.

The best I could do is:

package Sort::InSort;
use Inline C => q{
    bool in_sort () {
        return !(PL_sortcop == NULL);
    }
};

Unfortunately, this can only detect whether we're inside a sort that uses the sort BLOCK LIST syntax, not plain sort LIST. (It also doesn't detect sorts where the block has been optimized away, like sort { $a <=> $b } LIST.)

Ovid | February 20, 2013 1:56 PM

The latest version on GitHub now sorts unknown values to the end. I thought it would be more useful that way with this:


    my @things = sort @maybe_unknown_things;
    foreach (@things) {
        last if is_unknown;
        # work with known values
    }

Hercynium | February 20, 2013 2:02 PM

This is a neat little concept, and I'll have to play with it a little to see if I can put it to good use for clearer code. I especially appreciate the name, "unknown", as it pretty much matches the semantics I would expect to be attached to that name. "The variable is defined, but there may or may not be a value - it could be anything or nothing, we just don't know - so it doesn't match or compare to anything, including another variable set to unknown"

The NULL != NULL semantics in SQL make sense to me, except for the name, "null". From a general "meaning of words" standpoint, I think "unknown" would have been a much better name than NULL in SQL. Generally, when I read the word "null" I think of it as referring to something specific, representing the absence of a value. If that were the case, I would think it makes sense for two variables holding "null" to be considered equal.

Thinking about it now, I wonder if that interpretation of "null" would be better named "nil" or even "void", but I'm now going on a tangent...

I wonder if in addition to having "unknown", it would be useful to have a true "null" (or maybe some other name) representing the absence of any value, with the semantics...

Of course, IANALinguist, YMMV. Ovid++ :)

Ovid replied to comment from Hercynium | February 20, 2013 2:11 PM

Hercynium, thanks for the kind words.

While adding a null or nil value would be interesting, I'd need the following:

Some clear use cases
Explicitly defined semantics

And I'm glad you like the concept. You might find this blog post interesting.

David Cantrell | February 20, 2013 9:39 PM

Anyone who puts interns and volunteers into the list of employees should be fired too! If they're employees, they get minimum wage and can't be got rid of on a whim.

Buddy Burden | February 25, 2013 7:33 PM

I'm not sure I agree that "NULL" is a terrible name for unknown data. After all, as Hercynium writes:

Generally, when I read the word "null" I think of it as referring to something specific, representing the absence of a value.

Which is exactly right. We have a database full of data. How did the data get there? Someone entered it. If there is an absence of data, it's because no one entered it. If no one entered it, it's because no one knew what it was, or they haven't gotten around to entering it yet--either way, it's unknown.

Furthermore, this is what "NULL" means (and has always meant) for databases, and it's what most people understand (if they understand at all) when they hear "NULL." There's no use complaining that the origin of the word doesn't suit the present usage; that's true for tons of words in English (and other languages as well).

I think I would personally find the module most useful if it did (or at least could) emulate the NULL of a SQL-standard RDBMS. That way I could do operations in my Perl which I'd know would work the same as similar operations in my DB. I think that would be valuable.

But I think I'm going to give this module a try anyway and see how it drives. :-)

MeirG | March 1, 2013 5:58 PM

NULL might also mean "Not Applicable" such as the "State" address field for Israel. (Hardly big enough to place the country's name...)

In fact, in my old Ordain Inc. days (a database machine start-up that never took off) we listed up to SEVEN totally different meanings that all had to share the poor SQL NULL concept! I've forgotten most of them.

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/
