A Desired SQL Affordance
When talking about an affordances, it turns out that there are different definitions, but I like to think of an affordance as "an element of design which suggests the correct course of action", which seems to be pretty much what Donald Norman meant when he adopted the term. It's an elegant concept which, if applied properly, makes it easy to do the right thing and harder to do the wrong thing. However, sometimes you want to do the wrong thing, so an affordance which makes this impossible isn't the most useful in my book. I just want doing the wrong thing to be jarring enough that it would give anyone pause.
And this brings me to the following issue I often see in SQL:
SELECT count(*) FROM employee WHERE salary < 20000;
If salary is allowed to have a NULL value, then employees with a NULL salary won't be included in this count. If you're familiar with SQL, you will instantly know why. However, even those familiar with SQL sometimes fall into a trap here because they think "hey, why didn't I count the unpaid volunteers?"
There's a very simple way of removing that trap, but I seriously doubt anyone will ever implement it.
First of all, what is a NULL value? CJ Date argues that every attribute in a database must have a name and a type. Since a NULL by definition cannot have a type, he argues that it doesn't belong in a relational database. However, despite well-known problems with NULL values, they remain because sometimes you really do need them (primarily to improve performance by limiting the number of joins). So if we're going to have NULL values but we don't know the type, what does this imply?
Consider the following:
my $temperature = $degrees_fahrenheit + $degrees_celsius;
That's obviously non-sensical, even though it will likely work. This is because even though the $degrees_X variables are likely floats, their types are actually different, even if the underlying language doesn't support this. If those were actually objects, you could presumably overload the '+' operator to understand how to add these two types (and figure out which you're going to return), but in the absence of defining operators between these types, you can't know how to manipulate them. That's why, in SQL, "salary < 20000" will never select a NULL salary. Since you don't know what the type of NULL is, you can't define operators to work on it. That's also why we have "salary IS NULL" instead of "salary = NULL", a source of constant confusion to those new to databases.
So what's the affordance I mentioned earlier? If we replaced the word NULL with UNKNOWN, I think we would quickly understand why the above SQL is not equivalent to this:
SELECT count(*) FROM employee WHERE salary < 10000 OR salary IS UNKNOWN;
Once people got used to seeing UNKNOWN instead of NULL, it would be obvious why salary < 10000 could never match a UNKNOWN value and I think a lot of confusion would be avoided.
Yet I can imagine hearing someone say "but wait, what if that's NULL because they're a volunteer and really don't have a salary?"
This is where the beauty of the affordance comes in. A NULL value in the salary field is probably going to be a violation of the 4th normal form, and if you put a "reason" field in the same table (as some people do to describe NULLs), then you're violating 3rd normal form as well! Thus, we have an affordance which helps you to normalize your database but it's still possible to violate this affordance, if needed, which I think is a desirable property.
The affordance will encourage us to make it right. We can make it fast later.