Moose and Fatal Type Constraints

By Ovid on June 13, 2011 2:40 PM

After some excellent feedback from the Moose list, I thought I could share with you an issue that I know others have been concerned about. What follows are my thoughts on the matter, but I'd very much like to hear advice from others.

Consider the following:

package Foo;
use Moose;

has 'some_value' => (
  is  => 'ro',
  isa => 'Int',
);

When somebody does this:

# value contains the string "No charge"
my $foo = Foo->new({ some_value => $value });

Do you want a 500 error on your Web site or a page which only displays partial information?

There is not a single person reading this entry now how can honestly say what the correct answer is. If you think you know which of the two choices above is correct, you need more experience.

The problem here is that you can't know the answer to that without understanding the underlying problem you're trying to solve and that might involve understanding a heck of a lot about your business. So I have a dilemma.

This example is strictly hypothetical as the code in question is core business code which I cannot share, but the underlying problem is real and serious. In fact, the business problem I describe here is actually an amalgam of business practice of a few of the larger companies I've worked for.

Imagine that you sell hotel rooms and you have three types: "luxury", "standard" and "budget". You may have a "room_type" attribute used in a few places, but you want its data type in one place:

package My::Company::TypeConstraints;
use Any::Moose;
use Any::Moose '::Util::TypeConstraints';

BEGIN {
    enum 'RoomType' => qw(
        luxury
        standard
        budget
    );
}

Now you can just declare that an attribute isa "RoomType" and when an outside supplier sends you XML describing available rooms and they have a room type of "executive", it blows up and they now can no longer tell you which rooms you can sell. Not only does this cost you money, but it costs them money too. It could easily hurt your reputation.

Now imagine that you have many, many of these attributes coming from a wide variety of sources and in a wide variety of formats taking a wide variety of paths through your code.

They are sending data via SOAP, RPC, JSON, email, telnet, call centers, etc. You've trained all of your internal staff and external suppliers on what the myriad data types are, but you have staff turnover, your code has bugs, their code has bugs, they in turn may have external data suppliers who may not be aware of constraints, and so on.

In short, there are a lot of ways that bad data can enter your system, but I don't have to tell you that. You already know it. If this bad data is causing you to lose €1,000 a minute, failure to notice it for an hour or so can you lose a developer's annual salary.

But what do we do? Let's step back for a minute and think about business needs rather than technical ones (they're often one and the same).

Most (all?) developers will accept that virtually all software has bugs. Larger companies tend to have more software than smaller companies and thus have more bugs. So why don't they collapse in a steaming pile of ones and zeroes?

All bugs are created equal, but some are more equal than others. — Aldous Huxley, creator of the Suidae programming language

It's probably pretty bad if you automatically ship a new car for only €53, but it's probably not so bad if you show the wrong sized thumbnail of said car.

So in the room_type example above, what could we do? We could have this:

has 'room_type' => (
    is  => 'ro',
    isa => 'RoomType',
);

Now if an outside supplier sends the "executive" room type, we get a fatal error. That might be exactly the behaviour you want, but you don't know that if you don't know your business rules. It might mean that you've just lost a lot of money. So you could do this:

has 'room_type' => (
    is  => 'ro',
    isa => 'Str',
);

But now we're accepting anything at all and our code might break in other, delightful ways. Further, we won't really know who is sending bad data and who is not. One way of handling this is the following.

package My::Company::TypeConstraints;
use Any::Moose;
use Any::Moose '::Util::TypeConstraints';

BEGIN {
    my %is_valid_room_type = map { $_ => 1 } qw(
        luxury
        standard
        budget
    );
    if ( IS_NOT_PRODUCTION_SERVER ) {
        enum 'RoomType' => keys %is_valid_room_type;
    }
    else {
        coerce 'RoomType',
          from 'Str',
          via => {
              return $_ if $is_valid_room_type{$_};
              LOG "'$_' is not a valid room type";
              return 'standard';
          };
     }
}

I'm sure there are better ways to write that and I'd love built-in support for this. However, not only am I not sure of the best way to genericize this, I am not sure that this is the best approach. However, what do we get?

For non-production uses (e.g., test suites), we can get fatal errors for type constraint violations. For production uses, we get sensible defaults and a nice warning in our logs to tell us that there's an issue. For a production system, you get fault-tolerance and logging of said faults. In other words, you bend rather than break.

This seems like a win-win scenario. You don't throw away type constraints entirely. You might just find out that it's your data consumer which is in error and not the data source. You can have a more gradual transition to Moose/Mouse from your hand-rolled code and you have a great strategy for handling uncertainty about exactly what constraints should be on a particular attribute.

For smaller systems, I'd probably not feel comfortable with this approach. For larger systems, there's a time when you need to balance the desire for "correct" code and "paying your mortgage".

I finish with quote from another brilliant programmer.

In theory, theory and practice are the same. In practice, we're trying to comb spaghetti. — Abraham Lincoln, 1912 speech to the Republican Women's De-emancipation Convention

13 comments

Tagged as:

moose, mouse, typing

13 Comments

http://www.wgz.org/chromatic/ | June 13, 2011 5:39 PM

My controllers are thin. My models are fat.

I like the pattern where my controller delegates everything but parameter marshalling to calls on the model. I wrap those in Try::Tiny blocks, capture any errors into an appropriate data structure, and let my configuration system manage what happens to those errors. So far that lets me keep my model error checking as strict as I want it while allowing for parametric error reporting at the point where I already configure things differently between development and deployment.

rjbs.manxome.org | June 13, 2011 5:42 PM

My first reaction would be to write a parameterized Try type.

has value => ( isa => Try[ Int ], ..., );

Try[] would coerce everything, where coerce means "warn/die-based-on-env if it isn't of the parameter type and always accept." You'd have to do a little jiggering to make the type always coerce, but still accept the unmodified result.

You could also write a trait to apply to your classes to automatically Try-wrap types on each attribute, so that you wouldn't need to change the "isa" values.

Ovid replied to comment from http://www.wgz.org/chromatic/ | June 13, 2011 5:43 PM

@chromatic: that's a very sensible strategy, but when addressing legacy code, one may not have that luxury.

brian d foy | June 13, 2011 5:50 PM

Isn't this the same argument that motivated Perl and other high-level languages in the first place? The more you constrain a system, the less flexibility you have, and when you try to reduce the real world into a few types, the complexity peeks out somewhere else.

Mr. Muskrat | June 13, 2011 6:06 PM

<aside> I often miss working with GDS data. Talk about job security! You have to constantly update those telnet apps to keep up with the ever changing GDS data.* </aside>

At least that was still the case six years ago.

Ovid replied to comment from brian d foy | June 13, 2011 6:58 PM

@brian d foy: that is an exceedingly good point.

http://www.wgz.org/chromatic/ replied to comment from Ovid | June 13, 2011 8:12 PM

Certainly, but I prefer to refactor even my legacy code to this pattern. I know very little about the code you have, but my experience has been that working around the fact that controllers do far too much takes more effort than pushing their behavior to the models instead. So far, that's always been the right choice in the code I've maintained.

grantmclean.myopenid.com | June 13, 2011 10:29 PM

Surely as soon as you write:


  if ( IS_NOT_PRODUCTION_SERVER ) {

You've written code that behaves differently in production vs dev/testing and therefore can't be tested. That would make me nervous.

Another approach would be to always (even in test) do graceful fallback when needed and always log it. Then your test suite can assert that it got the expected log events.

Ben Morrow | June 13, 2011 10:57 PM

To my mind this is an obvious case where you need Perl 6-ish exceptions which are handled before unwinding the stack, and can thus 'un-throw' themselves (that is, resume execution from the point where the exception was thrown, rather than the point where the handler was installed). That way, you can install an exception handler for the type-check exception which can decide, based on any criteria and after any amount of logging and/or fixing up the data, to ignore the error and carry on anyway. You could presumably, given a sufficiently fine-grained exception hierarchy, end up with a very flexible system of handlers for different kinds of type failures, depending on how important they are to your business logic.

Is there a module that provides anything like that for Perl 5? I don't know of one…

Salvador Fandiño | June 14, 2011 10:02 AM

Being less strict validating input in production that in testing is a very bad idea as it would cause your code to take untested paths!

If you want to be permissive with the inputs from your providers, be so also in testing. I would even go an step further and add tests simulating you get non-conforming-but-accepted data.

Richard Huxton | June 14, 2011 10:13 PM

It's probably my database-oriented background, but I separate out the translation altogether.

If my app is supposed to be dealing with a RoomType then that's what I give it. If I try and pass in a bad value then it should explode. Otherwise it gets very difficult to reason about my code. Worse, I can't reason about it without reading all the code - no just skimming declarations/method names.

The import/translation layer does all the fiddly value mapping/defaulting and probably ends up being a huge list of one-off rules. That's OK, because although the code is ugly and fiddly it's doing something that's conceptually very simple. I can also re-run the translation with different rules, gather stats etc. all in a central place.

I don't know how practical this approach is with your legacy code, but the clever change-behaviour-based-on-environment solution seems, well, a bit clever.

Michael J | June 15, 2011 9:30 AM

don't you mean George Orwell there, not Aldous Huxley, or is that some reference I'm missing?

Ovid replied to comment from Michael J | June 17, 2011 8:40 AM

Michael, yeah, that should have been Orwell. Silly me. Nice catch :)

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/

More info »

Ovid