Encapsulating Responsibility in Methods
From time to time I find myself needing to explain what OO programming is. I've written that Objects are Experts, but as usual, there's a deeper lesson to be learned here.
Imagine that you've hired a new barista (espresso bartender) and you're teaching him how to make a latté. You explain:
- how to pull the grounds and start the espresso brewing
- how to steam the milk and get a good foam
- how to pour the milk into a cup
- how to pour the espresso through the milk (and maybe make a nifty pattern on top)
You explain this over and over. That's procedural programming. Finally, one day you can walk in and just ask for a latté. You don't have to tell your new barista how to do it; you just ask. Congratulations: your barista is now an object.
The code might look like this:
my $barista = Barista->new;
my $drink = $barista->prepare('latté');
From the outside world, you've encapsulated a lot of knowledge, and this snippet is easier to read than listing all of the steps:
my $grounds = pull_grounds($container);
my $espresso = brew($grounds);
my $milk = steam_milk( until => $temp, using => 'hands' );
my $latte = make_latte( $milk, $espresso );
Note that bit about "using => 'hands'". How does the barista know what the temperature is? Many experienced baristas will use their hands, but there's a problem with that. If you're not feeling well or you just put lotion on your hands, your temperature sensitivity is often off and you'll tend not to steam the milk to a hot enough temperature. Thus, an expert barista will usually use a thermometer to steam their milk and make sure it's correct. Now you have to update your procedural code:
my $grounds = pull_grounds($container);
my $espresso = brew($grounds);
my $milk = steam_milk( until => $temp, using => 'thermometer' );
my $latte = make_latte( $milk, $espresso );
If you had used objects, you would encapsulate this expert knowledge in your object and the calling code would not change:
my $barista = Barista->new;
my $drink = $barista->prepare('latté');
That's not a big deal in this toy example, but in the real world, this is why objects are so popular: you put your expert knowledge in one place rather than spreading it out in procedural code. In my experience, as procedural code bases get larger, they tend to have less conceptual organization, and these bits of expert knowledge get scattered throughout the code and often get out of sync. With well-designed OO systems, it's not only easier to keep the expert knowledge in one place, it's also easier to understand where this expert knowledge should be.
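To make that concrete, here is a minimal sketch of what such a Barista class might look like. Everything inside it is an assumption for illustration: the private method names, the placeholder return strings and the 65-degree target are mine, and I've spelled the drink 'latte' in plain ASCII to keep the snippet free of encoding concerns.

```perl
use strict;
use warnings;

package Barista;

# Hypothetical class: the method names and the target temperature are
# assumptions made for this sketch, not part of any real library.
sub new {
    my ( $class, %args ) = @_;
    my $self = { target_temp => $args{target_temp} // 65 };    # Celsius, assumed
    return bless $self, $class;
}

# The only entry point that calling code needs to know about.
sub prepare {
    my ( $self, $drink ) = @_;
    die "I don't know how to make '$drink'\n" unless $drink eq 'latte';
    my $espresso = $self->_brew( $self->_pull_grounds );
    my $milk     = $self->_steam_milk;
    return "latte($milk + $espresso)";
}

sub _pull_grounds { return 'grounds' }

sub _brew {
    my ( $self, $grounds ) = @_;
    return "espresso from $grounds";
}

# The expert knowledge lives here. Switching from hands to a
# thermometer changes this one method and no calling code.
sub _steam_milk {
    my $self = shift;
    return sprintf 'milk steamed to %dC by thermometer', $self->{target_temp};
}

package main;

my $barista = Barista->new;
my $drink   = $barista->prepare('latte');
print "$drink\n";
```

If the barista later learns a better way to steam milk, only _steam_milk() changes.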
And that brings me to methods (and subroutines, while we're at it). I like to think of them as "little objects". You pass in all the state you need, they do their stuff and return the result. That's why I would write this:
my $milk = steam_milk( until => $temp, using => 'thermometer' );
Instead of this:
my $milk;
do {
    $milk = steam_milk();
} until ( temperature($milk) >= $limit );
That brings me to a tiny refactoring I did today. I'm writing a new data import system for $client and I had the following code:
sub run {
    my $self = shift;
    while ( my $row = $self->_next ) {
        $self->_process($row);
    }
}
sub _process {
    my ( $self, $row ) = @_;
    $self->_extract_components($row);
    # a list of methods that rely on this data
}
The _next() method was delegated to my parser:
has '_parser' => (
    is      => 'ro',
    isa     => Parser,
    lazy    => 1,
    builder => '_build_parser',
    handles => {
        num_lines => 'num_lines',
        _next     => 'next',
    },
);
What's important is that the _extract_components() method extracts relevant data from the $row, and we must always do this when we fetch the next row and set up the importer for processing the data. Some of the methods in _process() could be called from other places, meaning we had a problem: you must never, ever forget to call _extract_components() after _next(), or else you would be processing leftover data from a previous line. Oops.
To fix that, you need to encapsulate the responsibility of setting up those components into your _next() method. My code now looks like this:
sub run {
    my $self = shift;
    while ( $self->_next ) {
        $self->_process_row;
    }
}
sub _next {
    my $self = shift;
    my $row = $self->_parser->next or return;
    $self->_extract_components($row);
    return $self;
}
sub _process_row {
    my ( $self ) = @_;
    # _extract_components no longer needs to be called here
}
By no longer delegating _next() directly to the parser, and instead having the _next() method call $parser->next itself, it can extract the components for us and ensure that we will never forget to extract those components.
So objects should encapsulate everything they're responsible for, but so should methods. When you see tightly coupled behaviors that must fire correctly but aren't grouped together in a method, look to see if you can simplify your code by grouping them into a method responsible for those behaviors (this won't always be appropriate, of course).
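If you want to play with the idea outside of a Moose code base, here is a self-contained sketch in plain Perl. It is not the client code: ListParser is a hypothetical stand-in for the real parser, and the "name,amount" row format and the hash keys are assumptions made only so the example runs.

```perl
use strict;
use warnings;

package ListParser;    # hypothetical stand-in for the real parser

sub new {
    my ( $class, @rows ) = @_;
    return bless { rows => [@rows] }, $class;
}

sub next {
    my $self = shift;
    return shift @{ $self->{rows} };
}

package Importer;

sub new {
    my ( $class, %args ) = @_;
    return bless { parser => $args{parser}, seen => [] }, $class;
}

# Fetching a row and extracting its components are a single
# responsibility, so no caller can do the first and forget the second.
sub _next {
    my $self = shift;
    my $row = $self->{parser}->next or return;
    $self->_extract_components($row);
    return $self;
}

sub _extract_components {
    my ( $self, $row ) = @_;
    # Assumed row format for this sketch: "name,amount"
    @{$self}{qw(name amount)} = split /,/, $row;
    return;
}

# Asks the object for its data instead of taking a $row argument.
sub _process_row {
    my $self = shift;
    push @{ $self->{seen} }, join '=', @{$self}{qw(name amount)};
    return;
}

sub run {
    my $self = shift;
    while ( $self->_next ) {
        $self->_process_row;
    }
    return $self->{seen};
}

package main;

my $importer = Importer->new( parser => ListParser->new( 'tea,3', 'milk,2' ) );
my $seen     = $importer->run;
print join( ', ', @$seen ), "\n";
```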
When you follow this pattern, you may start to notice a pleasant side-effect. Compare this:
sub run {
    my $self = shift;
    while ( my $row = $self->_next ) {
        $self->_process($row);
    }
}
To this:
sub run {
    my $self = shift;
    while ( $self->_next ) {
        $self->_process_row;
    }
}
Notice how we're not passing around arguments any more? You've heard all the time that you should minimize the number of arguments you pass to subroutines and methods. Argument lists in systems can get unwieldy if you're not careful, and since my methods must all rely on data that's been properly extracted, they can ask the object for the data rather than rely on arguments that come from who knows where. This is the same reason we do this:
$person->name($name);
Instead of:
$person->{name} = $name;
Your code will be cleaner and your data more reliable.
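Here is a rough sketch of why the accessor is the safer of the two. The Person class and its validation rule (a non-empty, non-reference string) are assumptions for illustration; the point is only that the method gives you one place to enforce whatever rule you do have.

```perl
use strict;
use warnings;

package Person;

sub new {
    my $class = shift;
    return bless { name => undef }, $class;
}

# Combined getter/setter. The validation rule is an assumption
# made for this sketch.
sub name {
    my $self = shift;
    if (@_) {
        my $name = shift;
        die "name must be a non-empty string\n"
            unless defined $name && !ref $name && length $name;
        $self->{name} = $name;
    }
    return $self->{name};
}

package main;

my $person = Person->new;
$person->name('Alice');

# The accessor rejects bad data and leaves the old value in place...
my $ok = eval { $person->name(''); 1 };
print "rejected: $@" unless $ok;

# ...whereas reaching into the hash would have accepted it silently:
#     $person->{name} = '';
print $person->name, "\n";
```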
Except for exceptions. If you have to change one, you must scan all the code to find where they are caught, to make sure the new version will work.
Shawn,
Can you give an example of what you mean? If you use documented exception objects, particularly with well-thought out inheritance (hmm, exception roles?), then your code can test the exception class and not worry about the text of the exception. If you designed your exceptions well, you can add a new exception as a subclass of a documented exception that you throw and the calling code should "just work", though if it upgrades to handle the new exception, it may be able to recover more gracefully.
Suppose you have an object A that calls a method in B that calls C, which throws an exception X that is caught by A. Now, suppose you change C in B to D, which throws exception Y. You now have to hunt through the whole code base to determine whether every caught X comes from B or from somewhere else, and change them to handle Y. And you can't just replace them, since some code other than B might call C.
The current way exceptions are treated is not encapsulation friendly.
Hi Shawn,
That will work perfectly well so long as exception Y is an exception object that inherits from exception X. Any code that tests $exception->isa('X') will then respond correctly. This is why exceptions thrown need to be part of the published documentation.
For example, if Exception "X" is an Exception::IO and exception "Y" is Exception::IO::FileNotFound, the calling code will still behave correctly. The key points to remember are that exceptions thrown must be part of the published interface and subclassed exceptions must obey the Liskov Substitution Principle. This is why throwing a proper exception object is preferable to die or croak.
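A minimal sketch of that, using hand-rolled exception classes rather than any particular CPAN module. The class names come from the example above; read_config(), the throw() and message() methods, and the file name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical exception classes, named after the example above.
package Exception::IO;

sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
sub throw   { my $class = shift; die $class->new(@_) }
sub message { my $self = shift; return $self->{message} }

package Exception::IO::FileNotFound;
our @ISA = ('Exception::IO');

package main;
use Scalar::Util 'blessed';

# The callee used to throw Exception::IO and now throws the subclass.
sub read_config {
    Exception::IO::FileNotFound->throw( message => 'no such file: app.conf' );
}

# The caller was written against the documented base class only.
my $handled;
eval { read_config(); 1 } or do {
    my $error = $@;
    if ( blessed($error) && $error->isa('Exception::IO') ) {
        $handled = 'IO problem: ' . $error->message;
    }
    else {
        die $error;    # not one of ours, so rethrow
    }
};
print "$handled\n";
```

The calling code never mentions FileNotFound, yet it still handles it, because the subclass can stand in for its parent.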
In a fun way, I think hidden in your text was an explanation why it is harder to find good Perl programmers than PHP programmers.
as procedural code bases get larger, ... but with well designed OO systems,
So the procedural code grew organically while the OO system was well designed.
I don't disagree with you, I am just pointing out something that I think we overlook. An OO system that grew organically (aka. no design), can be as bad as any procedural code.
@Ovid: Why should Y be derived from X? That's a big assumption.
@Gabor: Big systems that work started as small systems that work. Or, "Whatever happened to FORTRAN III?"
Interestingly I see the original problem in a completely different way. I would refactor out the fact that _extract_components is mutating the current object... i.e.
...and then call
I see it as an anti-pattern to be mutating an object, simply in order to pass some temporary data between methods. This is exactly what method arguments are for. You're calling one method to essentially set up the input for another, but hiding it away with a level of indirection.
Imagine if these were public and you wanted to call your processing code by itself (most obviously for tests).
That means I know more about the internal structure than I need to. Also, I can't check inside process_row that a row has been given! I could just as easily end up processing old data. By passing an argument to the method I can check that it has been provided. And if it has been, we can assume the calling code has given us something suitably 'fresh'.
I find your argument about the "using" parameter of the steam_milk function not at all convincing.
There's no reason whatsoever that steam_milk must have a "using" parameter and the barista should not. If you can "hide" that knowledge inside a barista object, you can "hide" the very same logic inside your steam_milk function.
In fact, one can argue that the steam_milk solution is to be preferred, as it allows the caller to use either hands or a thermometer, while the method is hardcoded inside the barista object, meaning the latter gives less flexibility to the user.
Of course, the implementation of the barista object will call a pull_grounds method, a brew method, a steam_milk method and a make_latte method. Sure, you have encapsulated calling the different methods, but encapsulation isn't something that only objects give you. I'm sure you're capable of writing a procedural "prepare" function that takes a 'latte' argument and calls pull_grounds, brew, steam_milk and make_latte, "encapsulating" that knowledge from the user and making it an "expert" function. No objects needed.
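Such a procedural "prepare" could be sketched roughly as follows. The subroutine names are the ones used in the post; their bodies are placeholders and the %recipe_for dispatch table is an assumption about how one might organize it.

```perl
use strict;
use warnings;

# Plain subroutines, no objects. The bodies are placeholders.
sub pull_grounds { return 'grounds' }
sub brew         { my ($grounds) = @_; return "espresso from $grounds" }
sub steam_milk   { return 'milk steamed by thermometer' }    # the "how" is hidden here
sub make_latte   { my ( $milk, $espresso ) = @_; return "latte($milk + $espresso)" }

# Hypothetical dispatch table mapping a drink name to its recipe.
my %recipe_for = (
    latte => sub { make_latte( steam_milk(), brew( pull_grounds() ) ) },
);

# The caller asks for a drink by name and never sees the steps.
sub prepare {
    my ($drink) = @_;
    my $recipe = $recipe_for{$drink}
        or die "I don't know how to make '$drink'\n";
    return $recipe->();
}

my $drink = prepare('latte');
print "$drink\n";
```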
Y doesn't have to be derived from X, but if you add a new, unrelated exception that has not previously been documented, that's the same as changing the interface to the class. There's a good rule of thumb for OO programming that classes should be open for extension, but not modification. Adding a brand new exception violates that rule of thumb.
Key takeaway: document your exceptions and treat them as part of the contract for your class.
Neil, I know what you mean and I often agree. In this case I elected a different style because the code processes the entire file at once and the mutable state is never exposed.
Because of the complicated business rules, in the older version of this code, passing around these arguments often led to tramp data being passed through various methods. To avoid that, I allowed the object to maintain a guaranteed consistent internal state rather than try to pass that data to many places which didn't need it simply so they could pass it on to methods which did need it. It makes the code and argument passing simpler.
I must say that's one of the less convincing explanations of OOP I know. As Abigail has pointed out, the difference between OO and procedural style is not abstraction vs. lack thereof. One could write a purely procedural make_latte() function without any loss of abstraction because the barista neither has any state nor does it inherit anything, so it being an object and every method passing an extra function parameter is just useless overhead. While it may be justified to write the code like this for consistency of style with other parts of the program or because some kind of state might be necessary in the future, the usual "car" or "rectangle" examples are much clearer for someone who doesn't understand what objects are supposed to be good for.
They suck too, stderr.
The best justification for OO I have seen is “a GUI widget library”. You don’t want to deal with each kind of event by switching on the type of widget and putting the logic for dealing with that kind of widget in the switch branch – because then when you add another type of widget to your program, you have to add another branch to every switch statement in your entire codebase. Instead, you want to give a name to what each switch block does (the same name for switch blocks that do the same thing), and then put each of its branches into the corresponding widgets as methods of the same name. Then, instead of writing a switch block, you just invoke the method of that name on the widget, and because the appropriate logic for each widget is carried along with it in its methods, the right thing automatically happens – even after you create a new kind of widget. You don’t have to change any of your method calls to add another type of widget – you just create one, write the logic for it in its methods, and your existing code immediately works with it.
There you go, OO in one paragraph.
(Well, ideally there should be a few illustrating snippets of example code. But they needn’t be extensive, ~40 lines total in 4 panels will do.)
(Especially ideal is if you build that widget library out of roles and delegation, instead of inheritance.)
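Something like the following would do as a first panel. It is only a sketch: Button, Checkbox, on_click() and the returned strings are all made up for illustration, and the two classes share a method name rather than a parent class.

```perl
use strict;
use warnings;

# Hypothetical widget classes; each one carries its own click logic.
package Button;

sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
sub on_click { my $self = shift; return "pressed $self->{label}" }

package Checkbox;

sub new { my ( $class, %args ) = @_; return bless { checked => 0, %args }, $class }

sub on_click {
    my $self = shift;
    $self->{checked} = !$self->{checked};
    return "$self->{label} is now " . ( $self->{checked} ? 'on' : 'off' );
}

package main;

# Without objects: one branch per widget type, and every place that
# handles a click needs the same chain extended for each new type.
sub click_with_switch {
    my ($widget) = @_;
    if    ( $widget->{type} eq 'button' )   { return "pressed $widget->{label}" }
    elsif ( $widget->{type} eq 'checkbox' ) { return "$widget->{label} toggled" }
    else                                    { die "unknown widget type '$widget->{type}'\n" }
}

# With objects: this call site stays the same when a new class appears.
sub click {
    my ($widget) = @_;
    return $widget->on_click;
}

my @widgets = ( Button->new( label => 'OK' ), Checkbox->new( label => 'Remember me' ) );
my @results = map { click($_) } @widgets;
print "$_\n" for @results;
```

Adding a Slider class with its own on_click() would need no change to click() or to the loop that calls it.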
There are, of course, a number of other principles to understand (encapsulation etc.), but many of them can be justified in terms of this example – even subtler ones like Liskov (if you subclass). If you already know these principles, think about how. You’ll figure them all out yourself, it’s not hard.
So really, there you go: OO in one paragraph.
responsibility , less expose detail, need to think ..