The Hidden Benefit of Data-Driven Programming

Often we hear people talk about making your programming more "data-driven". When you can convert procedural code to a data structure (generally with a small procedural driver), you stop replicating procedural code and just add another entry to the data structure. This works well for dispatch tables, repetitive chunks of code, and state machines. However, there is a hidden benefit: it not only makes you a better programmer, it also removes a common flaw so thoroughly that later maintenance programmers will never notice it could have been there. They'll curse you if your code has the flaw, but if it doesn't, they'll find the data-driven sections so easy to work with that they won't even think about it.
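As a minimal sketch of the "data structure plus small driver" idea, here is a tiny dispatch table. The command names and the run_command helper are made up for illustration and have nothing to do with Veure:

```perl
use strict;
use warnings;

# The data: each entry maps a (hypothetical) command name to its handler.
my %dispatch = (
    add      => sub { $_[0] + $_[1] },
    subtract => sub { $_[0] - $_[1] },
    multiply => sub { $_[0] * $_[1] },
);

# The small procedural driver: look up the entry and run it.
sub run_command {
    my ( $command, @args ) = @_;
    my $handler = $dispatch{$command}
      or die "Unknown command: $command\n";
    return $handler->(@args);
}

print run_command( add      => 2, 3 ), "\n";    # prints 5
print run_command( multiply => 4, 5 ), "\n";    # prints 20
```

Supporting a new command means adding one line to %dispatch; the driver does not change.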

So here's some bad code that I wrote to maintain some fixtures in Veure. I post it here so that you can see that I'm sometimes a sloppy programmer too.

package Veure::Role::Fixture::StationArea;

use Moose::Role;
with qw(Veure::Role::Fixture::Station);

has 'station_area_lost_dreams_port' => (
    is      => 'ro',
    isa     => 'Veure::Schema::Result::StationArea',
    lazy    => 1,
    default => sub {
        my $test = shift;
        $test->station_lost_dreams_station->port;
    },
);

has 'station_area_lost_dreams_ruins' => (
    is      => 'ro',
    isa     => 'Veure::Schema::Result::StationArea',
    lazy    => 1,
    default => sub {
        my $test = shift;
        $test->station_lost_dreams_station->area_for_uri_name('ruins');
    },
);

has 'station_area_lost_dreams_brig' => (
    is      => 'ro',
    isa     => 'Veure::Schema::Result::StationArea',
    lazy    => 1,
    default => sub {
        my $test = shift;
        $test->station_lost_dreams_station->area_for_uri_name('brig');
    },
);

has 'station_area_the_house_of_comoros_port' => (
    is      => 'ro',
    isa     => 'Veure::Schema::Result::StationArea',
    lazy    => 1,
    default => sub {
        my $test = shift;
        $test->station_the_house_of_comoros->port;
    },
);

has 'station_area_the_house_of_comoros_ruins' => (
    is      => 'ro',
    isa     => 'Veure::Schema::Result::StationArea',
    lazy    => 1,
    default => sub {
        my $test = shift;
        $test->station_the_house_of_comoros->area_for_uri_name('ruins');
    },
);

# This station is a freebooter station and does not have a sickbay.
# Characters killed here are restored from their clones, wherever that may
# last have been.
has 'station_area_john_lockes_legacy_port' => (
    is      => 'ro',
    isa     => 'Veure::Schema::Result::StationArea',
    lazy    => 1,
    default => sub {
        my $test = shift;
        $test->station_john_lockes_legacy->port;
    },
);

1;

This code lazily generates test fixtures. Each star system has one or more space stations and each space station has numerous areas you can visit. These fixtures allow you to fetch particular areas for space stations.

That's a lot of code with two glaring design flaws (in case you're wondering, I didn't use DBIx::Class::EasyFixture because this was written long before that module existed).

The first design flaw is easy: that's a lot of code that's effectively duplicated in structure. It's the data that's different. Thus, a data-driven approach is suggested.

Do you see the second flaw? Let's look at the fixture names:

  • station_area_the_house_of_comoros_port
  • station_area_john_lockes_legacy_port
  • station_area_lost_dreams_brig

No? Once you're familiar with it, you'll notice that the name of the "Lost Dreams Station" isn't fully represented in the fixture name. It's not station_area_lost_dreams_station_brig. I knew of this issue, but ignored it for a while until I got tired of repeating the above code. Here's (sort of) what that code looks like now:

package Veure::Role::Fixture::StationArea;

use Moose::Role;
with qw(Veure::Role::Fixture::Station);

my %station_areas = (
    station_lost_dreams_station  => [ qw/port brig embassy training/ ],
    station_the_house_of_comoros => [ qw/port brig ruins market sickbay/ ],
    station_john_lockes_legacy   => [ qw/port market/ ],
);

while ( my ( $station, $uri_names ) = each %station_areas ) {
    foreach my $uri_name (@$uri_names) {
        my $attribute = $station . "_$uri_name";
        $attribute =~ s/^station/station_area/;
        has $attribute => (
            is      => 'ro',
            isa     => 'Veure::Schema::Result::StationArea',
            lazy    => 1,
            default => sub {
                my $test = shift;
                $test->$station->area_for_uri_name($uri_name);
            },
        );
    }
}

1;

I didn't want to special-case the fixture method names for Lost Dreams Station (the space station everyone starts out at, which already has enough special cases). Instead, I took the trouble of normalizing the names and fixing them in all of the tests. Now, I just add the name of the station (if it's not already there) and the uri name of the area to the code and it just works.

Consistency in naming is a hallmark of good programmers, but it's a very, very hard one to achieve because until you start refactoring, it's often hard to notice naming inconsistencies. When you switch to a data-driven approach, consistency is almost impossible to avoid.
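To see why, here is a small Moose-free sketch that only derives the fixture names from the table, using the same substitution as the loop above. The fixture_names helper is hypothetical and exists purely for illustration:

```perl
use strict;
use warnings;

my %station_areas = (
    station_lost_dreams_station  => [qw/port brig embassy training/],
    station_the_house_of_comoros => [qw/port brig ruins market sickbay/],
    station_john_lockes_legacy   => [qw/port market/],
);

# Hypothetical helper: build every fixture name with one rule.
sub fixture_names {
    my %areas = @_;
    my @names;
    for my $station ( sort keys %areas ) {
        # station_foo becomes station_area_foo, exactly as in the loop.
        ( my $prefix = $station ) =~ s/^station/station_area/;
        push @names, map { "${prefix}_$_" } @{ $areas{$station} };
    }
    return @names;
}

print "$_\n" for fixture_names(%station_areas);
```

Because every name comes out of the same rule, a one-off like station_area_lost_dreams_brig cannot appear unless someone misspells the station key itself, and that would break every fixture for that station at once.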

We're programmers. We love patterns. They're a signpost to good code. Data-driven programs will help you get there.

2 Comments

Yep, fun stuff. I stumbled across that one myself a few posts ago:

https://blogs.perl.org/users/byterock/2014/03/moose-fine-print.html

I wouldn't call it data driven; it's more on the side of 'abstract data types'. I would leave data-driven programming up to the awk and sed crowd.

I got "data driven programming" religion, if you'll pardon the phrase, recently. Along the lines of what you said, you get a great benefit in testing - just look at the data structure before and the data structure after, and make sure the changes are what you need.

But further, it makes it easier for someone new to understand the code. I'm on an enormously complex Java project at the moment, and it uses the Strategy Design Pattern, Visitor Design Pattern, Decorator Design Pattern, etc... all are considered best practices for working with complex code structures, but if I don't know what's happening at some point in the code I need a hundred print statements or ten minutes clicking around in a debugger just to begin to figure it out.

If instead I had a Map of Lists and nested Maps (or I guess in Perl, a hash of arrays and nested hashes), the functions to manipulate the data get uglier but figuring out the state of the program at any point is a one line print statement. Figuring out the state of the program before and after any one function is run is two print statements.

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/