80% Hacks
I'm still blogging five days a week, but obviously not here. That's largely because my new daughter is forcing me to choose where I spend my time and I can't blog too much about what I do lest I reveal trade secrets. So, just to keep my hand in, here's an ugly little "80% hack" that lets me find bugs like mad in OO code. I should really combine this with my warnings::unused hack and start building up a tool to find issues in legacy code.
First, an "80% Hack" is based on the Pareto Principle which states that 80% of the results stem from 20% of the effort. So I often write what I call 80% hacks which are simply quick and dirty tools which get things done.
The idea is simple. In legacy OO code where we're not using Moose, we have a nasty tendency to reach inside a blessed hashref. However, as classes start getting old and crufty, particularly in legacy code which is earning the company a ton of money, it's easy for someone to either misspell a hash key or refer to keys which are no longer used. What I've done is assume that any key which shows up once and only once in a file is suspect, and I also assume that key accesses look like this:
$self->{ foo }
$_[0] -> { "bar" } # yeah, we need arbitrary whitespace
shift->{'something'} # and quotes
Yes, this code could be improved tremendously, but 80% hacks are personal hacks which I simply don't pour a lot of time and effort into. Besides, they're fun.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie ':all';
use Regexp::Common;
my $module = shift or die "usage: $0 pm_file";
#my $module = '/home/cpoe/git_tree/main/test_slot';
my $key_found = qr/
    (?: \$self | \$_\[0\] | shift )  # $self or $_[0] or shift
    \s* ->                           # ->
    \s* \{                           # { (escaped so newer perls don't complain)
    \s* ($RE{quoted}|\w+)            # the hash key, quoted or bare
    \s* \}                           # }
/x;
open my $fh, '<', $module;
my %count_for;
while (<$fh>) {
    while (/$key_found/g) {
        my $key = $1;
        $key =~ s/^["']|['"]$//g;    # try and strip the quotes
        no warnings 'uninitialized';
        $count_for{$key}{count}++;
        $count_for{$key}{line} = $.;
    }
}
foreach my $key ( sort keys %count_for ) {
    next if $count_for{$key}{count} > 1;
    print "Possibly unused key '$key' at line $count_for{$key}{line}\n";
}
I run that with a .pm file as an argument and I get a report like:
Possibly unused key '_key1' at line 1338
Possibly unused key '_key2' at line 5325
...
Possibly unused key '_keyX' at line 4031
It's amazing how many bugs I've found with this.
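To see the idea in action without a real codebase, here's a minimal sketch that applies the same counting logic to a tiny made-up class held in a string. Everything named here is an assumption for illustration: the `Hypothetical::Account` class, the `single_use_keys` helper, and a simplified quote pattern standing in for `$RE{quoted}` so the sketch needs only core Perl. It is not the script above, just the same technique in miniature.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical legacy class, kept in a string so the sketch is self-contained.
my $source = <<'END_SOURCE';
package Hypothetical::Account;
sub new     { my $class = shift; bless { balance => 0 }, $class }
sub deposit { my $self = shift; $self->{balance} += shift }
sub balance { $_[0]->{'balance'} }
sub report  { my $self = shift; print $self->{ "balence" }, "\n" }
END_SOURCE

# Simplified stand-in for the pattern in the script (assumption: no
# escaped quotes inside keys, so Regexp::Common is not needed).
my $key_found = qr/
    (?: \$self | \$_\[0\] | shift )        # invocant
    \s* -> \s* \{ \s*
    (?: "([^"]*)" | '([^']*)' | (\w+) )    # quoted or bareword key
    \s* \}
/x;

# Returns [ key, line ] pairs for every key seen exactly once in $text.
sub single_use_keys {
    my ($text) = @_;
    my %count_for;
    my $line_number = 0;
    for my $line ( split /\n/, $text ) {
        $line_number++;
        while ( $line =~ /$key_found/g ) {
            my ($key) = grep { defined } $1, $2, $3;
            $count_for{$key}{count}++;
            $count_for{$key}{line} = $line_number;
        }
    }
    return map { [ $_, $count_for{$_}{line} ] }
        grep { $count_for{$_}{count} == 1 } sort keys %count_for;
}

for my $hit ( single_use_keys($source) ) {
    printf "Possibly unused key '%s' at line %d\n", @$hit;
}
# prints: Possibly unused key 'balence' at line 5
```

The correctly spelled `balance` is read in two methods, so it stays quiet; the misspelled `balence` appears once and gets flagged. Note that the key in the `bless { ... }` constructor literal is never counted, since the pattern only looks at `->{...}` accesses.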
I can't blog as much as I used to, but they make it all worth it.
I ran this on a part of the work codebase, and it didn't find bugs so much as old, dead, now-useless, leftover bits of code. Which should of course also be got rid of, but it's less urgent than outright bugs.
@ilmari: at first glance that's what I was seeing, too. But rather than remove any of those, I dug through source control for every instance and found several cases where they were outright misspelled or where they were used but someone in later work removed some important code, leaving a feature broken. Still trying to sort through much of that now. Be careful about simply removing things you find (plus, my code most certainly misses some cases).
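For a feel of which cases slip through, here's a small sketch that runs a handful of access styles past the same shape of pattern. The sample strings and the `is_seen` helper are made up for illustration, and the quote handling is a simplified stand-in for `$RE{quoted}`, so treat it as a rough guide rather than an exhaustive list.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same shape as the script's pattern, minus Regexp::Common (an assumption
# made so this runs on core Perl only).
my $key_found = qr/
    (?: \$self | \$_\[0\] | shift )
    \s* -> \s* \{ \s*
    (?: "[^"]*" | '[^']*' | \w+ )
    \s* \}
/x;

# Hypothetical helper: true if the pattern would count an access in $code.
sub is_seen {
    my ($code) = @_;
    return $code =~ $key_found ? 1 : 0;
}

my @samples = (
    '$self->{foo}',             # seen
    'exists $self->{"foo"}',    # seen
    '$this->{foo}',             # missed: invocant has a different name
    '$self->{$name}',           # missed: key is held in a variable
    '@{$self}{qw(foo bar)}',    # missed: hash slice
    '$$self{foo}',              # missed: old-style dereference
);

printf "%-24s %s\n", $_, ( is_seen($_) ? 'seen' : 'missed' ) for @samples;
```

Any key that is only ever touched through one of the missed forms will look like a single-use key (or not show up at all), which is one more reason to check source control before deleting anything the report flags.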