Finding Unused Subroutines

Posting this here to help me remember it.

I have to do a bit of work cleaning up some old code, so I wrote this quick shell script to find possibly unused subroutines.

#!/bin/bash

tempfile=/tmp/$$allsubs.txt

ack '^\s*sub\s+(\w+)\b' lib |     \
    awk '/sub (\w*)/ { print $2 }' | \
    cut -d'(' -f1 |                  \
    sort |                           \
    uniq > $tempfile

for sub in $(cat $tempfile); do
    if [[ $(expr `git grep -E "\<$sub\>" |wc -l`) == 1 ]]; then
        echo $sub
    fi
done

rm $tempfile

My bash skills are awful (see above), but I've already found 25 subroutines that can probably be deleted. I say "probably" because each one has to be investigated.

4 Comments

  1. Use ack’s --output switch; it lets you avoid the entire need for shell string munging.

  2. If you have a sort | uniq with no uniq switches, you can just use sort -u.
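For plain duplicate removal the two are equivalent; you only need a separate uniq when you want its extra switches (-c, -d, and so on):

```shell
# Both pipelines print the de-duplicated, sorted lines "a" and "b":
printf 'b\na\nb\n' | sort | uniq
printf 'b\na\nb\n' | sort -u
```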

  3. Control flow statements are not special; you can use them in pipes. (They will be executed in a subshell. Caveat coder: that means you cannot smuggle values out of the loop inside environment variables.)
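The subshell caveat is easy to demonstrate; a small bash-specific sketch (the process-substitution workaround shown second is a bashism):

```shell
#!/bin/bash
# The pipe runs the while loop in a subshell, so the increment
# never reaches the parent shell:
count=0
printf 'x\ny\n' | while read -r line; do
    count=$((count + 1))
done
echo "$count"   # prints 0

# Workaround: feed the loop via process substitution so it runs
# in the current shell, and the variable survives:
count=0
while read -r line; do
    count=$((count + 1))
done < <(printf 'x\ny\n')
echo "$count"   # prints 2
```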

  4. If you want to count the number of grep hits, don’t pipe to wc; use the -c switch. And if you just want a binary check (not the case here), then especially don’t use wc; use the -q switch instead and use the exit code directly, à la if grep -q foo bar.txt ; then ... ; fi.
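Both switches in one toy sketch (the demo file is made up; note that -c counts matching lines, just as the wc -l pipeline did):

```shell
printf 'foo\nbar\nfoo baz\n' > /tmp/demo_grep.txt

# Count matching lines directly, no wc needed:
grep -c foo /tmp/demo_grep.txt          # prints 2

# Binary check: -q prints nothing and only sets the exit code:
if grep -q foo /tmp/demo_grep.txt; then
    echo found
fi
```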

  5. It doesn’t matter much, but what you really want is a word-based fixed-string search, not an extended-regex match with added word-boundary anchors (because your input string isn’t a regex). That’s grep -w -F, supported identically by git grep.
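A quick sketch of the difference (demo file made up). -F treats the pattern as a literal string, and -w requires it to form a whole word, so substrings and identifiers that merely contain the name do not match (an underscore counts as a word character):

```shell
printf 'status\nstatuses\nget_status\n' > /tmp/demo_words.txt
# Only the bare "status" line matches; "statuses" and
# "get_status" fail the word-boundary test:
grep -w -F status /tmp/demo_words.txt   # prints: status
```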

  6. Lastly, grep can accept multiple patterns at once using the -f switch. I wanted to use this to obviate the loop entirely, but found it useless in git grep: a minor problem is that -f - does not read from stdin (which can be worked around using /dev/stdin or process substitution (-f <( ack ... | sort -u ))), but more significantly it hangs in a loop at 100% CPU.
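With plain grep the -f approach works fine; a small bash sketch (files made up), feeding the pattern list in through process substitution as described:

```shell
#!/bin/bash
printf 'alpha\nbeta\ngamma\n' > /tmp/haystack.txt
# -f FILE reads one pattern per line; process substitution lets
# the pattern list come straight out of another pipeline:
grep -w -F -f <(printf 'gamma\nalpha\n') /tmp/haystack.txt
# prints: alpha
#         gamma
```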

All in all:

#!/bin/bash
ack -h --output='$1' '^\s*sub\s+(\w+)\b' lib \
| sort -u \
| while read subname ; do
    export subname
    git grep -c -w -F "$subname" \
    | awk -F: '{t+=$2} END{if(t<2) print ENVIRON["subname"]}'
done

While writing the comment I noticed that, more importantly than the possible false positives, the script also produces plenty of false negatives. The word-based search does little to help: there are a lot of things like a method called status in our codebase that would not show up as dead code with this script, because the word appears in the source in many contexts. If you have multiple methods of the same name then all of them could be dead code and you would still get a false negative, and if only one of them is live code you would even get a legitimate-looking false negative.

So this is most useful in very “wide” codebases with a fair amount of churn and a corresponding amount of legacy cruft.

Aristotle: once again, I’m humbled. I will clearly never be hired as a bash hacker.

Hehe, well, I won’t either, since no one is looking for one of those; it is a pretty useless skill, really. It never seems that difficult to me, though: to my mind it’s just looking up switches in manpages.

And a certain attitude, OK, granted, to know what kind of switch I am looking for. Basically it’s stringing commands together by pipes and exit codes (control flow is also just exit codes). And you try to get the data into as close to a useful form as possible, as soon as possible. Which is to say: you look for switches that modify the output format (and for things like grep -q, which switch the output to an exit code; again, pipes and exit codes).

If you have to do almost any non-trivial string munging, or have to deal with any conditionals based on string parsing, you should probably be using Perl. Maybe awk, for very small uses, like in this case. (It’s basically Perl lite. It’s almost not worth learning, since the uses where it bests Perl are so few, but it is also when the problem is very small that awk beats Perl, so I still use it occasionally. Sed is even worse on this count.)

It’s a pity that grep (and git grep) does not have a switch to say “give me the total count of matches over all input files”, it would make this script really simple.

Of course that was part of the raison d’être for Perl in the first place, that if your toolbox does not have the exact tool you need it suddenly takes enormously more monkey code to get the shell to do what you want, compared to an almost identical other task for which the toolbox does offer an exactly matching tool. So in the hands of someone who has the right frame of mind and a little persistence, big scripts can disappear… except when they sometimes don’t. It can be maddening. With Perl the relationship between the variations of what you want and how much code they take to write is much more linear.

I’ve pushed shell and friends quite far and come away with the experience that it isn’t worth it to do complicated things with them, though they’re worth learning so you know what’s easy and what’s not. You only want to use them when you can comb along their grain. If you have to go against it more than minutely and briefly, you want a real language. It always seems like extra work in the beginning because real languages are so bad at running programs and you have to write more code to bead together libraries to take over a command’s job, but it pays off very soon.


About Ovid

Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.