More On Finding Duplicate Code in Perl

While at the Quack and Hack, I wrote Code::CutNPaste. It tries to find duplicate code in Perl and does a fairly decent job, right down to finding code where people have changed the variable or sub names.
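To give a feel for how renamed variables can still be matched, here is a minimal sketch. It is not the code in Code::CutNPaste: `normalize_vars` is a hypothetical helper, and the regex is a crude stand-in for real parsing. The idea is to rewrite each variable name to a placeholder based on the order of first appearance, so that two snippets differing only in naming normalize to the same string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Code::CutNPaste): rename variables to
# positional placeholders so snippets can be compared by shape alone.
sub normalize_vars {
    my ($code) = @_;
    my %seen;
    my $next = 0;

    # Match a sigil followed by an identifier and swap in v0, v1, ...
    $code =~ s{([\$\@%])([A-Za-z_]\w*)}{
        my $name = $2;
        $seen{$name} //= 'v' . $next++;
        $1 . $seen{$name};
    }gex;

    $code =~ s/\s+/ /g;    # ignore differences in whitespace
    return $code;
}

my $first  = 'my $total = 0; $total += $_ for @items;';
my $second = 'my $sum = 0;   $sum += $_ for @things;';

print normalize_vars($first) eq normalize_vars($second)
    ? "same shape\n"
    : "different\n";
```

A regex like this will be fooled by things such as the modulus operator or variables inside strings, which is why a real tool needs something sturdier.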

At the suggestion of Liz, I added a --jobs switch. For one project with almost 400 .pm files, I originally had this:

time find_duplicate_perl lib > report.txt
real    65m52.922s
user    39m2.998s
sys     24m27.776s

I now have this, roughly a 2.9x improvement in wall-clock time (a multi-core machine helps):

time find_duplicate_perl --jobs 4 lib > report.txt
real    22m49.700s
user    41m22.387s
sys     34m36.146s
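For anyone curious what a --jobs style switch might look like underneath, here is a rough sketch using plain fork. This is an assumption about the general technique, not how find_duplicate_perl actually does it; `process_file` and `partition` are hypothetical names, and the file list is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-file work.
sub process_file {
    my ($file) = @_;
    return length $file;
}

# Deal the files round-robin into $jobs buckets.
sub partition {
    my ( $jobs, @files ) = @_;
    my @buckets = map { [] } 1 .. $jobs;
    push @{ $buckets[ $_ % $jobs ] }, $files[$_] for 0 .. $#files;
    return @buckets;
}

my $jobs  = 4;
my @files = map { "lib/Module$_.pm" } 1 .. 10;    # made-up file names

my @pids;
for my $bucket ( partition( $jobs, @files ) ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: handle only its own bucket, then exit.
        process_file($_) for @$bucket;
        exit 0;
    }
    push @pids, $pid;
}

# Parent: wait for every worker to finish.
waitpid( $_, 0 ) for @pids;
```

Splitting the work this way costs extra CPU overall, which fits the timings above: user and sys time both went up while the real time dropped.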

It finds plenty of duplicated code, too. It also now does a reasonable job of not reporting trivial matches like this one as duplicates (see the (misspelled) --threshhold switch):

            };           |         };
        }                |     }
        return \@result; |     return \@result;
    }                    | }
    sub _confirm {       | sub _confirm {
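One way a threshold like this could work, sketched under the assumption that it is a minimum fraction of lines containing word characters (the real switch may well define it differently). `passes_threshold` is a hypothetical name, and 0.75 is just an illustrative value.

```perl
use strict;
use warnings;

# Hypothetical filter: keep a candidate block only if enough of its
# lines contain at least one word character.
sub passes_threshold {
    my ( $threshold, @lines ) = @_;
    return 0 unless @lines;
    my $wordy = grep { /\w/ } @lines;    # count of lines with real content
    return ( $wordy / scalar(@lines) ) >= $threshold ? 1 : 0;
}

# One side of the block shown above: mostly closing braces.
my @block = (
    '};',
    '}',
    'return \@result;',
    '}',
    'sub _confirm {',
);

print passes_threshold( 0.75, @block )
    ? "report as duplicate\n"
    : "skip, mostly punctuation\n";
```

Only two of the five lines here contain word characters, so a block like this falls well short of a 0.75 cutoff.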

Plus, one person is already threatening patches. When it's a bit more reasonable, I'll put it on the CPAN for you.


About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/