PPIX::Refactor

(Moved from reddit.com/r/perl now that blogs.perl.org seems to be behaving itself).

Over the past year or so I've had to deal with PPI for parsing and rewriting perl code a handful of times. Refactoring scripts are generally one-shot single purpose. Like any data mangling activity refactoring scripts can get messy because they're generally disposable, so architecture tends to be an afterthought at best. You can kind of tell this if you go and read Perl::Critic policies. They're plagued by boilerplate and repetitive code, and that's for code that is extensively reused. The problem multiplies in the case where you're writing throwaway code against PPI. In this post I'm going to show off PPIx::Refactor, which is a minimal interface to contain a small but annoying part of the mess.

The job I tried this toolkit out on concerned a tied hash. In particular I was pretty sure that the uses of the hash of interest to me where only where it was referred to as $self, and only used in its own namespace and the subclasses' namespaces. I wanted to see every way self was used in the 80k lines of code. I wanted to classify or discard each instance for further usage. I had enormous amounts of fun telling people "I read 80,000 lines of code today" and when they looked at me in a surprised manner I told them "well more accurately I wrote a program to read it for me :)".

I probably won't rewrite my code using PPI, instead I will use the classification to enable manual adjustment of the code. Given that here's the script. It uses my shiny new(ish) module PPIx::Refactor to contain some but not all of the mess.

#!/usr/bin/env perl;
use strictures 1;
use PPIx::Refactor;
use Path::Tiny;
use Safe::Isa;

my $finder = sub {
        my ($elem, $doc) = @_;
        return 1 if $elem eq '$self'; # like I said I'm intersted in $self
        return 0;
};

my %index;
my $itr = path('/path/to/lib/NameSpace')->iterator({recurse => 1});
while (my $path = $itr->()) {
        next unless $path =~ /\.pm$/;
        next if $path =~ /$some_module_im_not_interested_in\.pm$/;
        my $p = PPIx::Refactor->new(file => $path, ppi_find => $finder);
        my $finds = $p->finds;

There's also an optional writer attribute and rewrite method, which both work in a similar way, and operates on the output of finds. But I haven't made much use of this yet. Rewriting with PPI is painful and fragile (because there are always edge cases you haven't thought of). However rewriting with PPi is occasionally useful, even though it's not covered here.

Aaand that's basically it as far as PPIx::Refactor goes. I eliminated a bunch of tedious boilerplate. The rest is pretty standard perl code:

        foreach my $self (@$finds) {
                my %type = classify($self);
                # for closure debugging outside of closure
                my $parent = $self->parent;
                my $prev = $self->sprevious_sibling;
                my $next = $self->snext_sibling;

If you're unfamiliar with PPI, the snext and sprevious methods find the subsequent and precedent symbols that are significant to the code. If you care about things that don't affect the way the code runs, eliminate the s from the method calls.

                $DB::single=1 if ! keys %type;# debugger breakpoint  for me to work with what I've got :)
                $type{unknown} = $self if ! %type;
        }
 }

What I have here is a simple testbed for nailing down what I'm interested in. The key is in the classify routine, which is just a proxy to a dispatch table. If the classifier comes up with nothing I set it to the unknown key

sub classify {
        my ($self) = @_;
        my $parent = $self->parent;
        my $prev = $self->sprevious_sibling;
        my $next = $self->snext_sibling;

Like I said, this kind of code gets pretty ugly fast, and I have some duplication here for convenience. There are better ways of doing this, but for my current purposes this is disposable code and I don't care.

This first one is a super simple rule to find the examples of assigning something to $self:

        my %types;
        my %dispatch = (
                mine => sub {
                        $types{assignment} = $self 
                                if $parent->first_token eq 'my' && $self->snext_sibling eq '=';
                },

This one looks for $self being blessed:

                blessing => sub {
                        $types{blessing} = $self
                                if $parent->first_token eq '$self'
                                        && $next eq ','
                                        &&     $parent->$_isa('PPI::Statement::Expression')
                                        &&     $parent->parent->$_isa('PPI::Structure::List')
                                        &&     $parent->parent->parent->first_token eq 'bless';
                },

And another super-simple rule for return $self:

                ret_self => sub {
                        $types{ret_self} = $self if $parent->first_token     eq 'return' && $parent->last_token eq ';';
                },

The STORE operations are one of the items I'm actually interested in. The FETCH operations not so much because of the way the code is structured they have very few points of entry (thanks for small mercies here!):

                STORE => sub {
                        $types{store} = $self
                                if         ( $next eq '->' || $next eq '='     )
                                        &&     $next->snext_sibling->$_isa('PPI::Structure::Subscript')
                                        &&     $next->snext_sibling->snext_sibling eq '=';
                },
        }
}

Once that's all done I have to mangle the contents of %type into %index. I couldn't get PPI to play nicely with Storable, so this is more kludgey than I would like.

                foreach my $k (keys %type) {
                        # $DB::single=1 if scalar(keys %type) > 1; # breakpoint if we classify twice
                        my $entry = $index{$k} || [];                    # create index key if not defined
                        my $parent = $type{$k}->parent;              # this ought to be the complete statement context for the $self

So the idea is to just have the code spit out a filename and a line number:

                        $entry = {
                                file => $type{$k}->document->filename,
                                line => $type{$k}->line_number,
                                parent => "$parent",
                        };
                        push @{ $index{$k} }, $entry;
                }

Last up I need to spit out the %index into a readable format.

use YAML::XS;
print Dump \%index;

I think for my next trip with PPPIx::Refactor I will be writing something almost but not entirely unlike Perl::Critic::Progressive more suited to largeish commercial codebases written in such a way it doesn't bring the CI rig (you do have one of them don't you?) to its knees.

Leave a comment

About kd

user-pic Australian perl hacker. Lead author of the Definitive Guide to Catalyst. Dabbles in javascript, social science and statistical analysis. Seems to have been sucked into the world of cloud and devops.