November 2015 Archives

Exact Perl location with B::DeparseTree (and Devel::Callsite)

Recently I have been working on this cool idea: using B::Deparse to help me figure out exactly where a program is stopped. This can be used in a backtrace such as when a program crashes from Carp::Confess or in a debugger like Devel::Trepan.

To motivate the idea a little bit, suppose my program has either of these lines:

$x = $a/$b + $c/$d;
($y, $z) = ($e/$f, $g/$h);

I might want to know which division in the line is giving me an illegal division by zero.

Or suppose you see are stopped in a Perl statement like this:

my @x = grep {$_ =~ /^M/} @list;

where exactly are you stopped? And would the places you are stopped at be different if this were written:

my @x = grep /^M/, @list;

? (The answer is yes.)

A while back with the help of perlmonks, the idea of using the OP address was the only promising avenue. More recently, I re-discovered B::Deparse and realized it might be able to do the rest: give the context around a specific op-code location. Devel::Callsite can be used to get your current op-code address.

B::Deparse is one of those things like the venerable perl debugger:

  • It is a brute-force effort with a long history,
  • many people have contributed to it,
  • it is one huge file.

It has been said that nothing can parse Perl other than Perl. Well, nothing can de-parse Perl's OP's other than B::Deparse. It understands the Perl interpreter and its intricacies very well.

But the most important feature I need is that B::Deparse has a way of doing its magic inside a running program. You can give it a subroutine reference at runtime and it will deparse that.

A useful side benefit in B::Deparse's output is that it will split up multi-statement lines into one line per statement.

So:

  $x = 1; $y *= 2; $z = $x + $y;

will appear as:

  $x = 1;
  $y *= 2;
  $z = $x + $y;

All good so far. The first piece of bad news is that it doesn't show the OP addresses. But that is pretty easily remedied.

Initially I figured I'd handle this the way I did when I wanted to show fragments of disassembly code colorized using B::Concise: I'd just dump everything to a buffer internally and then run some sort of text filtering process to get the part I wanted.

So I monkey-patched and extended B::Deparse so I could search for an op address and it would return the closest COP, and I show that statement. This was released in version 0.70 of Devel::Trepan.

This is a hack though. It isn't really what I wanted. While showing just the addresses at COP or statement boundaries helps out with multiple statements per line, it isn't all that helpful otherwise. In the first example with dividing by zero or an inside a parallel assignment, there would just be to COP addresses and that's really no better than giving a line number. I need to add information about sub parts inside a statement.

So the next idea was to extend B::Deparse to store a hash of addresses (a number) to B:OPs. Better. But not good enough. I still would need to do the part that B::Deparse does best: deparsing.

Also, I want to have a way to easily go up the OP tree to get larger and larger context. For example, suppose the code is:

 $x = shift; $y = shift;

and I report you are stopped at "shift". I would probably want to say: Give me the full statement that the "shift" is part of. This means in the OP tree I would want the parent. Although there is a way to compile Perl storing parent pointers, Perl generally isn't built that way. Given an OP address, I'm not sure how we could easily find its parent other than starting from the top and traversing.

So my current tack is sort of an abstract OP tree which stores text fragments for that node in the tree. As it walks the tree top down it saves parent pointers to the nodes it creates.

You may ask, what's the difference between this and the OP tree other than the parent pointer?

Well, recall that B::Deparse has already abstracted the OP codes, from a lower level form into higher level constructs. This is true more so as we move up the first couple levels of the tree. The Perl output is generic and dumb, but still it is slightly at at higher level than the sequence of OP instructions.

Saving more of the tree structure can improve deparsing itself.

Right now B::Deparse walks the tree and builds Perl code expressions and statements bottom up. The main thing passed down right now is operator precedence to reduce the extraneous parentheses. At level in the OP tree, the only information from the children passed up is the result string.

In my B::DeparseTree, in addition to the text fragments, I keep child information in a more structured way, and a parent pointer is saved and is available during processing. The parent pointer is useful in showing larger context described below. Each node notes whether parenthesis were needed when combined at the next level, so that they can be omitted when displaying the fragment starting at that node.

I close with some observations in using this. My first test was with fibonacci:

sub fib($) {
   my $x = shift;
   return 1 if $x <= 1;
   return fib($x-1) + fib($x-2);
  }

If you deparse stopped in a debugger in the line with my $x = shift, you get:

 shift()  # which is inside..
 my $x = shift()

So far so good. Stepping to the next stopping point inside the line with return 1 if $x <= 1 you get:

 $x # which is inside...
 $x <= 1

Still good. Things start get interesting when I do another step into return fib($x-1) + fib($x-2); Deparsing, as I originally had it, did not find anything. Here's why:

-- main::(example/fib.pl:11 @0x221dce8)
return(fib($x-1) + fib($x-2))
(trepanpl): deparse
# Nothing
(trepanpl): disasm -terse
Subroutine main::fib
-------------------
 main::fib:
UNOP (0x221dc40) leavesub [1]
    LISTOP (0x21f9608) lineseq
#9:     my $x = shift;
        COP (0x21f9650) dbstate
        BINOP (0x21f96b0) sassign
            OP (0x21f96f8) shift
            OP (0x21f9730) padsv [1]
#10:     return 1 if $x <= 1;
        COP (0x2227e98) dbstate
        UNOP (0x2227ef8) null
            LOGOP (0x2227f38) and
                BINOP (0x2227f80) le
                    OP (0x2228008) padsv [1]
                    SVOP (0x2227fc8) const  IV (0x4d25160) 1
                LISTOP (0x2228040) return
                    OP (0x21f9590) pushmark
                    SVOP (0x21f95c8) const  IV (0x4d25238) 1
 #11:     return(fib($x-1) + fib($x-2))
        COP (0x221dc88) dbstate
        LISTOP (0x221dd20) return
 =>              OP (0x221dce8) pushmark
            BINOP (0x221dd68) add [6]
                UNOP (0x221dfb8) entersub [3]
                    UNOP (0x2227d00) null [149]
                        OP (0x2227cb0) pushmark
                        BINOP (0x2227d48) subtract [2]
                            OP (0x2227e10) padsv [1]
                            SVOP (0x2227d90) const  IV (0x4d24f38) 1
                        UNOP (0x2227dd0) null [17]
                            SVOP (0x2227e50) gv  GV (0x4d03b28) *fib
                UNOP (0x221ddb0) entersub [5]
                    UNOP (0x221de28) null [149]
                        OP (0x221ddf0) pushmark
                        BINOP (0x221de70) subtract [4]
                            OP (0x221df38) padsv [1]
                            SVOP (0x221deb8) const  IV (0x4d24e30) 2
                        UNOP (0x221def8) null [17]
                            SVOP (0x221df78) gv  GV (0x4d03b28) *fib

The next instruction to be executed is a pushmark, and B::Deparse skips that when it procesess the LISTOP. My remedy here was to note in the structure other ops underneath that are "skipped" or subsumed in the parent operation.

After fixing this the output is:

return (fib($x - 1) + fib($x - 2)) # part of...
sub fib($) {
   # line 9 'example/fib.pl'
   # ... rest of fib code

Stepping recursively into fib you get the last weirdness I encountered. Here is Devel::Trepan output so I can describe the situation better:

trepan.pl example/fib.pl
-- main::(example/fib.pl:14 @0x21798a8)
printf "fib(2)= %d, fib(3) = %d, fib(4) = %d\n", fib(2), fib(3), fib(4);
set auto eval is on.
(trepanpl): b 9   # first statement in fib
Breakpoint 1 set in example/fib.pl at line 9
(trepanpl): continue
xx main::(example/fib.pl:9 @0x217d268)
  my $x = shift;
(trepanpl): continue  # first recursive call
xx main::(example/fib.pl:9 @0x217d268)
   my $x = shift;
(trepanpl): up
--> #1 0x221ddf0 $ = main::fib(2) in file `example/fib.pl' at line 11
 main::(example/fib.pl:11 @0x221ddf0)
 return(fib($x-1) + fib($x-2))
(trepanpl): deparse
fib($x - 2) # part of...
fib($x - 1) + fib($x - 2)
(trepanpl):

I'm in fib($x-2)? No, I'm in the middle of evaluating fib($x-1)! What's going on?

The stopping location is really the point where I would continue. (It is the "pushmark" at address 0x221ddf0 in the listing above; this is just before subtacting 2.) So fib($x-2) what I would next execute after returning. To reinforce this, when I step an invocation from fib($x-2) and do the same thing, I now see:

fib($x - 1) + fib($x - 2) # part of
return (fib($x - 1) + fib($x - 2))

Which is saying I am stopped before the final addition, just before the final return. A possible fix is to step back OPs to find the call. I dunno. What do you all think?

In sum, this is all pretty powerful stuff. It's also a lot of work.

About rockyb

user-pic I blog about Perl.