Exact Perl location with B::DeparseTree (and Devel::Callsite)

Recently I have been working on this cool idea: using B::Deparse to help me figure out exactly where a program is stopped. This can be used in a backtrace, such as when a program dies via Carp::confess, or in a debugger like Devel::Trepan.

To motivate the idea a little bit, suppose my program has either of these lines:

$x = $a/$b + $c/$d;
($y, $z) = ($e/$f, $g/$h);

I might want to know which division in the line is giving me an illegal division by zero. A while back with the help of perlmonks, the idea of using the OP address was the only promising avenue. More recently, I re-discovered B::Deparse and realized it might be able to do the rest: give the context around a specific op-code location. Devel::Callsite can be used to get your current op-code address.
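As a rough sketch of that last point: Devel::Callsite is a CPAN module rather than a core one, so the load below is guarded. Its callsite() function reports an OP address for the place the enclosing subroutine was called from. The helper name where_called is made up for illustration.

```perl
use strict;
use warnings;

# Devel::Callsite is a CPAN module, not core; load it defensively.
my $have_callsite = eval { require Devel::Callsite; 1 };

# Hypothetical helper: returns an OP address for the place it was
# called from, or undef when Devel::Callsite is unavailable.
sub where_called {
    return $have_callsite ? Devel::Callsite::callsite() : undef;
}

my $addr = where_called();
if (defined $addr) {
    printf "called from OP address 0x%x\n", $addr;
}
else {
    print "Devel::Callsite is not installed\n";
}
```

The address is only a number. The rest of this post is about turning that number back into readable Perl.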

B::Deparse is one of those things like the venerable perl debugger:

  • It is a brute-force effort with a long history,
  • many people have contributed to it,
  • it is one huge file.

It has been said that nothing can parse Perl other than Perl. Well, nothing can de-parse Perl's OPs other than B::Deparse. It understands the Perl interpreter and its intricacies very well.

But the most important feature I need is that B::Deparse has a way of doing its magic inside a running program. You can give it a subroutine reference at runtime and it will deparse that.
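A minimal sketch of that runtime use, relying only on the core B::Deparse module and its coderef2text method. The subroutine and variable names are just for illustration.

```perl
use strict;
use warnings;
use B::Deparse;

# Any code reference available at run time will do.
my $ratio = sub {
    my ($num, $den) = @_;
    return $num / $den;
};

my $deparser = B::Deparse->new;

# coderef2text returns the body of the sub, braces included.
my $text = $deparser->coderef2text($ratio);
print "sub $text\n";
```

The exact layout of the output (for example, whether pragma lines such as "use strict;" appear inside the body) depends on the Perl version, so treat it as something for humans to read, not a stable format.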

A useful side benefit in B::Deparse's output is that it will split up multi-statement lines into one line per statement.


  $x = 1; $y *= 2; $z = $x + $y;

will appear as:

  $x = 1;
  $y *= 2;
  $z = $x + $y;
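To see the splitting for yourself, here is a small sketch that deparses a sub whose statements all sit on one source line. It assumes nothing beyond core B::Deparse. The filtering regular expression is only there to pick the three assignments out of the output.

```perl
use strict;
use warnings;
use B::Deparse;

# Three assignments deliberately written on a single source line.
my $code = sub { my ($x, $y, $z); $x = 1; $y *= 2; $z = $x + $y; };

my $text = B::Deparse->new->coderef2text($code);
print "$text\n";

# Keep only the output lines that start with one of our variables.
my @statements = grep { /^\s*\$[xyz] / } split /\n/, $text;
printf "%d assignment lines\n", scalar @statements;
```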

All good so far. The first piece of bad news is that it doesn't show the OP addresses. But that is pretty easily remedied.

Initially I figured I'd handle this the way I did when I wanted to show fragments of disassembly code colorized using B::Concise: I'd just dump everything to a buffer internally and then run some sort of text filtering process to get the part I wanted.

So I monkey-patched and extended B::Deparse so I could search for an op address and it would return the closest COP, and I show that statement. This was released in version 0.70 of Devel::Trepan.

This is a hack though. It isn't really what I wanted. While showing just the addresses at COP or statement boundaries helps out with multiple statements per line, it isn't all that helpful otherwise. In the first examples, with a division by zero inside a single expression or inside a parallel assignment, there would be just two COP addresses, one per statement, and that's really no better than giving a line number. I need to add information about the sub-parts inside a statement.

So the next idea was to extend B::Deparse to store a hash mapping addresses (numbers) to B::OP objects. Better. But not good enough. I still would need to do the part that B::Deparse does best: deparsing.
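Such an address index can be sketched with the core B module alone. This is not the code that went into Devel::Trepan, and the names walk_ops and %op_at are made up. It walks the OP tree of a sub using the usual first/sibling idiom and records each OP under its address.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Visit $op and everything beneath it, calling $visit on each OP.
sub walk_ops {
    my ($op, $visit) = @_;
    $visit->($op);
    return unless $op->flags & OPf_KIDS;
    # $$kid is the OP's address; it is 0 once we run off the sibling list.
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        walk_ops($kid, $visit);
    }
}

my $code = sub { my ($num, $den) = @_; return $num / $den };

# %op_at maps an OP address (a number) to its B::OP object.
my %op_at;
walk_ops(svref_2object($code)->ROOT, sub { $op_at{ ${ $_[0] } } = $_[0] });

printf "0x%x %s\n", $_, $op_at{$_}->name for sort { $a <=> $b } keys %op_at;
```

Given an address, a lookup in %op_at then yields the OP, but as noted, that still leaves the deparsing itself to be done.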

Also, I want to have a way to easily go up the OP tree to get larger and larger context. For example, suppose the code is:

 $x = shift; $y = shift;

and I report you are stopped at "shift". I would probably want to say: Give me the full statement that the "shift" is part of. This means in the OP tree I would want the parent. Although there is a way to compile Perl storing parent pointers, Perl generally isn't built that way. Given an OP address, I'm not sure how we could easily find its parent other than starting from the top and traversing.

So my current tack is sort of an abstract OP tree which stores text fragments for that node in the tree. As it walks the tree top down it saves parent pointers to the nodes it creates.
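The parent-pointer part of this can be sketched on the raw OP tree, again with only the core B module. The names record_parents, %parent_of and %op_at are illustrative and not B::DeparseTree's. The walk is top down, so the parent is always known when a child is visited.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

my (%parent_of, %op_at);

# Walk top down, recording each OP and its parent by address.
sub record_parents {
    my ($op, $parent) = @_;
    $op_at{$$op}     = $op;
    $parent_of{$$op} = $parent;    # undef for the root
    return unless $op->flags & OPf_KIDS;
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        record_parents($kid, $op);
    }
}

my $code = sub { my $x = shift; my $y = shift; return $x + $y };
record_parents(svref_2object($code)->ROOT, undef);

# Start at one of the "shift" OPs and climb to the root.
my ($shift) = grep { $_->name eq 'shift' } values %op_at;
my @path;
for (my $op = $shift; $op; $op = $parent_of{$$op}) {
    push @path, $op->name;
}
print join(' <- ', @path), "\n";
```

The names of the intermediate OPs on the path vary between Perl versions, which is one more reason to attach deparsed text fragments to the nodes and not show raw OP names.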

You may ask, what's the difference between this and the OP tree other than the parent pointer?

Well, recall that B::Deparse has already abstracted the OP codes from a lower-level form into higher-level constructs. This is more true as we move up the first couple of levels of the tree. The Perl output is generic and dumb, but still it is at a slightly higher level than the sequence of OP instructions.

Saving more of the tree structure can improve deparsing itself.

Right now B::Deparse walks the tree and builds Perl code expressions and statements bottom up. The main thing passed down right now is operator precedence, to reduce the extraneous parentheses. At each level in the OP tree, the only information passed up from the children is the result string.

In my B::DeparseTree, in addition to the text fragments, I keep child information in a more structured way, and a parent pointer is saved and is available during processing. The parent pointer is useful in showing the larger context, as described below. Each node notes whether parentheses were needed when combined at the next level, so that they can be omitted when displaying the fragment starting at that node.
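To make the parentheses bookkeeping concrete, here is a toy sketch in plain Perl. The node layout and the names new_node, embedded_text and needs_parens are hypothetical and are not taken from B::DeparseTree. Each node keeps its own text without parentheses plus a flag saying whether its parent had to wrap it.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node layout; these field names are illustrative only.
sub new_node {
    my (%arg) = @_;
    my $node = {
        text         => $arg{text},
        needs_parens => $arg{needs_parens} ? 1 : 0,
        children     => $arg{children} || [],
        parent       => undef,
    };
    for my $child (@{ $node->{children} }) {
        $child->{parent} = $node;
        weaken($child->{parent});    # avoid a reference cycle
    }
    return $node;
}

# Text of a node as it appears when embedded in its parent.
sub embedded_text {
    my ($node) = @_;
    return $node->{needs_parens} ? "($node->{text})" : $node->{text};
}

# Build nodes for the expression: ($p + $q) * $r
my $sum     = new_node(text => '$p + $q', needs_parens => 1);
my $r       = new_node(text => '$r');
my $product = new_node(
    text     => join(' * ', embedded_text($sum), embedded_text($r)),
    children => [$sum, $r],
);

print "fragment: $sum->{text}\n";            # shown alone, no parentheses
print "context:  $sum->{parent}{text}\n";    # one level up, with them
```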

I close with some observations in using this. My first test was with fibonacci:

sub fib($) {
   my $x = shift;
   return 1 if $x <= 1;
   return fib($x-1) + fib($x-2);
}

If you deparse stopped in a debugger in the line with my $x = shift, you get:

 shift()  # which is inside..
 my $x = shift()

So far so good. Stepping to the next stopping point inside the line with return 1 if $x <= 1 you get:

 $x # which is inside...
 $x <= 1

Still good. Things start to get interesting when I do another step into return fib($x-1) + fib($x-2); Deparsing, as I originally had it, did not find anything. Here's why:

-- main::(example/fib.pl:11 @0x221dce8)
return(fib($x-1) + fib($x-2))
(trepanpl): deparse
# Nothing
(trepanpl): disasm -terse
Subroutine main::fib
UNOP (0x221dc40) leavesub [1]
    LISTOP (0x21f9608) lineseq
#9:     my $x = shift;
        COP (0x21f9650) dbstate
        BINOP (0x21f96b0) sassign
            OP (0x21f96f8) shift
            OP (0x21f9730) padsv [1]
#10:     return 1 if $x <= 1;
        COP (0x2227e98) dbstate
        UNOP (0x2227ef8) null
            LOGOP (0x2227f38) and
                BINOP (0x2227f80) le
                    OP (0x2228008) padsv [1]
                    SVOP (0x2227fc8) const  IV (0x4d25160) 1
                LISTOP (0x2228040) return
                    OP (0x21f9590) pushmark
                    SVOP (0x21f95c8) const  IV (0x4d25238) 1
 #11:     return(fib($x-1) + fib($x-2))
        COP (0x221dc88) dbstate
        LISTOP (0x221dd20) return
 =>              OP (0x221dce8) pushmark
            BINOP (0x221dd68) add [6]
                UNOP (0x221dfb8) entersub [3]
                    UNOP (0x2227d00) null [149]
                        OP (0x2227cb0) pushmark
                        BINOP (0x2227d48) subtract [2]
                            OP (0x2227e10) padsv [1]
                            SVOP (0x2227d90) const  IV (0x4d24f38) 1
                        UNOP (0x2227dd0) null [17]
                            SVOP (0x2227e50) gv  GV (0x4d03b28) *fib
                UNOP (0x221ddb0) entersub [5]
                    UNOP (0x221de28) null [149]
                        OP (0x221ddf0) pushmark
                        BINOP (0x221de70) subtract [4]
                            OP (0x221df38) padsv [1]
                            SVOP (0x221deb8) const  IV (0x4d24e30) 2
                        UNOP (0x221def8) null [17]
                            SVOP (0x221df78) gv  GV (0x4d03b28) *fib

The next instruction to be executed is a pushmark, and B::Deparse skips that when it processes the LISTOP. My remedy here was to note in the structure other ops underneath that are "skipped" or subsumed in the parent operation.

After fixing this the output is:

return (fib($x - 1) + fib($x - 2)) # part of...
sub fib($) {
   # line 9 'example/fib.pl'
   # ... rest of fib code

Stepping recursively into fib you get the last weirdness I encountered. Here is Devel::Trepan output so I can describe the situation better:

trepan.pl example/fib.pl
-- main::(example/fib.pl:14 @0x21798a8)
printf "fib(2)= %d, fib(3) = %d, fib(4) = %d\n", fib(2), fib(3), fib(4);
set auto eval is on.
(trepanpl): b 9   # first statement in fib
Breakpoint 1 set in example/fib.pl at line 9
(trepanpl): continue
xx main::(example/fib.pl:9 @0x217d268)
  my $x = shift;
(trepanpl): continue  # first recursive call
xx main::(example/fib.pl:9 @0x217d268)
   my $x = shift;
(trepanpl): up
--> #1 0x221ddf0 $ = main::fib(2) in file `example/fib.pl' at line 11
 main::(example/fib.pl:11 @0x221ddf0)
 return(fib($x-1) + fib($x-2))
(trepanpl): deparse
fib($x - 2) # part of...
fib($x - 1) + fib($x - 2)

I'm in fib($x-2)? No, I'm in the middle of evaluating fib($x-1)! What's going on?

The stopping location is really the point where I would continue. (It is the "pushmark" at address 0x221ddf0 in the listing above; this is just before subtracting 2.) So fib($x-2) is what I would execute next after returning. To reinforce this, when I step into an invocation from fib($x-2) and do the same thing, I now see:

fib($x - 1) + fib($x - 2) # part of
return (fib($x - 1) + fib($x - 2))

Which is saying I am stopped before the final addition, just before the final return. A possible fix is to step back OPs to find the call. I dunno. What do you all think?

In sum, this is all pretty powerful stuff. It's also a lot of work.

A Basic Challenge

Recently, I came across this project which turns C++ into BASIC. How close can Perl come?

I recall reading somewhere that Perl has the ability to vastly alter its syntax. Is Perl going to be bested by C++?

Introspection in Devel::Trepan

Here are some introspection routines in Devel::Trepan. I’m not aware that these exist in other debuggers, nor as Devel::REPL plugins. But if I’m wrong feel free to correct me in comments. And feel free to take code from Devel::Trepan to rework elsewhere.

Recently Jeffrey Ryan Thalhammer asked about variable and subroutine completion, and this got me thinking.

Info functions and Info packages

When he asked, there was a debugger command info functions which listed functions matching some regular expression. (There is also a gdb command of the same name.) That command accepted both fully-qualified and unqualified function names.

It occurred to me that I could also add an adjunct to that command, info packages. This command takes the data used in info functions, but it reindexes the fully-qualified function names keyed by package.

So here are some examples:

 (trepanpl): info packages Tie::   
 Tie::ExtraHash    Tie::Hash    Tie::StdHash

 (trepanpl): info packages -s -f Tie::Hash
   BEGIN    CLEAR    EXISTS    TIEHASH    new 
 Tie::Hash is in file /usr/share/perl/5.14/Tie/Hash.pm

 (trepanpl): info functions Tie::Hash::new
     Tie::Hash::new is at 8-11
 (trepanpl): list Tie::Hash::new  
 /usr/share/perl/5.14.2/Tie/Hash.pm [4-13]
 5      use Carp;
 6      use warnings::register;
 8      sub new {
 9          my $pkg = shift;
10          $pkg->TIEHASH(@_);
11      }
13      # Grandfather "new"
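The reindexing behind info packages can be sketched in a few lines. This is not Devel::Trepan's code, and the helper names functions_in and by_package are made up. The debugger has its own source of function names, so as a stand-in this sketch scans the symbol tables of a few core packages.

```perl
use strict;
use warnings;
use Tie::Hash;    # core; defines Tie::Hash, Tie::StdHash and Tie::ExtraHash

# Fully-qualified names of the subs visible in the given packages.
# Note: this includes imported subs, not only those defined there.
sub functions_in {
    my @names;
    for my $pkg (@_) {
        no strict 'refs';
        push @names, map  { "${pkg}::$_" }
                     grep { !/::$/ && defined &{"${pkg}::$_"} }
                     keys %{"${pkg}::"};
    }
    return sort @names;
}

# Reindex fully-qualified function names, keyed by package.
sub by_package {
    my %index;
    for my $name (@_) {
        my ($pkg, $func) = $name =~ /^(.*)::([^:]+)$/ or next;
        push @{ $index{$pkg} }, $func;
    }
    return \%index;
}

my @functions = functions_in(qw(Tie::Hash Tie::StdHash Tie::ExtraHash));
my $index     = by_package(grep { /^Tie::/ } @functions);

print "$_: @{[ sort @{ $index->{$_} } ]}\n" for sort keys %$index;
```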

For lexical variables, there is also info variables lexicals.

Completion and Complete

Well, now that that’s there, what about subroutine, package and filename completion on those above commands? That’s in there too. But wait, there’s more! Being something of a geek, I created internal routines for those types of completions. There are also specialized completion routines for my and our variables.

But to give you access to those from the debugger command line, I extended the debugger command complete. This is also a gdb command, although those specific options are not in gdb’s version. That format of giving options, though, is how gdb would do something like this.

The intent in adding that to the debugger command was not just for interactive use, but to provide something that more sophisticated front-ends could use. Or at least show the way.

In fact, the out-of-process client, trepan.pl --client does use this debugger “complete” command under the covers in its readline completion.

So although I don’t have a complete completion package when you are typing an expression, a number of the underlying components are there should someone want to take this and extend it.


In Perl Tricks, I read about B::Deparse. I found that so interesting and useful that, starting with release 0.57, it has been added to Devel::Trepan as the debugger command deparse. And we can even use Syntax::Highlight::Perl to colorize the output.


At the opposite end of the spectrum, there is a Devel::Trepan plugin to disassemble Perl code. Right now disassembly is at the file or subroutine level. Since I have the actual stopped OP address via Devel::Callsite, it would be nice to be able to allow a narrower range. However I haven’t figured out exactly how to do that even though I have some hints. See Getting a B::OP from an address?

And finally on this topic of low-level information, I should mention David Golden’s suggestion for hooking into Devel::Peek. For more information and a workaround, see this issue.


Here, I introduced you to some of the introspection aspects of Devel::Trepan. By the way, all of the POD documentation for the debugger commands given above can also be read inside the debugger itself, with its help command.

In a sense there’s nothing here that really isn’t in Perl itself. One can think of the debugger commands and internal routines, merely as a wrapper around existing modules and existing Perl features. (That’s where all of the real heavy lifting is done.) Given that, there is no reason why this couldn’t be added to other debuggers, or REPLs, if it isn’t there already.

Go forth and multiply! There is more than one way to do it.

RFC: Term::ReadKey Availability and requiring it in Term::ReadLine::Perl5

A while back I forked Term::ReadLine::Perl, making Term::ReadLine::Perl5, because of the former maintainer's lack of responsiveness regarding my patch to add GNU Readline history, and general lack of responsiveness overall.

Term::ReadLine::Perl purports to be a "pure Perl" version of GNU ReadLine. It can use, but does not require, Term::ReadKey. With this issue it seems that more hacking is needed when Term::ReadKey is not available.

Right now Term::ReadKey is recomme…

My Love/Hate Relationship with CPAN Testers

The Great Part

I really like the idea of a CPAN testing service where individuals volunteer their computers to test CPAN packages and those results are accumulated and shared.

The accumulated results are then tallied with other results. People can use this information to help decide whether to use a package or, when a package fails, to see whether others have a similar problem.

Comparing the CPAN Testers to Travis (which I also use on the github repository), the CPAN Testers covers far more OS’s, OS distributions, and releases of Perl than I could ever hope to try with Travis.

And that’s the good part.

The Not-so-Good Part

While there is lots of emphasis on rating a perl module, there is very little-to-no effort on detecting false blame, assessing the quality of the individual CPAN testers, the management of the Smokers, or of the quality of the Perl releases themselves.

Yet the information is already there to do this. It is just a matter of cross-tabulating data, analyzing it, and presenting it.

Suppose a particular smoker reports a failure for a Perl module on a particular OS and distribution and Perl version, but several other smokers do not report that error. It could be that there is an obscure or intermittent bug in the perl module tested, but it also might be bugs in the Smoker environment or setup, or bugs in the Perl release.

Even if it is an obscure bug in the program that many other smokers don’t encounter, as a person assessing whether a Perl module will work on my systems, I’d like to know if a failure seems to be specific to that smoker or to a few smokers. If it is anomalous, the Perl module will probably work for me.
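To show how little is needed for a first cut of that cross-tabulation, here is a toy sketch. The reports, smoker names and the "outnumbered by passing smokers" rule are all made up for illustration. None of this is real CPAN Testers data or an existing CPAN Testers API.

```perl
use strict;
use warnings;

# Made-up reports for illustration only.
my @reports = (
    { smoker => 'smoker-a', platform => 'linux/5.18.1',   grade => 'PASS' },
    { smoker => 'smoker-b', platform => 'linux/5.18.1',   grade => 'PASS' },
    { smoker => 'smoker-c', platform => 'linux/5.18.1',   grade => 'FAIL' },
    { smoker => 'smoker-a', platform => 'openbsd/5.16.3', grade => 'FAIL' },
    { smoker => 'smoker-d', platform => 'openbsd/5.16.3', grade => 'FAIL' },
);

# Cross-tabulate: platform => grade => { smoker => 1 }
my %tab;
$tab{ $_->{platform} }{ $_->{grade} }{ $_->{smoker} } = 1 for @reports;

# Hypothetical rule: failures on a platform look anomalous when the
# failing smokers are outnumbered by passing smokers on that platform.
my %verdict;
for my $platform (sort keys %tab) {
    my @fail = sort keys %{ $tab{$platform}{FAIL} || {} };
    my @pass = sort keys %{ $tab{$platform}{PASS} || {} };
    next unless @fail;
    $verdict{$platform} = @fail < @pass ? 'possibly anomalous' : 'widespread';
    print "$platform: $verdict{$platform} (failing: @fail)\n";
}
```

A real version would need a better rule than a simple majority, but even this much would flag a lone failing smoker for a closer look.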

Rating a Perl Release

Going further there is a lot of data there to rate the overall release itself.

Consider this exchange:


This Perl double free or corruption crash doesn’t look good for Perl 5.19.0 Comments?


5.19.0 was just a devel snapshot, I wouldn’t overrate it. Current git repository has 973 commits on top of 5.19.0.

Well and good, but isn’t that failure permanently marked in the testing service as a problem with my package when it isn’t? If 5.19.0 was flaky, isn’t the decent thing to do to retract the report? Or maybe in the summary for this package the line listing 5.19.0 should note that this release was more unstable than the others?

Again, what sucks is that to me it feels like blame will likely forever be put on the package. In those cases where the report is proved faulty, well tough, live with it. It is the hypocrisy that bothers me the most — that the service attempts to be so rigorous about how a Perl module should work with everything, but so lax when it comes to ensuring what it reports is really accurate.

And this gets to the aspect how well or poorly the smokers are managed.

I mentioned before that if a particular smoker is the only one that reports a failure for that particular OS distro and Perl release, the smoker might be suspect. And if that happens with several packages, then that suggests more strongly that the smoker (or it could be a set of smokers managed by one person) is at fault. It may still be true that there are legitimate bugs in all of the packages; perhaps the smoker has not-commonly-used LANG and LOCALE settings. But again, as things stand there is no way to determine that this smoker or set of smokers managed by a single person exhibits such characteristics.

Knowing that is helpful both to the person writing the failing package(s) and to those who might want to assess overall failures of a particular package.

Rating the Testers and Responsiveness of Testers

There is an asymmetry in the way testing is done. Testers can choose Perl modules, but Perl Module authors, short of opting out of the testing service entirely, can’t choose testers. I think this is only a matter of basic fairness: the premise that Perl Modules will get better if they are tested and rated also applies to the testers.

I think Perl Module authors should be able to rate the responsiveness of the testers whose reports they get (unsolicited, except at the too-coarse scale of opting out of everything).

Let’s say I get a report that says my Perl Module fails its test on this smoker. Unless the error message clearly shows what the problem is (and again cross-checks to ensure validity are lacking) or unless I can reproduce the problem in an environment I control, I’m at the mercy of the person running the smoker.

As you might expect, there are some that are very, very helpful, and some that I’ve sent email to and just don’t get responses from. Having a simple mechanism where I could +1 or -1 the tester, with the tester’s accumulated score sent along with the report, would be great. That way, if I get several reports with failures, I can pick which tester to work with first.

Given the fact that there is no effort to make each smoker not duplicate the work of others, in theory if the problem really is in the Perl Module rather than the tester’s setup or the Perl version, I should get multiple reports.

Alternatives to CPAN Testing Service?

I believe there are alternatives to the CPAN testing system. Any comments on them and how well they compare with the CPAN testing system? Is there a way to have those show up in metacpan.org or search.cpan.org?