Location, Location, Location

By way of explanation...

In preparing this and my previous blogs, I have noticed places where Devel::Trepan could be improved. For this one, while comparing Devel::Trepan output with that of a recent perl5db, I discovered that perl5db sometimes prints several lines of output to try to show a full Perl statement. Devel::Trepan prints a single line, which is normal in command-line debuggers. However, do see the set auto list command.

As I've done while preparing previous blogs, I then took time away from writing to improve Devel::Trepan. Although no one has mentioned it yet, the output you see in these blogs may be a little different from what you see if you install from CPAN. It does, however, match what you will see if you install from the github repository.

But this brings up a couple of other points. First, one of the reasons that perl5db is probably hard to replace with any other debugger is that people are still tweaking it right now.

Before I started Devel::Trepan, I was trying to assess alternatives to perl5db and came across the DB API, which makes the DB package more object oriented. It has a couple of interesting ideas that I have used; it also modularizes the code more. I figured that since it was on the perl.org site and part of the CORE modules, it was being actively maintained. Then I got to this part in the BUGS section:

The interface defined by this module is missing some of the later additions to perl's debugging functionality

This is a bit of an understatement. How old is the version of perl5db that DB uses? The DB code is virtually unchanged since Perl 5.8.9. The differences between the version in Perl 5.8.9 and the one in Perl 5.6.2 are some small bug fixes. Given this, the perl5db code that the DB module is based on is about a decade old. perl5db has changed a lot since then and continues to change a little.

My second point here, however, is a little more optimistic. It didn't take me long to improve Devel::Trepan so that it had the enhancement. Furthermore, all the pains taken to make the code more modular and testable do pay off here, at least for me. I was able to test this aspect in isolation, without having to install the code in order to try it out. The integration tests also notice the change.

And now that I have made the change, even those of you who are running an older Perl can get this feature in Devel::Trepan. But until I make another release, you need to use the github repository.

Note to perl5db advocates: If you feel this is an unfair advantage to Devel::Trepan, there is a simple fix: provide perl5db as a module of its own on CPAN. (Devilish grin.)

Now back to your regularly scheduled blog...

One important piece of information that a debugger provides is where you are, and why. Although this is simple and basic, part of what I'll discuss below is how and when it gets difficult.

Programming languages that don't support debugging tend to be looser about recording location information accurately. In fact, in every programming language where I have worked on a debugger, where previously there was none or where it was weak, I have found that I had to correct inaccuracies or vagueness in the parsing or run-time system.

While reporting a vague or misleading location in, say, a die message may be tolerated by a Perl programmer (although not appreciated), in a debugger reporting a wrong or vague location is unacceptable.

But the good news for Perl is that since it has had a debugger for a very long time, many of the location weaknesses have long been corrected.

Let's now look at how a location and status is reported in Devel::Trepan.

$ trepan.pl gcd.pl 3 5
-- main::(gcd.pl:18)
die sprintf "Need two integer arguments, got %d", scalar(@ARGV) unless 
    @ARGV == 2;
(trepanpl):

The location reporting here is more like perl5db than it is gdb. Perhaps this is a mistake, but when I embarked on the first debugger, one for bash, I recall feeling that perl5db's format was nicer. Still, the location output above is a little bit different from both in some respects.

Since the line we are stopped on isn't a full Perl statement, the next line which completes the statement is shown. More recent perl5db does this too while gdb and many other command-line debuggers do not.

The heuristic that both use can be somewhat easily fooled. For example, if your statement is a string that spans several lines and has what looks like a comment or a blank line, both will think that's the end of the statement. That is, if your statement is:

  $x = <<'EOS';
Hi
# This is not a comment
EOS

when either debugger is stopped at the assignment above, they show the line with "Hi" in it, but not the line after that.
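To see why a heuristic like this gets fooled, here is a minimal sketch of the general idea. It is not the code of either debugger: lines_to_show and the %stmt_start set are hypothetical names made up for illustration, and the stop pattern is an assumption about what "looks like" the end of a statement.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from perl5db or Devel::Trepan.
# $lines         - array ref of source lines (0-based)
# $start         - index of the line the debugger stopped on
# $is_stmt_start - hash ref of indices where some statement begins
sub lines_to_show {
    my ($lines, $start, $is_stmt_start) = @_;
    my @shown = ($lines->[$start]);
    for my $i ($start + 1 .. $#$lines) {
        # Stop when the next statement begins here.
        last if $is_stmt_start->{$i};
        # Stop at anything that looks like a terminator, a comment,
        # or a blank line. This is the guess that can be wrong.
        last if $lines->[$i] =~ /^\s*(?:[;}#]|$)/;
        push @shown, $lines->[$i];
    }
    return @shown;
}

my @src = (
    q{$x = <<'EOS';},
    q{Hi},
    q{# This is not a comment},
    q{EOS},
    q{print $x;},
);

# Statements start at index 0 (the assignment) and index 4 (the print).
my @shown = lines_to_show(\@src, 0, { 0 => 1, 4 => 1 });
print "$_\n" for @shown;
```

The sketch stops at the here-document line that begins with "#", so the assignment and the "Hi" line are shown and the rest of the string is not.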

Here is another thing to notice. In showing the source text, Devel::Trepan doesn't include the line number at the left. This was a usability thing for me. The line number is redundant in perl5db and gdb because it is given on the line before listing the source code. I prefer to see the line as it really is in the source text. If you want line numbers before the source code, use the list command.

Another difference, which might be a bit subtle here, is the innocuous-looking "--" at the beginning of the line:

-- main::(gcd.pl:18)

This is a two-character icon for the event name. An event is the reason that you were stopped. Here -- indicates a "line" event and is given when you have just started the program or have stopped because of stepping. A full list of event icons and their meanings is given on the Perl::Devel::Trepan github wiki.

Another way to see the more complete event name is to use one of the debugger commands taken from gdb: "info program" or "info line". Here are examples of using those:

(trepanpl): info program
Program: gcd.pl.
Program stop event: line.
(trepanpl): info line
Line 18, file gcd.pl

What I would like to add, but don't know how to, is something called a "COP address". As described in the B::Concise documentation, a "COP" is an operation that marks the beginning of a statement. I recall reading somewhere that "COP" stands for "Control OpCode".

Why include this? I'll get back to this shortly.

Suppose you are trying to debug your new Obfuscated Perl or JAPH code. This is a fine use of a debugger if there ever was one. It is also a good stress test of the debugger.

Let's say your Perl code is this:


`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

You can debug it this way:

trepan.pl  -e '`$=`;$_=\%!;($_)=/(.)/;$==++$|;...`'

Or the more traditional Perlish way:

perl -d:Trepan -e '`$=`;$_=\%!;($_)=/(.)/;$==++$|;...`'

I would like to be able to give some indication of where you are in that multi-line statement. There are 3 approaches that I can think of. However, first let me explore a misconception.

If you reformatted this with a Perl pretty printer like perltidy, wouldn't that do the trick? Well, yes, it helps; but that's not the whole story. Suppose your code is this:

#...
@x = ($a/$b, $c/$d);

If an "Illegal division by zero" occurs, was the division in the first entry of the array or the second? So let me restate the problem. We want a way to identify more precisely (generally more precisely than a line and filename) where a Perl program was at various "trace" points. A "trace" point can be a call, a place where an exception can occur, or a Perl statement boundary.
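To make the ambiguity concrete, here is a small sketch that traps the error and prints what Perl reports. The variable names are made up for illustration, and variables are used instead of literals so that the division happens at run time.

```perl
use strict;
use warnings;

# Only the second division has a zero divisor.
my ($p, $q, $r, $s) = (1, 2, 3, 0);

# Trap the run-time error instead of letting the program die.
my @x = eval { ($p / $q, $r / $s) };
my $err = $@;

# The message names a file and a line, but not which division failed.
print $err;
```

The message ends in "at FILE line N." with nothing finer than the line, so both divisions are equally plausible culprits from the message alone.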

The first way to give more precise location information would be to improve the Perl implementation so that it passes column information within the line down and makes it accessible somehow at runtime. It could be stored in the OP nodes, or in a separate place which is accessible when needed. The latter approach is used in more static languages and is standardized in the various binary formats like COFF or ELF's DWARF. I think the likelihood of this happening is slim, and it adds more overhead to the interpreter.

The second way to give more precise location information is a little ugly but probably the most doable: give a disassembly of the instruction tree at that trace point. In a debugger, this is the point the program is stopped at. Devel::Trepan has a plugin that adds a disassemble command, which lets you see a disassembly of the code around the line the program is stopped at. But right now, I can't figure out how to get information on which node is the one that is going to be interpreted next.

The third possibility extends the idea of pretty printing the source code. But it goes to the opposite extreme from the obfuscated code above: put each Perl token on its own line.

Scratch this third possibility: line information is only stored in COP nodes and those appear only at statement boundaries.
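A rough way to check this is to walk a subroutine's ops with the core B module and collect the line recorded in each COP. This is only a sketch: spread_out and cop_lines are names invented for the example, and the walk simply follows the execution order of straight-line code.

```perl
use strict;
use warnings;
use B ();

# One assignment spread over four source lines, then a return.
sub spread_out {
    my @x = (
        $_[0] / $_[1],
        $_[2] / $_[3],
    );
    return @x;
}

# Follow the ops in execution order and record the line of every COP.
sub cop_lines {
    my ($code) = @_;
    my @lines;
    for (my $op = B::svref_2object($code)->START; $$op; $op = $op->next) {
        push @lines, $op->line if $op->isa('B::COP');
    }
    return @lines;
}

my @lines = cop_lines(\&spread_out);
print "lines recorded in COPs: @lines\n";
```

Only one COP is recorded per statement, so the two divisions share a single recorded line even though they sit on different source lines.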

Now let us go back to one of the "perl -e" invocations given above. When you are debugging the obfuscated code and are stopped inside the debugger, what "file" are you in? Well, there isn't a file in a file system. So what Perl does is report that as "eval" and some number. At run time, access to the lines in that "file" works as though they were lines of a file in a filesystem. However, what if you are working from a front-end GUI such as Padre, or the one I wrote for GNU Emacs? What Devel::Trepan does to make it easier here is write the eval string to a temporary file, changing the location reported from a pseudo-filename into a real one, albeit temporary. Having this eval string in expanded form is also helpful even if you are just in Devel::Trepan, because you might want to edit that temporary file in order to understand how the expansion worked, or to modify and experiment with it.
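The idea can be sketched in a few lines. This is not Devel::Trepan's code: the template name and the variable names are assumptions for illustration, and it only shows the pseudo-filename Perl gives a string eval and how the same text could be saved somewhere a front end can open.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Code that exists only as a string; __FILE__ reports what Perl calls it.
my $code = q{my $sum = 2 + 3; __FILE__};
my $pseudo_name = eval $code;
die $@ if $@;
print "Perl calls it: $pseudo_name\n";

# Save the same text in a real, temporary file. The template is arbitrary.
my ($fh, $tmp_name) =
    tempfile('eval_XXXXXX', SUFFIX => '.pl', TMPDIR => 1, UNLINK => 1);
print {$fh} $code;
close $fh or die "close failed: $!";
print "A front end could open: $tmp_name\n";
```

The pseudo-filename has the form "(eval N)", which no editor can open, while the temporary path is an ordinary file until the program exits.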

About rockyb

I blog about Perl.