SelfLoader and things I've learned writing/using a debugger

A Subtlety that becomes Readily Apparent Using a Debugger

In both using and writing a debugger, one can learn some unexpected and subtle things about a programming language. And these can have some subtle effects.

Consider this Perl program:

sub five() { return 5; };
print five(), "\n";

It's not so different from the corresponding Ruby program:

def five(); return 5; end
print five(), "\n";

Or the corresponding Python program for that matter:

def five(): return 5;
print five();

Note: I have modified the programs to be perhaps a little less idiomatic with respect to punctuation, parentheses, semicolons and even the "print" statement used in each language, to emphasize how similar they really are.

If you run each program in a debugger for its language, you'll immediately see a difference between Perl on the one hand and Ruby and Python on the other.

In Perl, the first line you stop at is the second line with "print" in it. In Ruby and Python you stop at the first line. Why?

In Python and Ruby, functions add themselves at run time. So in Ruby and Python when you are stopped at that first line, the function five() hasn't been defined yet. In Perl, we say that the function is registered at "compile time". What are the implications?
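As a quick check of the compile-time claim, here is a minimal sketch. It calls five() on a line that comes before the definition in the file. I have dropped the empty prototype from the earlier example so that Perl does not warn about a call made too early to check it.

```perl
use strict;
use warnings;

# These statements run before the "sub" line below is ever reached,
# yet the call works because five() was installed at compile time.
print "five is ", (defined(&five) ? "already defined" : "not defined yet"), "\n";
print five(), "\n";

sub five { return 5 }
```

On my reading this reports that five is already defined and then prints 5. The equivalent reordering in Python or Ruby fails, because the definition has not executed yet.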

Well here's one mistake I used to make before understanding this. I like to have demo code at the bottom of my modules. And one way to do this is:

package MyModule;
...

unless (caller()) {
   # demo code section.
}
1; # to make require/use happy

Inside the demo code section, I might want to have some helper functions. For example I sometimes want to turn the response of a function into a nice string I can print, perhaps booleans to "yep" and "nope". When the program is loaded as a module, I don't need these functions.

When I was learning Perl, I used to write:

...
unless (caller()) {
   # demo code section.
   sub ack { $_[0] ? 'yep' : 'nope' };
   ... ack(MyModule::does_this()) ...
}

I thought that by putting that sub inside the unless block, it would only get defined when I was demoing the code, as it would if this were Python or Ruby. Nope. That function is defined regardless. In Perl, one way to get the function conditionally defined is to eval it as a string:

unless (caller()) {
   eval q{sub ack { $_[0] ? 'yep' : 'nope' }};  # q{} keeps $_[0] from being interpolated
   ...
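
To make the difference concrete, here is a small sketch that puts both forms in a branch that never runs. The sub names always_there and only_if_run are made up for the illustration.

```perl
use strict;
use warnings;

if (0) {
    # Never executed, yet the named sub is compiled and installed anyway.
    sub always_there { 'yep' }
}

if (0) {
    # The string is only compiled if this line runs, so nothing is defined.
    eval q{ sub only_if_run { 'yep' } };
}

printf "always_there: %s\n", defined(&always_there) ? 'defined' : 'not defined';
printf "only_if_run:  %s\n", defined(&only_if_run)  ? 'defined' : 'not defined';
```

This should report always_there as defined and only_if_run as not defined, even though neither branch is taken.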

And if you are writing or using a debugger for one of these languages, it is significant in another way. Go back to the five() example. Suppose I'm stopped at the beginning of the program in, say, the stock Python debugger, pdb (though it is true for other Python debuggers as well). If I try to run five(), I'll get a NameError saying that 'five' is not defined. Because it isn't yet! Step over that line and, lo and behold, one can then call that function in Python or Ruby.

Now both Ruby and Python do distinguish compile time from run time. If you have a syntax error in your program, that will be noticed at "compile" time in all 3 languages. What's different is exactly what things are normally done at compile time and what's normally done at run time. Ruby tends to do things later than both Python and Perl here.

But is the separation between compile and run time really all that distinct in Perl? Well..., no.

First we've seen that I can run eval() to delay until run time something that would otherwise happen at compile time. But it works the other way around too: one can also run arbitrary Perl code at "compile" time! That's what's done inside BEGIN blocks.

Perl's "use Module LIST" statement is exactly equivalent to:

 BEGIN { require Module; Module->import( LIST ); }
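
Here is a small sketch of both points: that a BEGIN block runs before run-time statements that appear earlier in the file, and that the spelled-out form behaves like use. List::Util and its sum function are just a convenient core module for the demonstration.

```perl
use strict;
use warnings;

our @order;

push @order, 'first run-time statement';
BEGIN { push @order, 'BEGIN block' }    # executed as soon as it is parsed
push @order, 'second run-time statement';

# Spelled-out form of: use List::Util 'sum';
BEGIN { require List::Util; List::Util->import('sum'); }

# Parses without parentheses only because sum() was imported at compile time.
my $total = sum 1, 2, 3;

print "$_\n" for @order;
print "total: $total\n";
```

The BEGIN block entry comes out first in @order even though it is written second, and the total is 6.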

SelfLoader

With the above explanation out of the way, let me come back to a problem I had. I was debugging a Perl version of GNU Readline called Term::ReadLine::Perl5 the other day. It uses SelfLoader.

SelfLoader allows subroutines to get loaded at run time. I guess this saves a little space when a routine is never used, at the expense of extra loading time at run time when the routine does get loaded. Perhaps this was useful in the days of yore when memory was much more constrained, back when DOS had a maximum of 640K without a memory extender installed. If this sounds like mumbo jumbo, well, it was kind of silly. The point is that nowadays I don't know that it is all that beneficial, but I'd be interested to hear of situations where it is useful.
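
The general idea of loading a subroutine on first use can be sketched with a hand-rolled AUTOLOAD. To be clear, this is not SelfLoader's implementation, only an illustration of the technique; the %lazy_source hash and the greet sub are made up for the example.

```perl
use strict;
use warnings;

# Source text for subs we do not want compiled until they are needed.
my %lazy_source = (
    greet => q{ sub greet { return "hello, $_[0]" } },
);

our $AUTOLOAD;
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                 # strip the package qualifier
    return if $name eq 'DESTROY';      # never autoload destructors
    my $src = delete $lazy_source{$name}
        or die "Undefined subroutine $AUTOLOAD called\n";
    eval $src;                         # compile the sub now, at run time
    die $@ if $@;
    no strict 'refs';
    goto &$AUTOLOAD;                   # jump into the newly defined sub
}

print defined(&greet) ? "defined\n" : "not defined\n";
print greet('world'), "\n";
print defined(&greet) ? "defined\n" : "not defined\n";
```

The sub is not defined before the first call and is defined after it. Because its body came from a string eval, there is no source file line for a debugger to point at unless the debugger does something extra.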

The important thing about SelfLoader and debuggers is that since there is an eval that happens at run time to insert subroutines, most debuggers can't show you where you are when you trace into one of these routines. With the recent release 0.49 of Devel::Trepan, I've now fixed that.

And in doing so I learned another trick which was the point of this blog.

I was trying to figure out how to slightly modify the behavior of SelfLoader, and came across the following trick: if one inserts a code reference into the @INC array, that code will get called when Perl looks for a module that has not been loaded yet, via either use or require. And when running a debugger at the outset, the debugger often gets loaded in first! See my perlmonks query for my musings on this problem.

I don't do this yet, but this also gives me the possibility of tracing use statements as they are executed, which is something I've always wanted.

For a simple hook that will show you how to trace use statements, try this:

BEGIN {
    unshift @INC, \&use_hook;
};

sub use_hook {
    my ($coderef, $filename) = @_; # $coderef is \&use_hook
    print "Looking for use of ", $filename, "\n";
    # caller() can be used to find out who needs $filename

    # I don't want the standard SelfLoader.pm
    if ($filename eq 'SelfLoader.pm') {
        # pull in my replacement of SelfLoader
    }
    return; # decline, so require goes on to the rest of @INC
}
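
For something that runs as it stands, here is a sketch of a hook that records every file name it is asked about and supplies replacement source for one module. The module name My::Hypothetical and its answer sub are invented for the example; the same pattern is what I would expect to use for swapping in a replacement SelfLoader.

```perl
use strict;
use warnings;

our @requested;   # every file name the hook was asked about

BEGIN {
    unshift @INC, sub {
        my ($self, $filename) = @_;      # $self is this code reference
        push @requested, $filename;

        # Decline everything except one made-up module name.
        return unless $filename eq 'My/Hypothetical.pm';

        # Hand require an in-memory file handle holding replacement source.
        my $source = "package My::Hypothetical;\nsub answer { 42 }\n1;\n";
        open my $fh, '<', \$source or die "in-memory open failed: $!";
        return $fh;
    };
}

use My::Hypothetical;    # satisfied by the hook, not by a file on disk
use File::Basename ();   # declined by the hook, found on disk as usual

print My::Hypothetical::answer(), "\n";
print "hook saw: $_\n" for @requested;
```

Returning nothing tells require to carry on with the remaining @INC entries, while returning a file handle makes require read the module from it. The list of names the hook sees depends on which modules are already loaded, since require skips the @INC search for anything already in %INC.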

9 Comments

That eval doesn't work either:

use v5.14;

if (0) {
    eval {
        sub foo { 42 }
    };
}

say main->can('foo');

You'd either need to use a string eval _or_ you can use a glob assignment:

use v5.14;

if (0) {
    *foo = sub { 42 };
}

say main->can('foo');

Block evals are still seen and parsed at compile time.

unless (caller()) {
    use experimental 'lexical_subs';
    my sub ack { $_[0] ? 'yep' : 'nope' }
    ... ack(MyModule::does_this()) ...
}

Not for use in production code yet, obviously, but hopefully in the not too distant future.

“Perl is worse than Python because people wanted it worse.”
—Larry Wall, in <7039ji$mtk@kiev.wall.org>

Rather than the eval, I'd use a closure:


...
unless (caller()) {
   # demo code section.
   my $ack = sub { $_[0] ? 'yep' : 'nope' };
   ... $ack->(MyModule::does_this()) ...
}

The only time I'd use an eval or muck with globs like above is if I were doing something like injecting a mock subroutine into code that wasn't otherwise very testable.

In all three of the other alternatives for how to run a subroutine in demo code, the code is actually compiled beforehand.

The same is true of your Ruby and Python examples.

The intent in my mind was to have a temporary function that just exists […] only when that branch is taken.

How is that supposed to work? Would the compiler somehow leave that bit of the code unparsed? How would it identify where the not-to-parse code ends (i.e. in this case, where the closing curly is) without parsing?


About rockyb

I blog about Perl.