SelfLoader and things I've learned writing/using a debugger
A Subtlety that becomes Readily Apparent Using a Debugger
In both using and writing a debugger, one can learn some unexpected and subtle things about a programming language. And these can have real consequences for how you write code.
Consider this Perl program:
sub five() { return 5; };
print five(), "\n";
It's not so different from the corresponding Ruby program:
def five(); return 5; end
print five(), "\n";
Or the corresponding Python program for that matter:
def five(): return 5;
print five();
Note: I have modified the programs to be perhaps a little less idiomatic with respect to punctuation, parentheses, semicolons and even the "print" statement used in each language, to emphasize how similar each really is.
If you run each in a debugger for its language, you'll immediately see a difference between Perl on the one hand and Ruby and Python on the other.
In Perl, the first line you stop at is the second line, the one with "print" in it. In Ruby and Python you stop at the first line. Why?
In Python and Ruby, a function definition is itself a statement that is executed at run time. So in Ruby and Python, when you are stopped at that first line, the function five() hasn't been defined yet. In Perl, we say that the function is registered at "compile time". What are the implications?
Well here's one mistake I used to make before understanding this. I like to have demo code at the bottom of my modules. And one way to do this is:
package MyModule;
...
unless (caller()) {
# demo code section.
}
1; # to make require/use happy
Inside the demo code section, I might want to have some helper functions. For example I sometimes want to turn the response of a function into a nice string I can print, perhaps booleans to "yep" and "nope". When the program is loaded as a module, I don't need these functions.
When I was learning Perl, I used to write:
...
unless (caller()) {
# demo code section.
sub ack { $_[0] ? 'yep' : 'nope' };
... ack(MyModule::does_this()) ...
}
I was thinking that by putting that sub inside the unless block, it would only get defined when I was demoing the code, as it would if this were Python or Ruby. Nope. That function is defined regardless. In Perl, the way to get the function conditionally defined would be to eval it as a string:
unless (caller()) {
# q{} (single-quote semantics) keeps $_[0] from being interpolated before the eval
eval q{ sub ack { $_[0] ? 'yep' : 'nope' } };
...
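For contrast, here is a minimal Python sketch of the behaviour I had wrongly expected from Perl: a def nested under an if only creates the function when that branch actually runs. The names load, as_script and DEMO_SOURCE are hypothetical, made up just for this illustration; exec is used only so that both cases can be tried in one file.

```python
# Source of a tiny "module" whose helper is defined only in demo mode.
DEMO_SOURCE = (
    "if as_script:\n"
    "    def ack(ok):\n"
    "        return 'yep' if ok else 'nope'\n"
)

def load(as_script):
    """Run DEMO_SOURCE in a fresh namespace and return that namespace."""
    namespace = {"as_script": as_script}
    exec(DEMO_SOURCE, namespace)
    return namespace

print("ack" in load(True))   # True: the def statement was executed
print("ack" in load(False))  # False: the def statement never ran
print(load(True)["ack"](1))  # yep
```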
And if you are writing or using a debugger for one of these languages, it is significant in another way. Going back to the five() example: suppose I'm stopped at the beginning of the program in, say, the stock Python debugger, pdb (though the same is true for other Python debuggers). If I try to run five(), I'll get a NameError saying that 'five' is not defined. Because it isn't yet! Step over that line and, lo and behold, one can then call that function in Python or Ruby.
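The same effect can be reproduced without a debugger. In this small Python sketch (the variable defined_before is just an illustrative name), calling five() before its def statement has executed raises NameError, and the call succeeds afterwards.

```python
# Before the "def" statement below has executed, the name five does not exist.
try:
    five()
    defined_before = True
except NameError:
    defined_before = False

def five():
    return 5

print(defined_before)  # False
print(five())          # 5
```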
Now both Ruby and Python do distinguish compile time from run time. If you have a syntax error in your program, that will be noticed at "compile" time in all 3 languages. What's different is exactly what things are normally done at compile time and what's normally done at run time. Ruby tends to do things later than both Python and Perl here.
But is the separation between compile and run time really all that distinct in Perl? Well..., no.
First we've seen that I can run eval() to delay until run time something that would otherwise happen at compile time. But it works the other way around too: one can also run arbitrary Perl code at "compile" time! That's what's done inside BEGIN blocks.
Perl's "use Module LIST" statement is exactly equivalent to:
BEGIN { require Module; Module->import( LIST ); }
SelfLoader
With the above explanation out of the way, let me come back to a problem I had. I was debugging a Perl version of GNU Readline called Term::ReadLine::Perl5 the other day. It uses SelfLoader.
SelfLoader allows subroutines to get loaded at run time. This, I guess, saves a little space if a routine is never used, at the expense of extra loading time at run time when the routine does get loaded. Perhaps this was useful in the days of yore when memory was much more constrained, as when DOS had a maximum of 640K without a memory extender installed. If this sounds like mumbo jumbo, well, it was kind of silly. The point is that nowadays I don't really know that it is that beneficial, but I'd be interested to hear of situations where it is useful.
The important thing about SelfLoader and debuggers is that since there is an eval that happens at run time to insert subroutines, most debuggers can't show you where you are when you trace into one of these routines. With the recent release 0.49 of Devel::Trepan, I've now fixed that.
And in doing so I learned another trick which was the point of this blog.
I was trying to figure out how to slightly modify the behavior of SelfLoader, and came across the following trick: if one inserts a code reference into the @INC array, that code will get called when Perl looks for a not-yet-loaded module via either use or require. And since a debugger is started at the outset, the debugger often gets loaded first! See my perlmonks query for my musings on this problem.
I don't do this yet, but it also opens up the possibility of tracing use statements as they are executed, something I've always wanted.
For a simple hook that will show you how to trace use statements, try this:
BEGIN {
unshift @INC, \&use_hook;
};
sub use_hook {
my ($coderef, $filename) = @_; # $coderef is \&use_hook
print "Looking for use of ", $filename, "\n";
# caller() can be used to find out who needs $filename
# I don't want the standard SelfLoader.pm
if ($filename eq 'SelfLoader.pm') {
# pull in my replacement of SelfLoader
}
return; # return nothing so Perl carries on searching the rest of @INC
}
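For comparison, Python has a similar hook point: objects on sys.meta_path are asked about every import that is not already cached in sys.modules. The sketch below is an analogy rather than a translation of the Perl hook; ImportTracer is a made-up class, and colorsys is used only because it is a small standard-library module.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder

class ImportTracer(MetaPathFinder):
    """Hypothetical tracer: records each module name the import system asks for."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # decline, so the normal finders still do the loading

tracer = ImportTracer()
sys.meta_path.insert(0, tracer)  # like unshift @INC: consulted before the others
try:
    sys.modules.pop("colorsys", None)  # make sure the import is not served from the cache
    importlib.import_module("colorsys")
finally:
    sys.meta_path.remove(tracer)

print(tracer.seen)
```

Returning None plays the same role as returning nothing from the @INC hook: the tracer only observes, and the regular lookup continues.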
That eval doesn't work either:
You'd either need to use a string eval _or_ you can use a glob assignment:
Block evals are still seen and parsed at compile time.
Ok, thanks. I have corrected the blog entry.
That Perl has all of these slight variations (and I didn't even get into the other phases that Perl has like INIT, CHECK, or UNITCHECK) I don't necessarily think is a good thing. One of the things I like about Ruby is how it does more with less.
Not for use in production code yet, obviously, but hopefully in the not too distant future.
While this is good and I applaud such a change, it shows another unflattering aspect of Perl that is alluded to in this blog entry and in one of my replies about not discussing the INIT, UNITCHECK, and similar sections: Perl is ever getting more complex with features that overlap in function. Let's face it, features get added much faster than old, less-good ones get removed.
So for a long time (or forever) there will be code that evals a string and other code that uses lexical subs. And programmers will be faced with having to know both, and with considering whether the code they write needs to work with older Perls, in which case the string eval gets used, or just newer ones, where lexical subs are used. And then maybe at some point code will be migrated from one to the other. All of this could keep a Perl programmer very busy.
It is Perl's philosophy that there's more than one way to do things. To some extent that's great in that it can foster creativity, but also at some point it becomes a major effort to figure out which way is the clearest and simplest among so many overlapping features.
Contrast this to a more orthogonal language. That language can be just as powerful, but much simpler to master with proficiency. In my experience, smaller orthogonal languages do not impede creativity or expressiveness.
“Perl is worse than Python because people wanted it worse.”
—Larry Wall, in <7039ji$mtk@kiev.wall.org>
Rather than the eval, I'd use a closure:
The only time I'd use an eval or muck with globs like above is if I were doing something like injecting a mock subroutine into code that wasn't otherwise very testable.
I confess that I had previously been thinking about this perhaps too restrictively by expecting that the code for the subroutine not exist if that branch of the if statement isn't taken.
Aspects of the blog were about how a Perl programmer may sometimes have to keep in mind the ramifications of which "phase" things are done in, the somewhat blurred distinction between the compile and run phases (since you can do just about anything at compile time that you can do at run time), and finally a hack for being able to trace use or require statements.
In all three of the other alternatives for how to run a subroutine in demo code, the code is actually compiled beforehand. By eval'ing a string, the code is only around in string form before it is actually run.
(One way you can verify this is to install Devel::Trepan::Disassemble, run the code, and enter the debugger "disassemble" command, or use B::Concise directly.)
In the case of the glob, the function is not seen via can() but it is around in main's symbol table. In the case of lexical subs and a closure, the code is also still around but not visible globally. With the closure that subroutine variable can still be passed around which extends its visibility which wasn't intended. I'm not sure if that's also true with lexical subs.
In my mind, the intent was to have a temporary function that exists only when that branch is taken (and again, perhaps I was too limiting in assuming that it wouldn't exist in compiled form). Why? Well, in the back of my mind I am hoping that the code is thrown away at compile time (whether for Perl, Python or Ruby). In fact I don't think that happens in any of these, but another way to write this, which may make it more obvious to a compiler that this is dead code, is:
if (__FILE__ eq $0) {
....
}
- - - -
By the way, in debugging the various examples, I see that a misfeature of both perl5db and Devel::Trepan (and most likely every other Perl debugger) is that caller() reports different results when it is evaluated inside the debugger than it does in the running program. I'll probably fix that down the line in Devel::Trepan by inserting a custom caller routine that corrects for this discrepancy.
The same is true of your Ruby and Python examples.
How is that supposed to work? Would the compiler somehow leave that bit of the code unparsed? How would it identify where the not-to-parse code ends (i.e. in this case, where the closing curly is) without parsing?
You may have responded before I edited the blog post, so I am sorry for not being more clear initially.
The other form of the expression is:
if $0 == __FILE__
and given that compilation occurs just before running without a break in between, the compiler can have the value of $0 around at compile time, as it most definitely does for __FILE__. (I have to walk this back a little since in Perl one can assign to $0.)
Yes, that one is perhaps a little far-fetched, as is caller(), unless one imagines looking for that specific idiom during compilation. And I am given to understand that Perl does look for other idioms like this.
But in Python, the idiom is:
if __name__ == '__main__':
and this is the more apparent candidate for dead code, since __name__ is already fixed by the time the module body runs (strictly speaking it is an ordinary module variable, not a compile-time constant).
CPython compiles Python code to bytecode and saves that into a .pyc file, but I don't think it does much optimization along the way. So I guess if something like this were to be done, there would be a dead-code elimination phase before running.
The other thing I had in my mind was having something like a preprocessor to look for those demo-code idioms and strip them, much in the same way that Google has a tool for "optimizing" JavaScript code.
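One way to check what CPython does with such a test is to compile two small snippets and look at the resulting bytecode. The sketch below is only an illustration: loaded_names, dead_helper and live_helper are made-up names, and the behaviour is that of the CPython versions I am aware of, which other implementations need not share. There, a block guarded by a literal 0 leaves no instructions behind, while the __name__ test is kept and evaluated at run time.

```python
import dis

def loaded_names(source):
    """Return the names that the compiled module-level bytecode loads."""
    code = compile(source, "<demo>", "exec")  # compiles only, runs nothing
    return {
        ins.argval
        for ins in dis.get_instructions(code)
        if ins.opname in ("LOAD_NAME", "LOAD_GLOBAL")
    }

# Guarded by a literal constant: the compiler can see this is dead code.
constant_guard = "if 0:\n    dead_helper()\n"
# Guarded by __name__: only known once the module is actually run.
name_guard = "if __name__ == '__main__':\n    live_helper()\n"

print("dead_helper" in loaded_names(constant_guard))
print("live_helper" in loaded_names(name_guard))
```

So a stripping step for demo code would have to recognise the __name__ idiom specifically, which is what a preprocessor could do.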
My debugger has over 70 files and most if not all of them have demo code. So that's a lot of savings.
Again, I'm sorry for being a bit vague initially and in the blog post about all of this.