Kiss Kiss Shebang Shebang
At the end of the discussion, our sysadmin commented:
Perl sure does seem to need a lot of scaffolding these days before one can get around to the business purpose.
And my response was that Perl had always needed a lot of scaffolding. It’s just that we never used to notice, because it was all built-in.
Pretty much every Perl tutorial is going to tell you that the first line of your first Perl script should look something like this:
#!/usr/bin/perl
Now, sure: it isn’t going to be that easy on Windows, or other non-Unixy systems, and even in the Unices there are going to be flavors where Perl is in /usr/local/bin or somesuch, but that line actually works in a significant majority of cases. And that’s all you have to do to make your Perl program work. Perl is a compiled language, but you don’t have to compile your Perl code (which is why it’s commonly considered an interpreted language, even though it’s technically not). Whenever you run your Perl code, it just magically compiles and runs, including finding all the executable bits and all the libraries and all the modules. You don’t have to worry about compiling and linking, as you would with C++. For that matter, you didn’t have to install Perl in the first place, as you might have to do with Python or Ruby. Nope, everything (all the scaffolding) is just there.
Which is awesome when you’re writing “hello world.” When you’re crafting an extensive business application ... not so much. Because, to quote Piglet:
Sometimes things seem like really good ideas and aren’t.
The first problem you inevitably hit when you’re learning Perl is modules. CPAN is huge. It contains everything you could possibly want,1 and quite a few things you really don’t. It isn’t practical for Unix boxes to come with the entirety of CPAN installed. Plus CPAN is always changing: even if your favorite flavor of Unix did include all of CPAN, it would be out-of-date long before the install CD hit your laptop. So Unix distros have to pick which modules come standard. Perl itself has to choose what modules are considered “core.” So you’re always going to run into that situation where you need to install something from CPAN.
Happily this is not hard. The toolchain for installing CPAN modules is excellent. Oh, sure: we bitch about this part and that part constantly, but try installing a Ruby gem sometime.2 Our shit is tight compared to that.
No, the problem is not getting it installed, but rather where to install it. See, it took us something on the order of a decade or so to figure it out, but we finally realized that installing modules into the system Perl area is a terrible idea. Modules need other modules, or upgraded versions of standard modules, and modules (or versions of modules) don’t always play nice with each other. Managing all this module interaction is hard enough in a codebase of any significant size. Throwing system Perl into the mix just makes it crazy.
And on top of everything else, Perl is just so goshdarned useful for sysadmin-y things that lots of systems tools are built on it. So, if you end up breaking system Perl, sometimes you end up breaking your entire system. And, to add insult to injury, you rarely break it definitively. It often breaks in really strange ways that can be difficult to reproduce: different symptoms for different users, breaking or not depending on module load order, that sort of thing.
So we invented local::lib.3 And we started using it to solve this problem. Now we could have our own little library area, completely separate from the system Perl. For instance, here’s the standard prefix that we’ve been using at my current company:
    #!/usr/bin/perl
    use strict;
    use warnings;

    use Cwd 'abs_path';
    use File::Spec::Functions qw(catpath splitpath);
    use local::lib catpath((splitpath(abs_path $0))[0, 1], '../extlib');
    use lib catpath((splitpath(abs_path $0))[0, 1], '../lib');
    use CE::Util::ScriptBootstrap;
As you can see, it uses only core modules (only modules that have been core since ancient times, even4), and it uses the directory of the script itself to find our non-system CPAN modules (extlib/) and our corporate modules (lib/). That way, if you happen to have two separate copies of the codebase (different branches, perhaps), each one finds its own set of modules.
Still, there are problems here. For one, it assumes that the script itself lives only one level down from the root of our codebase (e.g., in bin/). What if it’s two levels down, such as in bin/util/? Well, then, you have to change the boilerplate:
    use Cwd 'abs_path';
    use File::Spec::Functions qw(catpath splitpath);
    use local::lib catpath((splitpath(abs_path $0))[0, 1], '../../extlib');
    use lib catpath((splitpath(abs_path $0))[0, 1], '../../lib');
    use CE::Util::ScriptBootstrap;
And so on, the deeper you go. Then what happens if you need to move the script to a different level?
The other problem here is more subtle, although I’m sure some of you saw it right away. I’ll give a bizarre error message as a hint:
    File::Spec version 3.4 required--this is only version 3.33 at /var/local/CE/bin/../extlib/lib/perl5/Path/Tiny.pm line 13.
    BEGIN failed--compilation aborted at /var/local/CE/bin/../extlib/lib/perl5/Path/Tiny.pm line 13.
What makes that bizarre is that, if you look at the version of File::Spec in our extlib/, it really is 3.4. So where did 3.33 come from?
Why, from the system Perl, of course.
Because, you see, we loaded File::Spec before we used local::lib to redirect where we were pulling modules from.5 Here was one suggestion to fix the problem:
    #!/usr/bin/perl
    use strict;
    use warnings;

    use FindBin;
    use lib "$FindBin::Bin/../lib";
    use local::lib "$FindBin::Bin/../extlib";
    use lib "$FindBin::Bin/../lib";
    use CE::Util::ScriptBootstrap;
And don’t forget that those “pragmas” at the top (i.e. warnings) are just modules with a slightly different naming convention. As I discovered to my chagrin when I attempted to add a use autodie to that list.
And on top of all that, we’re still using the system Perl. Not the system Perl modules any more, but still good ol’ /usr/bin/perl, as the shebang line attests.
The shebang line ... the problem with that (as we mentioned above) is that it contains a hard-coded pathname, which may not always be where the system Perl is, depending on the system. But it gets worse if you decide you don’t want to use the system Perl at all. See, system Perls have a tendency to lag behind the rest of the world: until recently, it wasn’t that uncommon to have your system Perl turn out to be over a decade old.8 Lately, our beloved Perl has been getting some much needed improvements, catching up with those young whippersnappers that want to supplant us, but that does you no good if your system Perl is stuck in 2002. Unless you totally get rid of your system Perl, of course.
Enter Perlbrew (and plenv, although I’m not as familiar with that one). Now you can install whatever version of Perl strikes your fancy, and it probably won’t hose your system, even. Of course that means you have to go back and change all your shebang lines. But whatever shall you change them to? Well, you can read all about the intricacies and debates on the Perlbrew web site, but the common consensus these days is this:
#! /usr/bin/env perl
which will fetch Perl from your $PATH; that is, theoretically you’ll get the same thing you’d get if you typed perl at the command line. Again, presuming Unixoid systems, which likely encompasses recent versions of MacOS but not Windows. Of course, there are versions of Unix where env doesn’t live in /usr/bin, but this works quite a lot of the time, and you’ll find it’s the most common incantation recommended these days in tutorials and blog posts.
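To make that $PATH lookup concrete, here's a small sketch. It drops a stand-in executable named perl into a scratch directory (the stub and the directory are purely illustrative, not anything from our codebase) and shows that /usr/bin/env perl runs whichever perl comes first in $PATH. It assumes env really does live in /usr/bin on your system.

```shell
#!/bin/bash
# Sketch: /usr/bin/env runs the first "perl" it finds in $PATH.
# The stub below is a stand-in, NOT a real Perl interpreter.

demo_dir=$(mktemp -d)

# Create a fake "perl" that only announces itself.
cat > "$demo_dir/perl" <<'EOF'
#!/bin/sh
echo "stub perl invoked"
EOF
chmod +x "$demo_dir/perl"

# Put the scratch directory at the front of $PATH for this one command.
env_perl_output=$(PATH="$demo_dir:$PATH" /usr/bin/env perl)
echo "$env_perl_output"

rm -rf "$demo_dir"
```

Swap the order of the directories in $PATH and you get a different perl, which is exactly why this trick is only as good as the environment it runs in.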
Of course, it presumes that the right version of Perl is in the $PATH.
Now, let’s say I want to install Perl (via Perlbrew, or plenv, or what-have-you) into a directory in my corporate codebase. Like our extlib/ above, but probably something more like perl5/. I certainly can’t put an absolute path in my shebang lines for my scripts, so I’m probably going to want to go with something like the /usr/bin/env perl trick. Except ... how do I get the proper directory into the environment? In an ideal world, it’s the same directory on every machine, but I learned long ago that we don’t live in an ideal world. What if we need multiple installations on a single machine (hinted at above)? Then we certainly can’t use the same directory for everything. There might be other reasons too (e.g. different environments, like staging or integration testing, might be more convenient if set up using different directories). So something somewhere has to set that directory: I know damn well that if I hardcode it everywhere, I will definitely live to regret that. Something has to put it in the path. Us humans can set up the environments for ourselves, but what about the web servers that run as pseudo-users? What about the cronjobs, which have little to no environments at all? What about the scripts that have to install all the directories in the first place?
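You can get a feel for the cronjob problem without touching a crontab. The sketch below uses env -i to launch a shell with an almost empty environment, roughly the way cron does, and then goes looking for a tool installed in a private directory. The scratch directory and the myperl name are hypothetical stand-ins for a Perlbrew-style install.

```shell
#!/bin/bash
# Sketch: a private install is invisible unless something puts it in $PATH.
# "myperl" and the scratch directory are hypothetical stand-ins.

private_bin=$(mktemp -d)
printf '#!/bin/sh\necho "private perl"\n' > "$private_bin/myperl"
chmod +x "$private_bin/myperl"

# Cron-like: wipe the environment and supply only a bare-bones PATH.
bare=$(env -i PATH=/usr/bin:/bin sh -c 'command -v myperl || echo "not found"')

# Same lookup, but with the private directory added explicitly.
fixed=$(env -i PATH="$private_bin:/usr/bin:/bin" sh -c 'command -v myperl')

echo "bare environment:  $bare"
echo "with private bin:  $fixed"

rm -rf "$private_bin"
```

The second lookup only succeeds because something set the path by hand, and that "something" is the piece we still need to find a home for.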
One of my coworkers polled a group of devops folks somewhere, and the consensus was: use a bash script which sets up your environment and passes arguments through to the real Perl. Such a bash script would be invoked whenever you need to launch your scripts in a scenario where you can’t guarantee that the environment would already be there (cronjobs, sudo, running remotely via ssh, etc). That is, instead of
now you’re going to do
instead.9 Okay, lovely. But it means you have to go through all your crontab files, and all your scripts that log into other machines and run other scripts, etc etc, and modify how they’re all called. So it got me thinking ... if we’re going to have to change all our shebang lines anyway, couldn’t we just do all this in one fell swoop? I mean, couldn’t we write something like this:
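    #! /bin/bash
    # work out where this copy of the codebase lives
    ROOTDIR=$(magic-directory-figurer-out-er)
    PERLDIR=$ROOTDIR/perl5
    # export, so the Perl we exec actually sees it
    export PERL5LIB=$ROOTDIR/perl5/lib:$ROOTDIR/lib
    exec "$PERLDIR/bin/perl" "$@"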
and call it ... oh, I dunno, say /usr/local/bin/invoke, and then change all our shebang lines to:
#! /usr/local/bin/invoke
The more I pondered this, the more excited I got by the possibilities. Not only does this give us one centralized location to fiddle with environment variables, but it also gives us a place to add Perl switches. We could turn on taint checks for every script at once with a 3-character change to a single file, for instance. Always load some module or other. Change it temporarily to generate profiling info. Whatever. This could be cool!
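For what it's worth, here's one way the magic-directory-figurer-out-er placeholder might be filled in. This is only a sketch under assumptions of mine: the function name find_root is made up, and it assumes the root of the codebase can be recognized by the perl5/ directory it contains. It walks upward from the script's own directory until it finds one.

```shell
#!/bin/bash
# Hypothetical helper: walk up from a script's directory until we find
# a directory containing perl5/, and treat that as the codebase root.
find_root() {
    local dir
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    while [ "$dir" != "/" ]; do
        if [ -d "$dir/perl5" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        dir=$(dirname "$dir")
    done
    return 1    # no perl5/ anywhere above the script
}

# Demo against a throwaway tree: <root>/perl5 and <root>/bin/util/t.pl
demo_root=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo_root/perl5" "$demo_root/bin/util"
touch "$demo_root/bin/util/t.pl"

found_root=$(find_root "$demo_root/bin/util/t.pl")
echo "root: $found_root"

rm -rf "$demo_root"
```

In the wrapper, that would turn into something like ROOTDIR=$(find_root "$1"). A side benefit is that it doesn't care whether the script lives in bin/ or bin/util/, so nobody has to count ../ segments any more.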
If it works ...
So first I checked to see if it should work, reliably, across enough different flavors of Unix. Meaning I Googled. A lot. And I found a page which told me that any reasonably modern version of Linux and bash should be able to handle this. I even found several examples of people doing it. Woohoo. So, next, I tried it.
And it just hung. Like, forever.
So, I added some debugging. And tried again. And promptly blew out my screen buffer. So I tried again, a bit more cautiously:
[absalom:~] ./t.pl | head + echo 'args: ./t.pl' + exec /usr/bin/env perl ./t.pl args: ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' args: ./t.pl + exec /usr/bin/env perl ./t.pl + echo 'args: ./t.pl' + exec /usr/bin/env perl ./t.pl args: ./t.pl + echo 'args: ./t.pl'
What the hey?? It’s like my bash script is just invoking itself, over and over again, and nothing in my Perl script (which also had debugging in it) was getting run at all. Hmmm ...
But then I remembered seeing this in the Wikipedia article about shebangs, and this in a random Stack Overflow question. Both of them go to some trouble to skip the shebang line when reprocessing the script. And, sure, neither one seems to have anything at all to do with the problem I’m trying to solve, but it’s worth a shot, right?
    #! /bin/bash
    ROOTDIR=$(magic-directory-figurer-out-er)
    PERLDIR=$ROOTDIR/perl5
    # export, so the Perl we exec actually sees it
    export PERL5LIB=$ROOTDIR/perl5/lib:$ROOTDIR/lib
    # have to skip the shebang line in the script we're running
    # otherwise we get an infinite loop
    script="$1" ; shift
    exec "$PERLDIR/bin/perl" <(tail -n +2 "$script") "$@"
Boom. That works.
So I told my coworkers. Two of them immediately said I didn’t need to do all that futzing around with skipping the first line, and one of them additionally pointed out that I was going to throw off all the error messages in our scripts by one. To which my responses were: 1) oh, yes, you really do need all that—trust me—and 2) oh, yeah ... good point. Sigh.
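One way to keep the line numbers honest (a sketch, and not what we ended up doing) is to blank out the shebang line instead of deleting it, so every later line stays where it was. The blank_shebang name is mine; the sed expression just empties line 1 if it starts with #!.

```shell
#!/bin/bash
# Hypothetical alternative to `tail -n +2`: empty the first line rather
# than dropping it, so line numbers in error messages still match the file.
blank_shebang() {
    sed '1s/^#!.*$//' "$1"
}

# Demo on a throwaway three-line script.
demo_script=$(mktemp)
printf '%s\n' '#! /usr/local/bin/invoke' 'print "hello\n";' 'die "oops";' > "$demo_script"

lines_dropped=$(tail -n +2 "$demo_script" | wc -l | tr -d ' ')
lines_blanked=$(blank_shebang "$demo_script" | wc -l | tr -d ' ')
third_line=$(blank_shebang "$demo_script" | sed -n '3p')

echo "tail -n +2 leaves $lines_dropped lines; blanking leaves $lines_blanked"
echo "line 3 is still: $third_line"

rm -f "$demo_script"
```

In the wrapper, that would mean handing Perl <(blank_shebang "$script") rather than <(tail -n +2 "$script"). The die on line 3 then still reports line 3.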
So, while thinking about how to change my magical incantation so it preserved the line numbers, I attempted to explain to everyone why it was necessary to skip the first line. Casting my mind back to a conversation from a few years ago, I remembered that Perl itself would pay attention to a shebang line. That has to do with being able to pass multiple switches to the Perl executable in the shebang line. In other words, the reason you can do this:
#! /usr/bin/perl -T -Ilib -w
or whatever10 probably has nothing to do with your version of Linux or bash, and only works because Perl re-executes itself with the proper switches after seeing the shebang line. So it has to be reading the shebang line, and conditionally doing something with it. So, maybe it has something to do with that, I threw out (totally grasping at straws).
My coworkers expressed doubts. What about good old /usr/bin/env perl, they challenged? Why does that one work then?
“Uhhhhh ...” was my brilliant reply. Maybe there’s a specific exception for env? I followed that piece of infallible hypothesis with this one: “or else it might just be happy that the word perl is in there somewhere.” Immediately one of my coworkers changed the name of his script to launch-perl or somesuch and bang! it worked.
Whoa, I said.
You mean I was really right about that? No way ...
Well, of course now I had to go find where this is documented. As it turns out, it’s right there at the top of the perlrun man page:
If the #! line does not contain the word “perl” nor the word “indir”, the program named after the #! is executed instead of the Perl interpreter. This is slightly bizarre, but it helps people on machines that don’t do #!, because they can tell a program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter for them.11
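The rule is simple enough to mimic in a few lines of shell. The sketch below is not Perl's actual implementation, just my paraphrase of the perlrun text: it reads a script's first line and reports whether Perl would hand the script off to the program named there. The function names perl_would_redispatch and check are made up for illustration.

```shell
#!/bin/bash
# Paraphrase of the perlrun rule (NOT Perl's real code): if the first line
# starts with #! and mentions neither "perl" nor "indir", Perl runs the
# program named there instead of interpreting the script itself.
perl_would_redispatch() {
    local first_line
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case $first_line in
        '#!'*perl*|'#!'*indir*) return 1 ;;   # Perl keeps the script
        '#!'*)                  return 0 ;;   # Perl re-execs the named program
        *)                      return 1 ;;   # no shebang line at all
    esac
}

# Write a two-line script with the given shebang and report the verdict.
check() {
    local tmp verdict
    tmp=$(mktemp)
    printf '%s\n' "$1" 'print "hi\n";' > "$tmp"
    if perl_would_redispatch "$tmp"; then verdict=redispatch; else verdict=keeps; fi
    rm -f "$tmp"
    echo "$verdict"
}

invoke_verdict=$(check '#! /usr/local/bin/invoke')
launch_verdict=$(check '#! /usr/local/bin/launch-perl')
env_verdict=$(check '#! /usr/bin/env perl')

echo "invoke:      $invoke_verdict"
echo "launch-perl: $launch_verdict"
echo "env perl:    $env_verdict"
```

A wrapper called invoke gets the script bounced straight back to it, which is the infinite loop I hit. Anything with "perl" in its name, including good old env perl, is left alone.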
So, at the end of it all, I’m not sure if this post is an interesting detective story, or a tip for avoiding an infinite loop gotcha should you ever try what I did, or just a long, meandering answer to my sysadmin’s observation about Perl sure requiring a lot more scaffolding these days,12 but hopefully someone out there got something out of it. I thought the whole thing was pretty interesting, at least. And now I’ve shared it with you.
1 Okay, okay: not everything. As our dear language ages, we are finding more and more things that there isn’t a CPAN module for yet. But you’ll allow me a bit of hyperbole in service of a greater truth, I hope.
2 Not to pick on Ruby. Python can be just as much of a pain in the ass, and Node is usually even more so.
3 And by “we” I mean “mst.”
4 Perl 5.004 was released in 1997, which makes it older than my oldest child. That counts as ancient times as far as I’m concerned.
5 Please note that I do not take any credit for figuring this out; smarter people than I at $work explained this to me, and now I’m passing it on to you.
7 Well, most versions of local::lib out there in the wild anyway. To be fair, the newer versions are more circumspect about loading File::Spec—for this very reason.
8 Yes, Perl 5.8 is nearly 13 years old, and that was standard on many distros. Probably still is, on some.
9 Let’s conveniently ignore the question of how we ended up in the current directory in the first place. You can’t ignore it in the real world, of course, but we can ignore it here, for now.
10 Except don’t actually use perl -w. It’s inferior to use warnings in many, many ways.
11 The “indir” bit is new as of Perl 5.16, although I can’t seem to find any explanation of what it’s there for. I did find the commit which changed the man page, hoping that its log message would also contain some reference to its function. But, no such luck. If anyone knows the reason for it, toss it out in the comments: I’d love to hear about it.
12 Yeah, okay, it’s long and meandering either way: I know. I know.