Kiss Kiss Shebang Shebang

At the end of the discussion, our sysadmin commented:

Perl sure does seem to need a lot of scaffolding these days before one can get around to the business purpose.


And my response was that Perl had always needed a lot of scaffolding.  It’s just that we never used to notice, because it was all built-in.

Pretty much every Perl tutorial is going to tell you that the first line of your first Perl script should look something like this:

#! /usr/bin/perl

Now, sure: it isn’t going to be that easy on Windows, or other non-Unixy systems, and even in the Unices there are going to be flavors where Perl is in /usr/local/bin or somesuch, but that line actually works on a significant majority of the potential cases.  And that’s all you have to do to make your Perl program work.  Perl is a compiled language, but you don’t have to compile your Perl code (which is why it’s commonly considered an interpreted language, even though it’s technically not).  Whenever you run your Perl code, it just magically compiles and runs, including finding all the executable bits and all the libraries and all the modules.  You don’t have to worry about compiling and linking, as you would with C++.  For that matter, you didn’t have to install Perl in the first place, as you might have to do with Python or Ruby.  Nope, everything—all the scaffolding—is just there.

Which is awesome when you’re writing “hello world.” When you’re crafting an extensive business application ... not so much.  Because, to quote Piglet:

Sometimes things seem like really good ideas and aren’t.

The first problem you inevitably hit when you’re learning Perl is modules.  CPAN is huge.  It contains everything you could possibly want,1 and quite a few things you really don’t.  It isn’t practical for Unix boxes to come with the entirety of CPAN installed.  Plus CPAN is always changing: even if your favorite flavor of Unix did include all of CPAN, it would be out-of-date long before the install CD hit your laptop.  So Unix distros have to pick which modules come standard.  Perl itself has to choose what modules are considered “core.” So you’re always going to run into that situation where you need to install something from CPAN.

Happily this is not hard.  The toolchain for installing CPAN modules is excellent.  Oh, sure: we bitch about this part and that part constantly, but try installing a Ruby gem sometime.2  Our shit is tight compared to that.

No, the problem is not getting it installed, but rather where to install it.  See, it took us something on the order of a decade or so to figure it out, but we finally realized that installing modules into the system Perl area is a terrible idea.  Modules need other modules, or upgraded versions of standard modules, and modules (or versions of modules) don’t always play nice with each other.  Managing all this module interaction is hard enough in a codebase of any significant size.  Throwing system Perl into the mix just makes it crazy.

And on top of everything else, Perl is just so goshdarned useful for sysadmin-y things that lots of systems tools are built on it.  So, if you end up breaking system Perl, sometimes you end up breaking your entire system.  And, to add insult to injury, you rarely break it definitively.  It often breaks in really strange ways that can be difficult to reproduce: different symptoms for different users, breaking or not depending on module load order, that sort of thing.

So we invented local::lib.3  And we started using it to solve this problem.  Now we could have our own little library area, completely separate from the system Perl.  For instance, here’s the standard prefix that we’ve been using at my current company:

#!/usr/bin/perl
 
use strict;
use warnings;
 
use Cwd 'abs_path';
use File::Spec::Functions qw(catpath splitpath);
use local::lib catpath((splitpath(abs_path $0))[0, 1], '../extlib');
use lib catpath((splitpath(abs_path $0))[0, 1], '../lib');
use CE::Util::ScriptBootstrap;

As you can see, it uses only core modules (only modules that have been core since ancient times, even4), and it uses the directory of the script itself to find our non-system CPAN modules (extlib/) and our corporate modules (lib/).  That way, if you happen to have two separate copies of the codebase (different branches, perhaps), each one finds its own set of modules.

Still, there are problems here.  For one, it assumes that the script itself lives only one level down from the root of our codebase (e.g., in bin/).  What if it’s two levels down, such as in bin/util/?  Well, then, you have to change the boilerplate:

use Cwd 'abs_path';
use File::Spec::Functions qw(catpath splitpath);
use local::lib catpath((splitpath(abs_path $0))[0, 1], '../../extlib');
use lib catpath((splitpath(abs_path $0))[0, 1], '../../lib');
use CE::Util::ScriptBootstrap;

And so on, the deeper you go.  Then what happens if you need to move the script to a different level?

The other problem here is more subtle, although I’m sure some of you saw it right away.  I’ll give a bizarre error message as a hint:

File::Spec version 3.4 required--this is only version 3.33 at /var/local/CE/bin/../extlib/lib/perl5/Path/Tiny.pm line 13.
BEGIN failed--compilation aborted at /var/local/CE/bin/../extlib/lib/perl5/Path/Tiny.pm line 13.

What makes that bizarre is that, if you look at the version of File::Spec in our extlib/, it really is 3.4.  So where did 3.33 come from?

Why, from the system Perl, of course.

Because, you see, we loaded File::Spec before we used local::lib to redirect where we were pulling modules from.5  Here was one suggestion to fix the problem:

#!/usr/bin/perl
 
use strict;
use warnings;
 
use FindBin;
use local::lib "$FindBin::Bin/../extlib";
use lib "$FindBin::Bin/../lib";
use CE::Util::ScriptBootstrap;

There.  Pesky references to File::Spec all gone ... right?  Nope.  Not at all.  In fact, not only does FindBin6 use File::Spec, so does local::lib itself.7

And don’t forget that those “pragmas” at the top (i.e. strict and warnings) are just modules with a slightly different naming convention.  As I discovered to my chagrin when I attempted to add a use autodie to that list.
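One way to see which copy of a module a given Perl actually picks up is to ask it for the version and the file it loaded.  This is only a sketch: module_origin is a name I made up for illustration, and it assumes a perl of 5.10 or later somewhere on your $PATH.

```shell
# Print the version and on-disk location of a module as a given perl sees it.
# Usage: module_origin File::Spec [perl-binary]
# (module_origin is a hypothetical helper name, not part of any toolchain.)
module_origin() {
    module="$1"
    perl_bin="${2:-perl}"
    "$perl_bin" -e '
        my $module = shift;
        (my $file = "$module.pm") =~ s{::}{/}g;   # File::Spec -> File/Spec.pm
        require $file;                            # load it the way "use" would
        printf "%s %s %s\n", $module, $module->VERSION // "undef", $INC{$file};
    ' "$module"
}
```

Running module_origin File::Spec on its own, and then again with PERL5LIB pointed at extlib/lib/perl5, should show whether the system copy or the extlib/ copy wins in each case.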

And on top of all that, we’re still using the system Perl.  Not the system Perl modules any more, but still good ol’ /usr/bin/perl, as the shebang line attests.

The shebang line ... the problem with that (as we mentioned above) is that it contains a hard-coded pathname, which may not always be where the system Perl is, depending on the system.  But it gets worse if you decide you don’t want to use the system Perl at all.  See, system Perls have a tendency to lag behind the rest of the world: until recently, it wasn’t that uncommon to have your system Perl turn out to be over a decade old.8  Lately, our beloved Perl has been getting some much needed improvements, catching up with those young whippersnappers that want to supplant us, but that does you no good if your system Perl is stuck in 2002.  Unless you totally get rid of your system Perl, of course.

Enter Perlbrew (and plenv, although I’m not as familiar with that one).  Now you can install whatever version of Perl strikes your fancy, and it probably won’t hose your system, even.  Of course that means you have to go back and change all your shebang lines.  But whatever shall you change them to?  Well, you can read all about the intricacies and debates on the Perlbrew web site, but the common consensus these days is this:

#! /usr/bin/env perl

which will fetch Perl from your $PATH—that is, theoretically you’ll get the same thing you’d get if you typed perl at the command line.  Again, presuming Unixoid systems, which likely encompasses recent versions of MacOS but not Windows.  Of course, there are versions of Unix where env doesn’t live in /usr/bin, but this works quite a lot of the time, and you’ll find it’s the most common incantation recommended these days in tutorials and blog posts.

Of course, it presumes that the right version of Perl is in the $PATH.
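To make that presumption concrete, here is a small sketch of how you might check which perl the env trick would pick, and how you might push a private one to the front of the line.  Both function names are invented for this example, and the directory you pass in is whatever your own setup uses.

```shell
# Report which perl "#! /usr/bin/env perl" would select. env searches the
# directories listed in $PATH, in order, and runs the first "perl" it finds.
which_env_perl() {
    command -v perl
}

# Put a private Perl's bin/ directory ahead of everything else on the PATH.
# "$1" is wherever that Perl lives; the location is an assumption of this sketch.
prefer_perl_in() {
    PATH="$1:$PATH"
    export PATH
}
```

After something like prefer_perl_in /path/to/perl5/bin, which_env_perl should report the private perl rather than /usr/bin/perl, but only in processes that inherit that $PATH.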

Now, let’s say I want to install Perl (via Perlbrew, or plenv, or what-have-you) into a directory in my corporate codebase.  Like our extlib/ above, but probably something more like perl5/.  I certainly can’t put an absolute path in my shebang lines for my scripts, so I’m probably going to want to go with something like the /usr/bin/env perl trick.  Except ... how do I get the proper directory into the environment?  In an ideal world, it’s the same directory on every machine, but I learned long ago that we don’t live in an ideal world.  What if we need multiple installations on a single machine (hinted at above)?  Then we certainly can’t use the same directory for everything.  There might be other reasons too (e.g. different environments, like staging or integration testing, might be more convenient if set up using different directories).  So something somewhere has to set that directory—I know damn well that if I hardcode it everywhere, I will definitely live to regret that.  Something has to put it in the path.  Us humans can set up the environments for ourselves, but what about the web servers that run as pseudo-users?  What about the cronjobs, which have little to no environments at all?  What about the scripts that have to install all the directories in the first place?
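If you want to see how little a cronjob has to work with, you can approximate it by wiping the environment before running a command.  This is a rough sketch, not a faithful cron emulator: run_like_cron is a made-up name, and the PATH value is an assumption, since real cron defaults vary from system to system.

```shell
# Run a command with a stripped-down environment resembling cron's.
# env -i starts from an empty environment; only the variables listed
# here are passed through. The PATH value is an assumed typical default.
run_like_cron() {
    env -i HOME="$HOME" SHELL=/bin/sh PATH=/usr/bin:/bin "$@"
}
```

Something like run_like_cron bin/foo is a quick way to find out whether a script is quietly depending on variables that only exist in your login shell.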

One of my coworkers polled a group of devops folks somewhere, and the consensus was, use a bash script which sets up your environment and passes arguments through to the real Perl.  Such a bash script would be invoked whenever you need to launch your scripts in a scenario where you can’t guarantee that the environment would already be there (cronjobs, sudo, running remotely via ssh, etc).  That is, instead of

bin/foo

or even

perl bin/foo

now you’re going to do

bin/launch bin/foo

instead.9  Okay, lovely.  But it means you have to go through all your crontab files, and all your scripts that log into other machines and run other scripts, etc etc, and modify how they’re all called.  So it got me thinking ... if we’re going to have to change all our shebang lines anyway, couldn’t we just do all this in one fell swoop?  I mean, couldn’t we write something like this:

#! /bin/bash

ROOTDIR=$(magic-directory-figurer-out-er)
PERLDIR=$ROOTDIR/perl5
# export so that the perl we exec actually sees it
export PERL5LIB=$ROOTDIR/perl5/lib:$ROOTDIR/lib

exec "$PERLDIR/bin/perl" "$@"

and call it ... oh, I dunno, say /usr/local/bin/invoke, and then change all our shebang lines to:

#! /usr/local/bin/invoke

The more I pondered this, the more excited I got by the possibilities.  Not only does this give us one centralized location to fiddle with environment variables, but it also gives us a place to add Perl switches.  We could turn on taint checks for every script at once with a 3-character change to a single file, for instance.  Always load some module or other.  Change it temporarily to generate profiling info.  Whatever.  This could be cool!
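As for the magic-directory-figurer-out-er, I left it unspecified on purpose, but here is one hypothetical way it might work: start from the directory of the script being launched and walk upward until you hit a directory that contains perl5/.  The function name and the perl5/ marker are both assumptions for the sake of the sketch.

```shell
# Hypothetical stand-in for magic-directory-figurer-out-er: starting from the
# directory of the script being launched, walk upward until a directory that
# contains perl5/ is found. The perl5/ marker is an assumption of this sketch.
find_codebase_root() {
    # Resolve the script's directory to an absolute, symlink-free path.
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    while [ "$dir" != "/" ]; do
        if [ -d "$dir/perl5" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        dir=$(dirname "$dir")   # step up one level and try again
    done
    return 1                    # reached / without finding a root
}
```

Because it searches upward instead of counting ../ segments, a script in bin/ and a script in bin/util/ would both find the same root, and moving a script to a different level would not require touching it.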

If it works ...

So first I checked to see if it should work, reliably, across enough different flavors of Unix.  Meaning I Googled.  A lot.  And I found a page which told me that any reasonably modern version of Linux and bash should be able to handle this.  I even found several examples of people doing it.  Woohoo.  So, next, I tried it.

And it just hung.  Like, forever.

So, I added some debugging.  And tried again.  And promptly blew out my screen buffer.  So I tried again, a bit more cautiously:

[absalom:~] ./t.pl | head
+ echo 'args: ./t.pl'
+ exec /usr/bin/env perl ./t.pl
args: ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
args: ./t.pl
+ exec /usr/bin/env perl ./t.pl
+ echo 'args: ./t.pl'
+ exec /usr/bin/env perl ./t.pl
args: ./t.pl
+ echo 'args: ./t.pl'

What the hey??  It’s like my bash script is just invoking itself, over and over again, and nothing in my Perl script (which also had debugging in it) was getting run at all.  Hmmm ...

But then I remembered seeing an example in the Wikipedia article about shebangs, and another in a random Stack Overflow question.  Both of them go to some trouble to skip the shebang line when reprocessing the script.  And, sure, neither one seems to have anything at all to do with the problem I’m trying to solve, but it’s worth a shot, right?

#! /bin/bash

ROOTDIR=$(magic-directory-figurer-out-er)
PERLDIR=$ROOTDIR/perl5
# export so that the perl we exec actually sees it
export PERL5LIB=$ROOTDIR/perl5/lib:$ROOTDIR/lib

# have to skip the shebang line in the script we're running
# otherwise we get an infinite loop
script="$1" ; shift
exec "$PERLDIR/bin/perl" <(tail -n +2 "$script") "$@"

Boom.  That works.

So I told my coworkers.  Two of them immediately said I didn’t need to do all that futzing around with skipping the first line, and one of them additionally pointed out that I was going to throw off all the error messages in our scripts by one.  To which my responses were: 1) oh, yes, you really do need all that—trust me—and 2) oh, yeah ... good point.  Sigh.
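For what it’s worth, one possible way to keep the line numbers honest (not what we ended up doing, just a sketch) is to blank the shebang line instead of deleting it, so that line 2 of the script is still line 2 when Perl sees it.  The function name here is invented for the example.

```shell
# Emit a script with its shebang line blanked rather than removed, so every
# later line keeps its original line number. Files whose first line is not
# a shebang pass through unchanged.
blank_shebang() {
    sed '1s/^#!.*$//' "$1"
}
```

In the bash launcher, that would mean swapping tail -n +2 "$script" for blank_shebang "$script" inside the process substitution.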

So, while thinking about how to change my magical incantation so it preserved the line numbers, I attempted to explain to everyone why it was necessary to skip the first line.  Casting my mind back to a conversation from a few years ago, I remembered that Perl itself would pay attention to a shebang line.  That has to do with being able to pass multiple switches to the Perl executable in the shebang line.  In other words, the reason you can do this:

#! /usr/bin/perl -T -Ilib -w

or whatever10 probably has nothing to do with your version of Linux or bash, and only works because Perl itself parses the shebang line and applies the switches it finds there.  So it has to be reading the shebang line, and conditionally doing something with it.  So, maybe it has something to do with that, I threw out (totally grasping at straws).

My coworkers expressed doubts.  What about good old /usr/bin/env perl, they challenged?  Why does that one work then?

“Uhhhhh ...” was my brilliant reply.  Maybe there’s a specific exception for env?  I followed that piece of infallible hypothesis with this one: “or else it might just be happy that the word perl is in there somewhere.” Immediately one of my coworkers changed the name of his script from launch to launch-perl or somesuch and bang! it worked.

Whoa, I said.

You mean I was really right about that?  No way ...

Well, of course now I had to go find where this is documented.  As it turns out, it’s right there at the top of the perlrun man page:

If the #! line does not contain the word “perl” nor the word “indir” the program named after the #! is executed instead of the Perl interpreter. This is slightly bizarre, but it helps people on machines that don’t do #! , because they can tell a program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter for them.11
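If you want to check your own scripts against that rule before they start looping on you, a rough test might look like the following.  It is a sketch based only on the perlrun passage above: the function name is made up, and it does a plain substring match on the first line, which is consistent with launch-perl working but is not copied from Perl’s actual implementation.

```shell
# Succeeds if perl would run the file itself; fails if perl would hand
# the file off to the program named on the #! line.
# Based on the perlrun rule quoted above; plain substring matching is an
# assumption of this sketch.
perl_keeps_script() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#!'*perl*|'#!'*indir*) return 0 ;;   # perl runs the script itself
        '#!'*) return 1 ;;                    # perl dispatches to that program
        *) return 0 ;;                        # no shebang line: nothing to dispatch to
    esac
}
```

A script starting with #! /usr/local/bin/invoke fails this check, which is exactly the infinite loop case; rename the launcher to something like invoke-perl and it passes.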


So, at the end of it all, I’m not sure if this post is an interesting detective story, or a tip for avoiding an infinite loop gotcha should you ever try what I did, or just a long, meandering answer to my sysadmin’s observation about Perl sure requiring a lot more scaffolding these days,12 but hopefully someone out there got something out of it.  I thought the whole thing was pretty interesting, at least.  And now I’ve shared it with you.


1 Okay, okay: not everything.  As our dear language ages, we are finding more and more things that there isn’t a CPAN module for yet.  But you’ll allow me a bit of hyperbole in service of a greater truth, I hope.


2 Not to pick on Ruby.  Python can be just as much of a pain in the ass, and Node is usually even more so.


3 And by “we” I mean “mst.”


4 Perl 5.004 was released in 1997, which makes it older than my oldest child.  That counts as ancient times as far as I’m concerned.


5 Please note that I do not take any credit for figuring this out; smarter people than I at $work explained this to me, and now I’m passing it on to you.


6 Which you really shouldn’t be using anyway.


7 Well, most versions of local::lib out there in the wild anyway.  To be fair, the newer versions are more circumspect about loading File::Spec—for this very reason.


8 Yes, Perl 5.8 is nearly 13 years old, and that was standard on many distros.  Probably still is, on some.


9 Let’s conveniently ignore the question of how we ended up in the current directory in the first place.  You can’t ignore it in the real world, of course, but we can ignore it here, for now.


10 Except don’t actually use perl -w.  It’s inferior to use warnings in many, many ways.


11 The “indir” bit is new as of Perl 5.16, although I can’t seem to find any explanation of what it’s there for.  I did find the commit which changed the man page, hoping that its log message would also contain some reference to its function.  But, no such luck.  If anyone knows the reason for it, toss it out in the comments: I’d love to hear about it.


12 Yeah, okay, it’s long and meandering either way: I know.  I know.


8 Comments

Hi Buddy,

Just to be devil's advocate, if someone thinks things look complex, maybe they are a bit? You never know, sometimes the outsider opinion has some meaning worth looking for...

In terms of scripts getting complex: if the script is so complex that it needs a pile of stuff, then you might want to use more modules, or (as brian d foy's been calling them) 'modulinos', which are modules that can get called in script context and work like a script. I've found that approach helps to make stuff make sense a bit. Specific comments:

#! /usr/bin/env perl

Yeah do that, I found the shebang is not a place to hang wild and wait for the police to show up to kick you out of the club. Also, install a local perl with perlbrew, plenv, or Perl-Build (my favorite, no bells: just do "curl https://raw.githubusercontent.com/tokuhirom/Perl-Build/master/perl-build | perl - 5.16.2 /opt/perl-5.16/" and you are good to go).

Perl is not the only language that suggests separating runtime from development. I recall when I did Java back in 1994 you always had a JDK in addition to the Java runtime (a big pain at the time, taking up so much drive space). It's just that for a while Perl devs were used to the idea of using system Perl... even though it hurt us.

use strict;
use warnings;

worth the two lines in my head, but again you can avoid them if all your real code is in a module

use Cwd 'abs_path';
use File::Spec::Functions qw(catpath splitpath);
use local::lib catpath((splitpath(abs_path $0))[0, 1], '../extlib');
use lib catpath((splitpath(abs_path $0))[0, 1], '../lib');

Personally I find this wonky. I really don't think it's a good idea to have a script try to bootstrap its own environment. You are introducing a tight, structural dependency between the script and the filesystem. I would have just done (from the command line)

perl -Ilib -Iextlib/local/lib/perl5 $script

I know that seems like it's just moving stuff around, but ultimately I think it's right to say that the caller of the script is responsible for $ENV. That will save you trouble when moving things around or when you need to run them under alternative creds (like under cron).

use CE::Util::ScriptBootstrap;

Again, if the scripts are complex enough that you need a helper to make sure new devs don't reinvent the wheel incorrectly, doing libs is probably better. Personally I always liked the idea that you only add exactly what you need, that you don't anticipate 'tomorrow I might need a database connection, so let's install all this extra stuff right now'. As long as what you do today isn't poorly designed it won't prevent you tomorrow from properly extending it to support evolving business requirements.

For scripts that need to stand alone consider fatpacker.

If you want to have a bash wrapper that you can use to invoke a script under a given local lib I already put something like that on CPAN about 5 years ago (see App::local::lib::helper, and I think steve put a copy of that in bin or something, but reading the original docs is worth it I think).

Making a makefile target can help as well (sometimes I do "make perl-llib @args", and that does the local lib and the application lib), if people find that a good idea.

Ultimately I don't think Perl needs more setup time than other development environments (and most of it can be automated). It's just that community practices are uneven and that does result in confusion.

There are a couple of tricks I've accumulated for managing dependency issues that make Perl easy to deal with.

If you are using the system perl establish a consistent location for non-packaged CPAN modules. I use /opt/cpan. cpanminus has the -L switch to set the target location, but CPAN is dependent on @INC. Then use profile/bashrc or a launcher script to prepend that location to the PERL5LIB environment variable. Your custom modules always take precedence over system modules. It then also becomes easy for your sysadmins to redistribute the custom installation directory, because they can build it once for each platform and then use tar or build a custom package.

However, it is now easy to install your own Perl which is even better because you get full isolation and can move to new Perls when you want to, as well as being able to specify an alternate Perl when needed. PerlBrew and Perl-Build (which does the actual building for plenv) have made this pretty easy to do. All the modules I need are listed in a shellscript that feeds them to cpanminus that I run immediately after installing Perl. Gabor Szabo has made it ridiculously easy to install your own with http://dwimperl.com/. Just download the appropriate tarball, extract it in a good place like /opt/dwimperlversion, run the relocate script and then symlink the bin folder as /opt/perl/bin. Set your environment so /opt/perl/bin is at the front of the path. DWIMPerl comes prebuilt with a large selection of popular modules, and if you need to install or update more, you're installing them into custom Perl. Whether you built Perl or started with DWIM, redistribution is easy: either make a tarball or a custom package.

The last piece to make this work is the shebang line: for your own code, it is #!/opt/perl/bin/perl. Perl utilities installed by package management will still specify the system Perl.

I use /opt because system package management uses the /usr space on most systems and I like the convention of using /opt for things that you or your sysadmin manage or install directly.

local::lib no longer loads File::Spec. The footnote mentions this, but is also incorrect. File::Spec isn't used on Windows or Mac OS X. It's only used on VMS and Mac OS Classic, where it seems unlikely the rest of local::lib will work anyway.

Doesn't carton solve these problems?


About Buddy Burden

8 years in California, 19 years in Perl, 28 years in computers, 48 years in bare feet.