Perl vs Shell Scripts

Last week, I posted on my Other Blog about how I still prefer to use tcsh for my interactive shell.  Of course, I maintained that bash was the only real choice for shell scripts.

But then this brings us to another interesting point.  I, of course, am a Perl programmer.  A choice between Perl and shell scripts is not like a choice between C++ programs and shell scripts.  Back when I was a C++ programmer, there was no question but that some tasks should be done in bash (or, actually, ksh, which is what I was using back in those days).  But Perl is quite different: not only can Perl do anything that bash can do (that’s true of C++ as well), but Perl can do it just as easily.  Perl is often considered a scripting language, and, while we could argue about whether that’s true (Perl is compiled, after all, while bash is not), we can’t (and shouldn’t) deny that deploying a Perl program is as easy as deploying a shell script, and that’s part of what being a “scripting language” entails.

But, in the end, I still, sometimes, choose to use bash over Perl, for certain tasks.  I suppose you could argue there’s a certain amount of inertia involved: I got used to doing certain types of things with shell scripts back when my only other (viable) option was C++, or maybe awk.  But the comparison to awk is quite appropriate.  Before I learned Perl, I used awk a lot.  Nowadays ... hardly ever.  I certainly would never use awk inside a shell script: it would always be Perl there.  At the command line, I still occasionally type awk when I mean perl, but it happens less and less often, and more and more I find myself just giving up on awk before I even get to the end of the line.  Perl just really completely replaces awk.

But not bash.

Recently I was doing a personal scripting task (it involved fiddling around with MP3s, if you’re curious).  I started out doing it in bash, and then ended up ripping it apart about halfway through and starting over in Perl.  I had just made a bad decision on that particular task.  But, while I was cursing myself out for not just using Perl in the first place, it occurred to me that maybe I ought to try to articulate the places where bash really is (or might be) better.  If I had a checklist, maybe I could more easily identify where to put my efforts from the get-go.  If I had a checklist, and I posted it here on this blog, maybe even you other Perlites would come along and tell me why I’m wrong, and maybe I’d learn something. ;->

For the impatient, the executive precis is this: I generally write bash scripts for tasks which are essentially job control scripts.  Yes, Perl can call external programs just as well as any shell script can, but there are a few things bash gives us which Perl doesn’t.  This is not surprising, really: bash (as ksh before it, and the venerable sh before that) was basically invented for doing job control.  What sh lacked in that department, csh filled in, and then ksh and bash backported.  Perl has other foci.  Personally, I’m okay with Perl not being the answer for every job.

So, let’s take a look at the (few) places where bash beats Perl:

Job Failure

If I want to run a command in bash, I simply do it, like so:

run some command
In Perl, I’d have to do it more like so:
system("run some command");
It’s a bit more typing, sure, but that’s not the real problem.  The real problem is that, if the bash version has a problem—command not found, not enough memory, process table full—it stops and throws an error.  The Perl version just blithely keeps going.  Now, these days the situation is better than it used to be, because I can do this:
use autodie qw< :all >;
system("run some command");
And that works as well as the bash version.  Except, what if I care whether the command succeeded or not?  Here’s the bash version:
if ! run some command
then
    some recovery command
fi
In Perl, perhaps the best we can do is this:
use autodie qw< :all >;
use Try::Tiny;                  # TryCatch is nicer, but more overhead

try
{
    system("run some command");
}
catch
{
    system("some recovery command");
};                              # do NOT forget this semi-colon!
That’s a lot more typing, and probably not as clear either.  And clarity is maintainability, as we know.


Commands on Exit

In Perl, if you want to run something when your program exits, no matter where it’s exiting from, you can do this:

END
{
    system("run some command");
}
The equivalent in bash looks like this:
trap "run some command" EXIT
Except the bash command actually does what you want.  That command in the bash script gets run no matter what.  Normal exit, error exit, explosion, Ctrl-C, core dump ... unless someone does a kill -9 on you, you’re pretty much guaranteed to get your command run.  Not so in Perl.  In fact, the perlmod man page has this to say on the topic:
(But not if it’s morphing into another program via “exec”, or being blown out of the water by a signal—you have to trap that yourself (if you can).)
Lame.


Now, admittedly, the vast majority of times I use trap in this fashion, it looks like this:

trap "/bin/rm -f $tmpfile" EXIT
and, if that were Perl, I’d be using File::Temp, and then I wouldn’t have to worry about removing my tempfile.  But, then, I don’t think File::Temp handles signals either.  Overall, trap is a much easier way to deal with signals than Perl’s %SIG hash, although I have to admit that I’ve never written a trap statement that didn’t end with EXIT.
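
Just to prove the point, here’s a tiny self-contained demonstration that the EXIT trap really does fire on the error paths (the file name is illustrative; a real script would use mktemp):

```shell
#!/bin/bash
# Demonstration: the EXIT trap fires even when the script dies partway.
# (A real script would use mktemp; the name here is just for show.)

tmpfile=/tmp/trap-demo.$$
touch "$tmpfile"

# The subshell stands in for a script that sets the trap and then fails:
(
    trap '/bin/rm -f "$tmpfile"' EXIT
    false               # simulate a failing command
    exit 1              # error exit ... the trap still runs
)

if [ -e "$tmpfile" ]; then echo "still there"; else echo "cleaned up"; fi
```

Run it and you get “cleaned up”: the temp file is gone despite the non-zero exit.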


Processing Job Output Lines

If I know my output lines won’t have any spaces in them, I’m golden:

for line in $(run some command)
do
    process each "$line"
done
That’s a good bit simpler than the equivalent Perl:
use autodie qw< :all >;

open(PIPE, "run some command|");
while ( <PIPE> )
{
    chomp;
    system(qw< process each >, $_);
}
close(PIPE);
The only problem I have in bash is if my lines might have spaces.  That complicates the shell script version to where it’s not particularly better than the Perl:
OIFS="$IFS"
IFS="
"
for line in $(run some command)
do
    process each "$line"
done
IFS="$OIFS"
Still, the simple case is often sufficient.


Here Documents

Sure, Perl has “here documents.” But they’re different.  In Perl, a here doc defines a string.  In shell scripts, it defines STDIN.  So, in bash, I could say:

mysql <<END                       # assume ~/.my.cnf is set up
    select count(*) from some_table;
END
whereas in Perl, it would be:
use autodie qw< :all >;

open(PIPE, "| mysql");
print PIPE <<END;
    select count(*) from some_table;
END
close(PIPE);
Of course, for this particular example, I could just use DBI instead, but I generally find that to be more of a PITA than I want to deal with for a quick script.


File Equivalencies

I have no idea why Perl doesn’t have something like this.  Here’s some bash I’ve needed on several occasions:

if [[ ! $(dirname $0) -ef $(pwd) ]]
then
    echo "must run this from its home dir" >&2
    exit 1
fi
Until recently, this was stupidly difficult to replicate.  The Cwd module includes a realpath function, but its original implementation only worked on directories (leading to a number of subs in my Perl code named really_realpath).  Finally that was fixed, making it easier.  Nowadays, I’d probably use Path::Class to do this in Perl:
use Path::Class;

if (file($0)->dir->absolute->resolve ne dir()->absolute->resolve)
{
    die("must run this from its home dir");
}
which ... well, actually, now that I look at it, isn’t so bad, although awfully verbose.  But the bash version reads a lot more cleanly.


File Timestamp Comparisons

This one doesn’t come up that often, but, still.  In bash I can do:

if [[ $last_run -ot $touchfile ]]
then
    do it again
    touch $last_run
fi
In Perl, I’d have to do the stat calls myself and pluck the mtime out of the return array (and I always have to look up which element it is) ... moderately irksome.  The bash version is just cleaner.
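
To make that concrete, here’s a runnable sketch of the -ot (“older than”) test; the backdated mktemp files stand in for real state files, and note that touch -d is GNU-specific:

```shell
#!/bin/bash
# Sketch of bash's -ot file test, using throwaway files.
# touch -d is GNU coreutils; on BSD you'd reach for touch -t instead.

last_run=$(mktemp)
touchfile=$(mktemp)
touch -d '1 hour ago' "$last_run"    # backdate the last-run marker

if [[ $last_run -ot $touchfile ]]
then
    echo "needs re-run"
    touch "$last_run"                # now it's current again
fi

[[ $last_run -ot $touchfile ]] || echo "up to date"
rm -f "$last_run" "$touchfile"
```

The first check fires (“needs re-run”), and after the touch the second one doesn’t (“up to date”).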


Tilde Expansion

I know, I know ... it’s just a convenience.  But it’s so very ... well, convenient.  In bash:

rcfile=~/.me.rc
In Perl, there’s File::HomeDir, which once-upon-a-time had the vaguely nifty $~, but they went and deprecated it.  Yeah, I’m sure it was a perfectly awful idea for multiple reasons.  But it was a lot more convenient than:
use File::HomeDir;

my $rcfile = File::HomeDir->my_home . "/.me.rc";
And that’s without even going all Path::Class on it, for portability (not that I’m likely to care much about having most of my personal job control scripts run on Windows or whatnot).  Yet another minor place where Perl just gives me more to type without significantly increasing any functionality I might actually use.


Now don’t get me wrong: Perl still beats the crap out of bash for most applications.  Reasons I might prefer Perl include (but are not limited to):

  • It’s going to be faster.  Mainly because I don’t actually have to start new processes for many of the things I want to do (basename and dirname being the most obvious examples, but generally cut, grep, sort, and wc can all be eliminated as well).
  • String handling in bash is rudimentary at best, and the whole $IFS thing is super-clunky.
  • Conditionals in shell scripts can be wonky.
  • Quoting in shell scripts can be a nightmare.
  • bash’s case statement leaves a lot to be desired beyond simple cases (NPI).
  • Arrays in bash suck.  Hashes in bash (assuming your bash is new enough to have them at all) suck even harder.
  • Once processing files or command output goes beyond the simple case I listed above, Perl starts really smoking bash.
  • CPAN.
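
As a footnote to that first point: bash parameter expansion can dodge a couple of those fork/execs itself (basename and dirname, at least), though it never approaches Perl’s full in-process string handling.  A quick illustration (the path is just an example):

```shell
#!/bin/bash
# Each $(basename ...) / $(dirname ...) costs a fork+exec; the
# parameter expansions below give the same answers in-process.

path=/home/buddy/music/track01.mp3

echo "$(basename "$path")"    # external command: track01.mp3
echo "${path##*/}"            # expansion, no new process: track01.mp3
echo "$(dirname "$path")"     # external command: /home/buddy/music
echo "${path%/*}"             # expansion: /home/buddy/music
```

For cut, grep, sort, and wc, though, there’s no expansion trick: the shell has to spawn them, and Perl doesn’t.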

So it’s not like bash is going to take over for Perl any time soon.  But I still find, after all these years, that a simple shell script can often be simpler than a simple Perl script.  As I say, I welcome all attempts to convince me otherwise.  But, then again, there’s nothing wrong with having a few different tools in your toolbox.

12 Comments

I too find myself using bash for most of my scripting needs and have wondered too if it's mostly inertia.

Part of my inertia is my $USRLIB/common.sh which is a small function library that makes my shell scripts look like lots of this:
declare -r kindleDrive="/Volumes/$device_name"
insist -a "$kindleDrive" : "$device_name is not mounted."
insist -d "$kindleDrive" : "'$kindleDrive' is not a disk Volume!"
  
declare -r target="$kindleDrive/kindle"
insist -a "$target" : "Target '$target' does not exist."
insist -d "$target" : "Target '$target' is not a directory."
insist -w "$target" : "Target '$target' is not writable."
where insist runs the test, and if the test fails prints the message following the colon and exits with a non-zero error code.
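
(A minimal sketch of what such an insist helper might look like; this is purely hypothetical, and the real common.sh surely differs:)

```shell
#!/bin/bash
# Hypothetical reconstruction of an insist helper: everything before
# the ":" is handed to test, and on failure the message after the ":"
# goes to stderr and the script exits non-zero.

insist() {
    local cond=()
    while [[ $# -gt 0 && $1 != : ]]; do
        cond+=("$1")
        shift
    done
    shift                          # drop the ":"
    if ! test "${cond[@]}"; then
        echo "$*" >&2
        exit 1
    fi
}

insist -d /tmp : "/tmp is not a directory!"
echo "checks passed"
```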

The other way scripts happen for me is when I start off with something on the command line, possibly add it to my .bashrc, and over time it grows, possibly ending up as a script in my bin directory. newest started off that way. All it does is report the newest 10 files in the current directory. Unless you include a different number or give a different directory or filespec. It started out as a simple alias to an ls/awk/tail pipeline, which has become an 85-line function in .bashrc. It seems as fast as ls so there's no need to rewrite it.

Perl code tends to happen when I'm actually planning it. And I almost always use perl and DBI for database access that isn't obviously one-off viewing. But I think that's inertia on my part too.

My ~/bin directory breakdown looks like this
for f in *; do if [[ $f =~ \. ]]; then echo "${f##*.}"; fi; done | sort | uniq -c
   3 osa
   5 pl
  18 sh

Your other blog wouldn't let me comment, but I would have written:

-----
Or you could use zsh and get the best of both worlds. Or even better. It might not be worth it for you if you are happy with what you have, and you don't miss what you never had, but I don't regret switching to zsh 20-odd years ago.
-----

But while I'm here, for File Timestamp Comparisons you might like to look at the -M, -A and -C tests (perldoc -f -x).

And for Tilde Expansion, rcfile=~/.me.rc --> my $rcfile = <~/.me.rc>

I prefer to use Perl for my scripting needs but if I have to call a lot of external programs, then I use bash(1).

I tend to use the shell for manipulation at the file system level: moving files around, deleting, creating directories, changing permissions, etc. Anything beyond that, like dealing with the data inside the files, I'll almost always go to Perl. Or to put it another way: if it needs a regex or arithmetic or more than a single variable, I'll use Perl.

I am the new maintainer of Zoidberg, and I wonder if you might want to take it out for a spin. No, I don't think it's going to replace bash, but you might be interested in a Perl shell just as another option. http://p3rl.org/Zoidberg

The shell has one other killer feature that Perl lacks (even on the CPAN!)... Process Substitution.
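
(Process substitution being the <(command) syntax, which hands a command's output to anything expecting a file name; a quick example:)

```shell
#!/bin/bash
# Process substitution: <(command) stands in for a file name, so
# tools that insist on files can read straight from pipelines.

diff <(printf 'a\nb\n' | sort) <(printf 'b\na\n' | sort) \
    && echo "same lines after sorting"
```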

But I'm working on that right now...

I also use Perl when I need to do anything more than the simplest command-line parsing. Getopt::Long just makes it too easy to give up.

You can always use parseopt from git for shell... if it were a separate library.

Awesome write-up and comparison. Now I am clear about how far I need to put my efforts into learning perl, and when I should attempt writing a script in perl.

Thanks Buddy.

~Dinesh

About "Processing Job Output Lines":

run some command | while read line
do
    process each "$line"
done


About Buddy Burden

7 years in California, 17 years in Perl, 26 years in computers, 46 years in bare feet.