Startup overhead still matters

We all love Moose, and the subject of this question could have been phrased better, but why do I get the feeling that not many people write pure CGI or command-line scripts in Perl (the kind that get executed many times) anymore? After all, didn't Perl begin as a tool for sysadmins, and only get picked up as the darling of CGI/web programming in the mid-1990s?

There are still many reasons why Perl scripts need to be run many times (instead of as persistent, long-running processes).

  • It's much more stable (I've often needed to kill, ulimit, or periodically restart a Perl process because after days it grows to 500+ MB).

  • Sometimes CGI is all you get (especially in shared hosting environments, which is related to the first point).

  • Sometimes you need to run the scripts for many users, and it's not feasible (e.g. memory-wise) to let them all run persistently.

  • Many old scripts are designed that way.

  • Some environments require it (e.g. scripts run from .qmail are run for every incoming mail, scripts run by tcpserver are started for every incoming connection, etc.).

There used to be projects like PersistentPerl or SpeedyPerl that let us easily make a Perl script persistent just by changing the shebang line (e.g. from #!perl to #!pperl), but these projects are no longer actively developed, probably due to lack of demand (?), or because this kind of deployment tends to cause subtle bugs (I did get bitten by this a couple of times in the past). You can't just convert a script that was designed and written as a one-off run into a long-running one without expecting some bugs, anyway.

And the Perl compiler (B::*, *.pmc) is also now deprecated, probably because it does not save that much startup cost after all (the fact that Perl has phasers like BEGIN/CHECK blocks means it has to execute code as it compiles it anyway).
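To illustrate the point about BEGIN/CHECK blocks, here is a minimal sketch using only core Perl. The @order array is just an illustrative name. It records when each block runs, showing that BEGIN and CHECK code executes during compilation, before the first ordinary statement of the script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative only: record the order in which each piece of code runs.
our @order;

# An ordinary statement: runs at run time, after the whole file is compiled.
push @order, 'main';

# Runs as soon as the parser finishes reading this block.
BEGIN { push @order, 'BEGIN' }

# Runs when the compile phase of the main program ends.
CHECK { push @order, 'CHECK' }

# Runs just before run time begins.
INIT  { push @order, 'INIT' }

print join(' -> ', @order), "\n";   # prints "BEGIN -> CHECK -> INIT -> main"
```

Even though 'main' appears first in the file, it is recorded last, so compiling a script cannot be fully separated from executing parts of it.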

And thus we're stuck with having to accept the startup cost of parsing and compiling for every script run. That's why startup cost matters. On our servers awstats runs many thousands of times every day (2000-5000 sites x 10+ HTML pages), and since it's a giant script (10k-ish lines) it has a startup overhead of almost 1s. I really would like to shave this startup overhead, as it is a significant part of the server load.
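As a rough back-of-the-envelope check of those figures, the sketch below multiplies them out. It assumes exactly 10 pages per site and a full 1s of startup per run, which are round-number simplifications of the "10+" and "almost 1s" above; startup_hours_per_day is a helper name made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hours per day spent purely on startup, given the number of sites,
# the pages generated per site, and the startup cost per run in seconds.
sub startup_hours_per_day {
    my ($sites, $pages_per_site, $startup_seconds) = @_;
    return $sites * $pages_per_site * $startup_seconds / 3600;
}

for my $sites (2000, 5000) {
    printf "%d sites: %d runs/day, about %.1f hours of startup overhead\n",
        $sites, $sites * 10, startup_hours_per_day($sites, 10, 1);
}
```

Under these assumptions that is 20,000 to 50,000 runs a day, or roughly 5.6 to 13.9 hours spent on nothing but parsing and compiling.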

To this day, many of my scripts/programs are still deployed as one-off command-line scripts. And that's why, instead of Moose, I use Mouse (or Any::Moose, to be exact) whenever I can. And so far I can.
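If you want to check the difference on your own machine, a small timing sketch like the one below may help. It uses only the core Time::HiRes module; load_time is a made-up helper name, and the default module list (Moose and Mouse) assumes you have them installed. Modules that are missing are simply reported as such.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Return the wall-clock seconds needed to load a module,
# or undef if the module cannot be loaded.
sub load_time {
    my ($module) = @_;
    my $start = time;
    my $ok = eval "require $module; 1";
    return $ok ? time - $start : undef;
}

# Module names come from the command line; Moose and Mouse are only defaults.
for my $module (@ARGV ? @ARGV : qw(Moose Mouse)) {
    my $seconds = load_time($module);
    if (defined $seconds) {
        printf "%-12s %.3f s\n", $module, $seconds;
    }
    else {
        printf "%-12s not installed\n", $module;
    }
}
```

For a fair comparison, run it once per module (e.g. with a single module name as the argument), since dependencies already loaded for the first module will not be counted again for the second.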

5 Comments

The Perl compiler was deprecated only in core, because the p5p group didn't want to maintain it. Now B::C is available on CPAN.

The Perl compiler (B::C, B::CC, ByteLoader) is not deprecated at all! It is better than ever, and especially designed for such applications.

Faster startup-time with C -O2 and ptmalloc3,
faster run-time with CC,
faster destruction-time to-be-done (in about a month) with a new -f switch.

http://conferences.yapceurope.org/ye2010/talk/2946
http://search.cpan.org/dist/B-C/ or http://www.perl-compiler.org/

I'll just try to compile my patched awstats.pl, because I'm also running it hourly.

We don't all love Moose :)

An update: several months ago I tried compiling awstats. It failed miserably. Let's hope that someday it can be compiled without problems.


About Steven Haryanto

A programmer (mostly Perl 5 nowadays).