The coming bloated Perl apps?
A few weeks ago, I got annoyed that one of our command-line applications was getting slower and slower to start up (the delay was becoming more and more noticeable), so I decided to do some refactoring, e.g. splitting large files into smaller ones and delaying the loading of modules until they are actually needed.
Sure enough, one of the main causes of the slow startup was preloading too many modules. Over the years I had been blindly sticking 'use' statements into our kitchen-sink $Proj::Utils module, which was used by almost every script in the project. Loading $Proj::Utils alone pulled in over 60k lines from around 150 files!
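One common way to delay loading is to replace a top-level `use` with a `require` inside the function that actually needs the module. The sketch below assumes the heavy dependency is only needed by a few rarely called functions; Data::Dumper stands in for a heavy module, and `dump_data()` is a hypothetical helper, not something from $Proj::Utils:

```perl
use strict;
use warnings;

# Hypothetical helper: Data::Dumper stands in for any heavy module.
# Nothing is loaded at compile time; the first call pays the loading cost,
# and later calls find the module already recorded in %INC and skip the load.
sub dump_data {
    my ($data) = @_;
    require Data::Dumper;                  # run-time load instead of a top-level `use`
    return Data::Dumper::Dumper($data);    # fully qualified: `require` imports nothing
}

printf "Data::Dumper loaded before first call? %s\n",
    exists $INC{'Data/Dumper.pm'} ? 'yes' : 'no';
print dump_data([ 1, 2, 3 ]);
printf "Data::Dumper loaded after first call? %s\n",
    exists $INC{'Data/Dumper.pm'} ? 'yes' : 'no';
```

The trade-off is that anything the module would normally export has to be called by its full name (or imported by hand), and a missing module is only detected when the function first runs rather than at startup.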
After I split things up, it became clearer which modules are particularly heavy. This one stood out:
% time perl -MFile::ChangeNotify -e1
real 0m0.972s
% perl -MDevel::EndStats -MFile::ChangeNotify -e1
# Total number of module files loaded: 129
# Total number of modules lines loaded: 46385
So almost 130 files and a total of 45k+ lines from loading File::ChangeNotify alone. 130 files just for a filesystem monitor! Who would've thought that a filesystem monitor needs so many lines of code? Compare that with, say, a recent HTTP client:
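The numbers above come from Devel::EndStats. As a rough sketch of where such figures come from (this is not Devel::EndStats' actual implementation, and `module_stats()` is a hypothetical helper), one can walk %INC, which maps every loaded module to the file it was read from, and count the raw lines in each file:

```perl
use strict;
use warnings;

# Hypothetical helper: count the module files recorded in %INC and the raw
# lines they contain. Like the figures in this post, blank lines, comments,
# POD and DATA sections are all counted.
sub module_stats {
    my ($files, $lines) = (0, 0);
    for my $path (values %INC) {
        next if !defined $path || ref $path;    # skip entries set by @INC hooks
        open my $fh, '<', $path or next;        # skip files we cannot read
        $files++;
        local $_;
        $lines++ while <$fh>;
        close $fh;
    }
    return ($files, $lines);
}

require File::Spec;    # stand-in for the module being measured

my ($files, $lines) = module_stats();
print "# Module files loaded: $files\n";
print "# Lines in those files: $lines\n";
```

Calling `module_stats()` from an END block instead would also catch modules loaded lazily while the program runs.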
% perl -MDevel::EndStats -MHTTP::Tiny -e1
# Total number of module files loaded: 18
# Total number of modules lines loaded: 6089
I quickly switched to Linux::Inotify2 and things are much better now (though I might have to revisit this, since inotify is Linux-only and we want to give the new Debian/kFreeBSD a Squeeze).
As I suspected (since the module is also written by Dave Rolsky), File::ChangeNotify uses Moose, which is not particularly lightweight either:
% time perl -MMoose -e1
real 0m0.712s
% perl -MDevel::EndStats -MMoose -e1
# Total number of module files loaded: 100
# Total number of modules lines loaded: 35760
Compare with:
% time perl -MMouse -e1
real 0m0.089s
% perl -MDevel::EndStats -MMouse -e1
# Total number of module files loaded: 20
# Total number of modules lines loaded: 6675
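`time perl -MModule -e1` measures the whole interpreter start. To compare several modules from inside one script, a rough sketch could time each `require` with Time::HiRes. `time_require()` is a hypothetical helper, and its numbers will not match `time` exactly, because anything already loaded (including what strict, warnings and Time::HiRes pull in) is not counted again:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second wall-clock timestamps

# Hypothetical helper: wall-clock time spent in `require` for one module.
# Dependencies shared with modules that are already loaded are not re-counted.
sub time_require {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    my $start = time;
    require $file;
    return time - $start;
}

# Core modules as stand-ins; substitute Moose, Mouse, etc. if they are installed.
for my $module (qw(File::Spec Getopt::Long)) {
    printf "%-15s %.4fs\n", $module, time_require($module);
}
```

Because of that sharing, the order in which modules are timed matters: whichever is loaded first pays for the common dependencies.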
Come to think of it, running Dist::Zilla is also quite painfully slow these days. Just running "dzil foo" pulled in around 60k lines and took 1.7s! Of course, dzil is Moose-based.
While it is a good thing that Moose is getting more popular, it's a bit shameful to see that Ruby and Python scripts "get OO for free" while Moose scripts have to endure a 0.7s startup penalty. Mouse, Moo and Role::Basic come to the rescue, but I wonder what Ruby/Python programmers would think (you have how many object systems?? Why can't you people ever agree on one thing instead of TIMTOWTDI-ing everything?)
Disclaimer: The number of lines includes all blank lines, comments, POD, DATA sections, etc. from every file loaded in %INC; the actual SLOC is probably significantly lower. Timings were taken on a puny HP Mininote netbook (Atom N450 1.66GHz) that I've been stuck with for the past few weeks. With all due respect to the authors of all the modules mentioned: they all write fantastic, working code.
It's a valid point, but Moose and similar heavyweight OO systems are not geared towards quick command-line response time. And you found out yourself that there was a quick alternative for the file monitoring task.
So, yes, if you're using these commands very frequently, you may want to find a way to do it (there's more than one, you know :) that does not involve a dependency on Moose.
But for persistent or less-frequently-run applications, what's the real harm? How often do you actually run dzil for that 1.7s to be worth noting?
In "A Timely Start", Jean-Louis Leroy covers some of the problems he had with this. Most of his trouble came from the cost of searching through @INC to find modules, especially when @INC is loaded with many directories from PERL5LIB.
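To see why a long @INC hurts, a rough sketch can count how many directories are checked before a module file is found. It assumes a Unix-like system, and `inc_probes()` is a hypothetical helper that only mimics the directory walk; perl's real `require` also handles @INC hooks and .pmc files:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: mimic the part of `require Foo::Bar` that walks @INC,
# counting how many directories are checked before Foo/Bar.pm is found.
sub inc_probes {
    my ($module) = @_;
    (my $relpath = "$module.pm") =~ s{::}{/}g;
    my $probes = 0;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks
        $probes++;
        return $probes if -f File::Spec->catfile($dir, $relpath);
    }
    return $probes;          # not found: every directory was checked
}

my $base = inc_probes('Getopt::Long');

# Simulate a long PERL5LIB: extra (nonexistent) directories go in front.
my $extra;
{
    local @INC = ((map { "/nonexistent/perl5lib/dir$_" } 1 .. 20), @INC);
    $extra = inc_probes('Getopt::Long');
}
print "probes with default \@INC: $base\n";
print "probes with 20 extra directories: $extra\n";
```

Every directory ahead of the one holding the module costs one more filesystem check per module loaded, so a long PERL5LIB is multiplied across a program that loads dozens or hundreds of modules.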
Thanks for the shout-out to HTTP::Tiny, but I want to point out that HTTP::Tiny itself is only 1076 lines of the 6000 or so (using Devel::EndStats). A better comparison of a "baseline" load for doing just about anything useful on the command line is looking at something like File::Spec (6 modules, 1682 lines), IO::File (12 modules, 3674 lines) or Getopt::Long (9 modules, 6237 lines).
-- dagolden
@Adam:
...
But for persistent or less-frequently-run applications, what's the real harm? How often do you actually run dzil for that 1.7s to be worth noting?
Well, the problem is that Moose is usually marketed not as an "OO system for persistent or less-frequently-run applications" but as "*the* OO system for Perl".
Hah. A mere 0.9 seconds of load-time overhead? The luxury :P
I have a packet analysis program that pulls in lots of plugins using Module::Pluggable. Here it is running on my Soekris router:
All that overhead is Module::Pluggable doing one @INC search for its modules, then 'require' doing the very same @INC search again for every module that gets loaded. It walks @INC about 20 times in total.
Interesting. Fortunately, so far I haven't needed to put a lot of stuff in @INC, just one or two entries per project.
Bemoaning the fact that Moose doesn't fit every single possible use of Perl is, to me, a non-criticism. Perl is not a one-size-fits-all language. Moose is marketed as "*A* postmodern object system for Perl", and its stated goals are making "Object Oriented programming easier, more consistent and less tedious." Many people choose to use it everywhere, but nobody will tell you to use it if it doesn't work for you.
I don't blame you for wanting to have your cake and eat it too; what kind of programmers would we be if we didn't want to improve things? But what, exactly, are you hoping to accomplish by pointing out the well-known fact that Moose incurs a large compile-time and a moderate run-time penalty? Are you saying that Moose should be marketed differently? Or that it is implemented incorrectly? Or is there an option I am missing?
No, I think he is trying to say that having Moose as *THE* OOP system used all over the place means you take a loading hit for even trivial things because other modules use it, and that if some OOP mechanism were already in the core, the penalty would be smaller.
@Adam: As the post title might suggest, I was raising a concern: since Moose is getting popular, more people *will* simply use it in their projects (and in CPAN libraries as well), some without evaluating the cost, perhaps because it doesn't matter to them.
However, you can't fully know how your libraries will be used. Some of these Moose-using libraries might end up in more performance-critical situations.
Hear, hear! A Moose-free dependency chain is sadly becoming a valuable feature for those not in the web-app echo chamber :(
Regarding File::ChangeNotify, I *did* know the cost of using Moose for this library. F::CN was written to replace code in the Catalyst-Devel distro. In that distro, it's part of the system that watches the development directory and restarts the dev server when code changes.
Since Catalyst was already using Moose, using it for F::CN was an obvious choice. Using Moose made writing this module much simpler, and probably less buggy.
If you have a problem with the code I wrote (for free), shared (for free), and now maintain (for free), then you have lots of choices, including writing your own code, forking, or simply not using it.
This is all about choosing the right tool for the right job. If Moose's startup costs are too much for a given task, it's the wrong tool. We all know this, of course....
But maybe this is a case for the CPAN (or more aptly, the MetaCPAN project?) to include metadata about any large, popular frameworks that a distribution requires.
That, paired with a useful search interface that lets you say "only show non-/Moose|other framework|other meta data/" modules, should make it even easier for us to effectively choose the right tools for the job at hand.
In the meantime though, it's not very difficult to check a distribution to see if it uses Moose or any other framework that's undesirable for your given task.
Thanks for all the comments. Most seem to have missed my point, which suggests I didn't express it well in the first place.
I know Moose is not suitable in all cases, and I'm still using Moose where it fits.
I know that there are alternatives to Moose for when Moose doesn't fit, and I'm using some of them.
I know how to check whether a distribution uses Moose.
It'd be nice to have a requirement filter on search.cpan.org or metacpan, but it's not absolutely necessary.
The problem is: there are distributions that I would like to use someday, but can't/won't (for some projects), because they depend on Moose. Some of them probably don't need to. But because Moose is getting popular, there will be more and more such distributions.
Here's a rather lame analogy: for some people junk food is okay in some cases, but for others it is especially bad, perhaps because their job requires them to be especially fit/lean, or because they are predisposed to obesity. But because fast food restaurants are getting more and more popular, perhaps because they do offer real benefits in convenience or taste, people are eating more of it regardless.
Note that Moose is actually far from junk :)
Heh. The Autarch has spoken: "use Moose because f*** you. Your preferences don't matter, so suck it."
@educated_foo: That's a pretty ridiculous straw man. I'm not telling anyone to use Moose. I'm saying that I will choose to use it or not based mostly on whether it makes my life easier, not Steven's or anyone else's.
I understand that my choice may make my software unsuitable for some uses, but that's the nature of free software. Sometimes it does what you want, sometimes it doesn't.
If you want something different, there are a variety of options (patches may or may not be welcome, you can write it yourself, or pay for a custom version).