March 2010 Archives

IPC::System::Simple: A success story

system seems like such a simple thing to use. You tell it which command to run, and it runs it. When it's done, you continue with your program, or so you think. I recently ran into a problem with this. I was creating a lot of zombies, but Dread Pirate Fenwick came to the rescue (and this is not the first time he's rescued me).

As part of my DPAN work, I'm indexing every distribution in BackPAN. I collect loads of data on every distribution so I can make a searchable catalog of it. When you want to use DPAN to create private CPAN repositories, you shouldn't have to re-index distributions where we already know the answer. I can provide most of the answers pre-cataloged so you can focus on only the novel distributions in your repository. I have a tarball of sample results for 16,000 distributions.

My indexing has one feature that PAUSE doesn't: I run the distribution code. PAUSE does various things to guess what's in a distribution (and in rare cases guesses incorrectly). I run the build file and look in blib, and I also use PPI to extract program elements.

Since I'm indexing every distribution, I have to deal with every goofy thing that someone has ever done in a Makefile.PL, all the way back to 1994. That includes things that prompt for information without using ExtUtils::MakeMaker's prompt function, which knows how to deal with non-interactive installations:

# Makefile.PL

print "Tell me something> ";
my $answer = <STDIN>;
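For comparison, here is a minimal sketch of the same question asked through ExtUtils::MakeMaker's prompt function. The default answer "nothing" is made up for illustration, and setting PERL_MM_USE_DEFAULT inside the script is only there to simulate an unattended run; normally whoever runs the build sets it in the environment.

```perl
# Makefile.PL (sketch)
use strict;
use warnings;
use ExtUtils::MakeMaker qw(prompt);

# Simulate an unattended run. Normally the person or tool running the
# build sets this in the environment, not the Makefile.PL itself.
local $ENV{PERL_MM_USE_DEFAULT} = 1;

# prompt() returns the default instead of blocking on STDIN when
# PERL_MM_USE_DEFAULT is true, or when STDIN is not a terminal and
# has nothing left to read.
my $answer = prompt( 'Tell me something>', 'nothing' );

print "Got: $answer\n";
```

A Makefile.PL written like the first one, with a bare read from STDIN, has no such escape hatch and simply waits for input.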

That's not a huge deal, or shouldn't be. As I index, I have a timeout value that I enforce with alarm:

# From MyCPAN::Indexer::Worker
local $SIG{ALRM} = sub { die "Alarm rang for $dist_basename!\n" };
alarm( $config->alarm || 15 );
$logger->debug( "Examining $dist_basename" );
my $info = eval { $Indexer->run( $dist ) };
$logger->debug( "Done examining $dist_basename" );
my $at = $@; chomp $at;
alarm 0;
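The same alarm-and-eval pattern can be tried on its own. This is a minimal, self-contained sketch rather than the MyCPAN::Indexer::Worker code: run_with_timeout is a hypothetical helper, and the blocking select call only stands in for slow work.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.
# Returns the code's scalar result and the error text (empty on success).
sub run_with_timeout {
    my ( $seconds, $code ) = @_;

    my $result = eval {
        local $SIG{ALRM} = sub { die "Alarm rang!\n" };
        alarm $seconds;
        my $value = $code->();
        alarm 0;    # finished in time, so cancel the pending alarm
        $value;
    };
    my $error = $@;
    alarm 0;        # also cancel if the code died for some other reason

    return ( $result, $error );
}

# Finishes well inside the limit.
my ( $fast, $fast_error ) = run_with_timeout( 5, sub { 6 * 7 } );

# Blocks for 10 seconds, so the one-second alarm cuts it short.
# select() stands in for slow work because sleep() and alarm() can
# interfere with each other on some systems.
my ( $slow, $slow_error ) = run_with_timeout( 1, sub {
    select( undef, undef, undef, 10 );
    'finished';
} );

print "fast: $fast\n";
print "slow: $slow_error";
```

Cancelling the alarm inside the eval as well as after it keeps a late alarm from firing once the protected code is already done.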

Somewhere inside that run(), I'd call system so I could create blib:

system( $^X, 'Makefile.PL' );

For some reason that I don't care to investigate fully, when the alarm would trigger for these prompting Makefile.PLs, I would leave behind a zombie process. That's not terribly bad, but I'm doing this for 170,000 distributions. I discovered, at least on FreeBSD 8, that there is a limit to the number of zombies I can have, and it's around 25 or so. That might be some sort of resource limit that ulimit isn't telling me about. Once I made another zombie, everything hung. That would happen about 15 minutes after I started the process and had already gone off to bed. Those overnight runs didn't get much done.

I did various things to try to reproduce this on the small scale, but eventually decided to stop trying to figure it out and start trying to solve it. So, instead of system, I switched to IPC::System::Simple. Paul Fenwick has done a lot of work to make things work properly, so I figured some of that would chop the heads off these zombies. I think it just might have. Since IPC::System::Simple has its own system, I really just needed to load the module so I could replace the built-in version.

use IPC::System::Simple qw(system);
system( $^X, 'Makefile.PL' );

Now everything works. At the moment I don't care how or why, as long as it's churning out indexing reports.
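If you'd rather stay with core Perl, the general rule is that a zombie is a child process that has exited but that its parent has not yet waited for. The sketch below is not what IPC::System::Simple does internally and is not a diagnosis of the problem above; it only shows one way to make sure a timed-out child is both killed and reaped. The run_with_deadline helper and its return structure are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper, not part of IPC::System::Simple or MyCPAN::Indexer.
# Runs @command in a child process and waits at most $seconds for it.
sub run_with_deadline {
    my ( $seconds, @command ) = @_;

    my $pid = fork();
    die "Could not fork: $!\n" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: replace ourselves with the command, bypassing the shell.
        exec { $command[0] } @command
            or POSIX::_exit(127);
    }

    # Parent: wait for the child, but only until the alarm rings.
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        waitpid $pid, 0;
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;

    unless ($finished) {
        die $error unless $error eq "timeout\n";
        kill 'KILL', $pid;    # stop the stuck child...
        waitpid $pid, 0;      # ...and collect it so it cannot linger as a zombie
    }

    return { pid => $pid, timed_out => ( $finished ? 0 : 1 ), status => $? };
}

# A child that would wait far longer than we allow, like a prompting Makefile.PL.
my $stuck = run_with_deadline( 1, $^X, '-e', 'sleep 30' );

# A child that exits immediately.
my $quick = run_with_deadline( 5, $^X, '-e', 'exit 0' );

print "stuck child timed out: $stuck->{timed_out}\n";
print "quick child timed out: $quick->{timed_out}\n";
```

The second waitpid is the important part: killing the child stops it, but only waiting for it removes its entry from the process table.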

Tidy up your CPAN author directory, increase your Schwartz!

I used to encourage people to help CPAN Increase its Schwartz by making the ratio of the byte size of just the latest versions to all versions as high as possible. It's time to increase the Schwartz again. Think of it as Spring cleaning (Autumn for you southern folks) for your CPAN directory.
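To make the ratio concrete, here is a small sketch of the arithmetic. The inventory is made up, schwartz_ratio is a hypothetical name, and comparing versions as plain numbers is a simplification that real version strings would break.

```perl
use strict;
use warnings;

# Made-up inventory of one author directory:
# distribution name => { version => size in bytes }
my %author_dir = (
    'Foo-Bar' => { '0.01' => 10_000, '0.02' => 12_000, '1.00' => 15_000 },
    'Baz'     => { '2.10' => 40_000 },
);

# Bytes in the latest release of each distribution, divided by the
# bytes in every release. 1 means only latest versions remain.
sub schwartz_ratio {
    my ($inventory) = @_;
    my ( $latest_bytes, $all_bytes ) = ( 0, 0 );

    for my $releases ( values %$inventory ) {
        # Plain numeric comparison is a simplification; real version
        # strings need something like the version module.
        my ($latest) = sort { $b <=> $a } keys %$releases;
        $latest_bytes += $releases->{$latest};
        $all_bytes    += $_ for values %$releases;
    }

    return $all_bytes ? $latest_bytes / $all_bytes : 0;
}

my $before = schwartz_ratio( \%author_dir );

# Deleting the two old Foo-Bar releases leaves only latest versions.
delete @{ $author_dir{'Foo-Bar'} }{ '0.01', '0.02' };
my $after = schwartz_ratio( \%author_dir );

printf "before: %.2f, after: %.2f\n", $before, $after;
# prints "before: 0.71, after: 1.00"
```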

Many people are arguing on the CPAN workers mailing list about whose opinion about rsyncing mirrors is the most worthy. While they huff and puff, you can help with just a couple minutes of your time.

If you have cruft in your CPAN directory that can safely live on BackPAN (which has everything ever uploaded), you can help keep the size of CPAN manageable by deleting that cruft. It's not so much the byte size as the file count. In my CPAN mirror right now there are a bit over 120,000 files, so even if you think that your puny module going away won't do much good, its deletion reduces the file count as much as BioPerl or Number::Phone deleting one of theirs.

Simply log into your PAUSE account, follow the delete files link, and schedule some files for deletion. They go away after three days. You have the option to change your mind too. The files only disappear from PAUSE: they'll still be on BackPAN so you'll always have them.

Manage multiple MiniCPANs, and version them

Most of my work with DPAN revolves around the creation of private, CPAN-like repositories that a project team can use without affecting anyone else. Setting up a DPAN process for a recent customer involved making several MiniCPANs, one for each project group. I had to add a couple of features to CPAN::Mini to make it work out. These new features show up in CPAN::Mini 1.100590.

Like most CPAN tools, CPAN::Mini assumed that there would only ever be one repository, so a person tasked with maintaining several had a problem. Consider this workflow to support several groups:

  • There's a master MiniCPAN that holds all of the modules anyone in the company is allowed to use. The remote is a real CPAN.
  • Slave MiniCPANs pull from and filter the modules in the master to contain just the modules their application needs. The remote is the master MiniCPAN.
  • DPAN (or CPAN::Mini::Inject) adds project-specific modules to the slaves.

I can subclass CPAN::Mini by specifying an implementing class, but that's the same problem I had with Making Module::Starter Easier to Subclass: I don't get to insert my subclass until after the configuration processing has completed.

A single sysadmin maintaining all of this can now use the -C switch to minicpan to specify configuration files:

 % minicpan -C master.conf
 % minicpan -C projectA.conf
 % minicpan -C projectB.conf -c Local::CPAN::Mini::Filter::ProjectB

All of that can even be in a shell script so that everything updates in sequence.
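The same sequence can be driven from Perl instead of the shell. This is only a sketch: the job list repeats the three commands above, minicpan_command and update_all are hypothetical helpers, and the runner is passed in so you can do a dry run without minicpan installed.

```perl
use strict;
use warnings;

# The master has to update before the MiniCPANs that pull from it,
# so the order of this list matters.
my @jobs = (
    [ 'master.conf' ],
    [ 'projectA.conf' ],
    [ 'projectB.conf', 'Local::CPAN::Mini::Filter::ProjectB' ],
);

# Build the argument list for one run: -C names the configuration
# file and the optional -c names the implementing class.
sub minicpan_command {
    my ( $config, $class ) = @_;
    my @command = ( 'minicpan', '-C', $config );
    push @command, '-c', $class if defined $class;
    return @command;
}

# Run every job in order, stopping at the first one that fails.
# $runner receives the command list and returns 0 on success.
sub update_all {
    my ( $runner, @jobs ) = @_;
    for my $job (@jobs) {
        my @command = minicpan_command(@$job);
        $runner->(@command) == 0
            or die "Stopping: '@command' failed\n";
    }
    return scalar @jobs;
}

# Dry run: print each command instead of executing it.
update_all( sub { print "@_\n"; 0 }, @jobs );

# For a real run, pass a runner that calls system:
# update_all( sub { system @_ }, @jobs );
```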

That's all very nice, and is going to be very useful when new configuration options show up.

Along with that, I wanted to keep each of these MiniCPANs under source control. A developer could roll back to any version of their MiniCPAN so they could bisect it to find which module upgrades broke a particular feature. They could also check out a particular version of their MiniCPAN by a source control tag. This also gives developers the chance to branch their MiniCPAN for experimental work while not forcing it on their workmates (although they can also make a slave MiniCPAN of their slave MiniCPAN), profile their applications against different sets of module versions, and so on.

That didn't work out so well previously because minicpan likes to remove files and directories, such as .svn, that aren't in the real CPAN. I've adjusted that too with a new configuration option:

 remote: http://www.example.com/CPAN/
 local: /CPAN
 ignore_source_control: 1

This is different from the existing skip_cleanup option, which turns off the cleanup step that removes old versions and so on. With ignore_source_control you still get the cleanup, but it skips files that look like they belong to source control. I have stuff in there for CVS, SVN, and Git, so if you need more, just make the patch. You can get the CPAN::Mini sources from GitHub, and you can also edit files directly on GitHub so you don't have to do a lot of work for very minor changes like adding a filename to an array.
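To give a feel for the idea, here is a rough sketch of a check that a cleanup step could make before deleting a path. It is not the CPAN::Mini implementation: looks_like_source_control and the list of names are assumptions made up for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Assumed list of names; supporting another system would mean adding
# its directory or file name here.
my @source_control_names = qw( .git .svn CVS );

# True if any component of the path is a source control name.
sub looks_like_source_control {
    my ($path) = @_;
    my %is_source_control = map { $_ => 1 } @source_control_names;

    my ( undef, $directories, $file ) = File::Spec->splitpath($path);
    my @parts = ( File::Spec->splitdir($directories), $file );

    my @hits = grep { $is_source_control{$_} } @parts;
    return @hits ? 1 : 0;
}

for my $path (
    '/CPAN/authors/id/.svn/entries',
    '/CPAN/.git/config',
    '/CPAN/authors/id/B/BD/BDFOY/Foo-1.00.tar.gz',
) {
    my $verdict = looks_like_source_control($path) ? 'keep' : 'eligible for cleanup';
    print "$path: $verdict\n";
}
```

Checking whole path components rather than substrings avoids sparing an ordinary file that merely has "CVS" somewhere in its name.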

Now I need to go back and look at CPAN::Mini::Webserver to make it work with multiple MiniCPANs too. That is, it should be able to start off multiple instances as long as it knows which configuration file to use (which should be able to specify the port).

Mastering Perl for the Kindle iPhone app doesn't suck (too much).

Not thinking that any Perl books would be available for Kindle, I searched anyway. I can get at least Learning Perl and Mastering Perl from the Kindle store. I'm thinking about this because I'd like to see what I can do with an iPad Kindle app.

I'm on a trip with severe weight requirements for luggage, so I've been trying the Kindle app for the iPhone. It seemed weird at first, but after a couple chapters of the first book, I don't mind it anymore. The tech books are a bit different. If you're used to the beautiful and understated typography that is one of O'Reilly's hallmarks, you're going to think the Kindle is a bit odd. I think I have it slightly nicer on the iPhone, but it's still different.

The version I have uses multiple font faces, so you can get the monospaced font for inline code and code sections. The iPhone code wrapping is odd, and the paragraph spacing is set up for novels: that is, there's no extra space between the start of a code section and a body section. I don't think I'd want to read a tech book for the first time this way, but it's a nice reference.

Here's a screenshot from the portrait orientation. The code is severely wrapped, but the text looks decent:

[Screenshot: IMG_0572]

And one from the landscape orientation:

[Screenshot: IMG_0578]

Now I just need to figure out how to map something I see in the Kindle app to the physical page number so I can respond to errata (which are reported by page numbers instead of chapter and section). I guess I could develop that index myself, and I've often thought of doing that, but I'm hoping it comes as a feature to Kindle.

I should note that the books aren't any cheaper from the Kindle store. The Kindle is not about saving money: it's about convenience. If you want to save money, you can get the very cheap PDF versions of books from O'Reilly. Those should look just like the physical book, although they are probably harder to read on the iPhone.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).