The CPAN Air Force, 2011 QA Hackathon Day 2

What if a CPAN mirror wasn't stationary? How would we track it? Would people make Google Earth maps to show its path? Could we dynamically adjust capacity without additional servers? Would we have to get FAA approval?

Goodyear Blimp

Ricardo and I started talking about CPAN mirror data, mostly because Ricardo was relaying messages to me from Adam Kennedy on IRC. That turned into a discussion of what work we could give Adam as part of this hackathon, since he's not here. Steffen Müller quickly wanted in on the fun of proposing work we could heap onto Adam.

Adam's Mirror::JSON module presents a data structure for tracking mirrors, including their freshness and responsiveness. We started talking about adding geography to that, but why stop there? What if the mirror wasn't geographically static? Although we came up with several fanciful ways this might be true, it is very easy to run a CPAN from an iPhone hanging off of a toy blimp. From that, we started to imagine a fleet of CPANs that would autonomously move to the locations where they were most needed, probably following the business-hours cycle, constantly advancing across the time zones. If you were sailing to Rapa Nui, during business hours you might have a CPAN blimp nearby, completely solar powered and up-to-date with the new fast rsync. How would we track that in a mirrors.json? Would mirrors file a flight plan, or would we track them in real time with GPS?
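To make the tracking question concrete, here is a minimal Python sketch of what a mirror record with a position might look like. The `lat`, `lon`, and `seen` fields are my assumptions for illustration and are not part of Mirror::JSON. The client picks the closest mirror whose last position report is still recent.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class MobileMirror:
    # All field names are hypothetical; none of them exist in Mirror::JSON.
    url: str
    lat: float   # degrees, last reported position
    lon: float   # degrees, last reported position
    seen: float  # Unix time of the last position report


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_mirror(mirrors, lat, lon, now, max_age=3600):
    """Return the closest mirror with a fresh position report, or None."""
    fresh = [m for m in mirrors if now - m.seen <= max_age]
    if not fresh:
        return None
    return min(fresh, key=lambda m: distance_km(lat, lon, m.lat, m.lon))
```

A mirror that files a flight plan instead of reporting by GPS would need a different record, probably a list of timed waypoints, but the client-side choice would look much the same.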

That's probably not going to happen (I'd like to produce a Y Prize for this: not a Y Prize that takes its name from Y Combinator or the next letter after X, but a Why? Prize), but imagine that your mirrors weren't fixed and you wanted to discover which real or Mini CPANs were on your network. We can make a zero-conf daemon thingy that the CPAN clients can automatically use. That means we need a client that knows what to do with that.

From there, we have the problem of knowing which CPAN we actually want to use. That is, they can be in different states and have various things added or removed. Some mirrors might be purposefully stuck at older versions of some things. Additionally, some auto-discovered mirrors might be interlopers that you should avoid. This should be easy enough to handle by having the source of the mirror sign the packages file (or something). CPAN clients would have a list of public keys that they trust. Those might be from public servers or from private servers. We can verify any mirror before we decide to download from it. It isn't a full-service, unspoofable system, but we don't check anything with the current system. Security isn't the main concern: we just wanted to identify the sources of the mirrors. Is it from Andreas, CPAN.org, or something else? Of all the MiniCPANs advertising themselves inside your internal network, which one is right for your project?
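As a rough sketch of that client-side check, assuming each discovered mirror advertises a key id and a detached signature over its packages file: the `Advert` record, its field names, and the `verify` callback are all hypothetical. The actual signature check is left to whatever crypto library the client wraps, since this only shows the trust decision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Advert:
    # One auto-discovered mirror; every field name here is hypothetical.
    url: str
    signer: str       # key id the mirror claims signed its packages file
    packages: bytes   # contents of the packages file
    signature: bytes  # detached signature over `packages`


# verify(public_key, data, signature) -> bool, supplied by the caller,
# for example a thin wrapper around a GnuPG or Ed25519 library.
Verifier = Callable[[bytes, bytes, bytes], bool]


def trusted_mirrors(adverts: List[Advert],
                    trusted_keys: Dict[str, bytes],
                    verify: Verifier) -> List[Advert]:
    """Keep only mirrors whose packages file verifies under a trusted key."""
    accepted = []
    for ad in adverts:
        key = trusted_keys.get(ad.signer)
        if key is None:
            continue  # unknown source, possibly an interloper
        if verify(key, ad.packages, ad.signature):
            accepted.append(ad)
    return accepted
```

The `trusted_keys` table is where public and private sources mix: it could hold a key for the main CPAN source next to one for your company's MiniCPAN.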

That brought up Module::Signature, the failed experiment to verify distributions. Its problem was one of position, running from inside a distribution and as part of the build process. Instead, I'd want to expand the packages file to have at least another column with a distribution signature. If we sign the packages files, we can trust the signatures it contains without having to sign all of the CHECKSUMS files too (although that might not be a bad idea, data size wise). All of this would be checked by the CPAN client rather than the build process. Although there might be a way for distribution uploaders to sign their distributions (I'd like to upload by SSH, which auto-signs it with my key), it wouldn't be inside the distribution itself.
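Here is a minimal sketch of the extra-column idea, with two assumptions of mine: the new fourth column holds a SHA-256 digest rather than a real signature, and the packages file itself has already been verified by some other step. With both in place, the client can check a downloaded archive without touching the build process.

```python
import hashlib


def parse_packages(text):
    """Map distribution path -> expected SHA-256 hex digest.

    Assumes a hypothetical fourth column after module, version, and path.
    The header block ends at the first blank line.
    """
    _header, _, body = text.partition("\n\n")
    expected = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip lines that lack the extra column
        _module, _version, dist_path, digest = fields
        expected[dist_path] = digest.lower()
    return expected


def distribution_ok(expected, dist_path, archive_bytes):
    """True only if the archive matches the digest listed for its path."""
    want = expected.get(dist_path)
    if want is None:
        return False  # not listed, so nothing vouches for it
    return hashlib.sha256(archive_bytes).hexdigest() == want
```

A real signature per distribution would replace the digest comparison with a key lookup and a verify call, but the position of the check stays the same: in the client, before the distribution is unpacked.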

What would we need to actually implement this?

  • The CPAN source publishes a public key
  • A CPAN zero-conf daemon
  • Client support for either of those

Now I just need the time to do it.


About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).