The Four Major Problems with CPAN
First of all, I love CPAN. CPAN was the first of its kind to provide an extensive and official library of modules that support the language itself. CPAN is as much a part of Perl as regular expressions are. Without CPAN, Perl would never be as versatile or useful as it is now.
But CPAN is also very old. It's been around since late 1995, almost 20 years now. As such, it has grown to house many thousands of modules, 119,124 to be exact (as of today). Many would consider that a virtue, as you can find almost anything you want on CPAN. However, it betrays an underlying problem with CPAN that only increases with age:
A hundred thousand modules is too much stuff to sift through.
Look at a Cathedral-based setup like .NET. The .NET Framework provides a LOT of utilities and libraries for a great many things that you want to do, and it can do this in a 50MB file. Good luck trying to compose a similar library for CPAN. You could try, but somewhere down the road you're going to hit many decision points about which module to use for protocol or task X.
For example, let's look at something basic and simple like opening an IP socket connection. A search on MetaCPAN reveals:
Others that aren't immediately obvious are:
Errr, I just want to open a TCP connection somewhere. Oh, and since ARIN is telling everybody that we're almost out of IPv4 addresses, I'd like to have something that is IPv6 compatible. Which one should I use?
- IO::Socket is the old standby, but it's not the right module for Internet sockets.
- IO::Socket::INET is the right one for IP sockets, but it's not IPv6 compatible.
- Socket looks like it could be useful, but it's not OO and way too low level for most people's needs.
- Socket6... shouldn't this be retired in favor of Socket?
- IO::Socket::INET6 is OO, functions like IO::Socket, and is compatible with IPv6. But, it's the "wrong" module. Why? Well, it's been refactored into another module with a different author.
- IO::Socket::IP is the "right" module. Somehow, you're supposed to just know this.
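For what it's worth, here is roughly what the "right" answer looks like once you know it. This is only a minimal sketch, assuming IO::Socket::IP is available (it has shipped with Perl itself since the 5.20 release); the open_tcp helper name is made up for illustration and is not part of any module.

```perl
use strict;
use warnings;
use IO::Socket::IP;

# Hypothetical helper (not part of IO::Socket::IP): open a TCP connection
# to $host:$port over IPv4 or IPv6, whichever address family works.
sub open_tcp {
    my ($host, $port) = @_;

    # With neither Type nor Proto given, IO::Socket::IP defaults to a
    # TCP stream socket, the same default as IO::Socket::INET.
    my $sock = IO::Socket::IP->new(
        PeerHost => $host,
        PeerPort => $port,
    ) or die "Cannot connect to $host:$port: $@\n";

    return $sock;
}
```

Calling open_tcp('www.example.com', 80) returns an ordinary IO::Socket-style handle, and the caller never has to say whether the address behind the name is IPv4 or IPv6.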
Most of these modules are not even packaged in the same distro. What was wrong with adding IPv6 support to IO::Socket::INET? The answer is likely pretty complicated, but some of that comes with the Bazaar-based repository model. Different people making different modules in different distros using different coding styles and different environments. If this were a Cathedral like Microsoft, the answer would be "WTF?! Put all of this s**t into one library!"
Overall, though, I like the Bazaar model, because you can get many results much faster than with a Cathedral. For example, the .NET framework doesn't have anything for many protocol-specific items, like SNMP or Telnet. (You typically have to pay a third party for these.) But just because it's a Bazaar model doesn't mean that the weaknesses of that model should always exist, unable to be fixed or mitigated.
Thus, I'd like to identify what I consider to be the four major (specific) problems with CPAN and some potential solutions to these problems:
1. Too many modules are unmaintained; abandoned but not marked as such.
Perl veterans know this problem all too well, and I touched on the issue in the example above. Distros have owners, typically a single owner, and those owners sometimes move on to other things. Or they don't have the tuits (the round ones) to maintain the distro. Or maybe they completely switched languages and aren't interested in Perl any more.
However, the users don't know this, at least not officially. They continue to submit bugs against distros that haven't had a release in years, into RT queues full of tickets that are 7+ years old. Yes, these are warning signs, but there's nothing official saying that a distro is "unmaintained" or "abandoned".
There is a way to take over a distro, but it's a process that takes months to complete. Furthermore, doing that requires a certain level of commitment to say "Yes, I want to own this distro and take full responsibility for its bugs and issues". Many people don't want to go that far. They just want their patch implemented, so that it JFW.
My Solution: Create an "official" orphanage (tied to GitHub) with an automated abandoned status checker. I have a more detailed plan, but it's too large to include here. I will discuss this in my next blog post.
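To make the "automated abandoned status checker" part a little more concrete, here is a minimal sketch of the kind of check I mean. It is not the detailed plan. The thresholds, the field names, and the looks_abandoned function are all assumptions picked for illustration, built from the two warning signs mentioned above: no releases in years and very old open tickets.

```perl
use strict;
use warnings;

# Hypothetical thresholds; nothing official defines these numbers.
my $MAX_DAYS_SINCE_RELEASE = 3 * 365;
my $MAX_OPEN_TICKET_AGE    = 2 * 365;

# Takes a hashref describing a distro. Both keys are made-up names:
#   days_since_release      - days since the last upload
#   oldest_open_ticket_days - age of the oldest open ticket (undef if none)
# Returns 1 only when BOTH warning signs are present, otherwise 0.
sub looks_abandoned {
    my ($distro) = @_;

    my $stale_release = $distro->{days_since_release} > $MAX_DAYS_SINCE_RELEASE;
    my $stale_tickets = defined $distro->{oldest_open_ticket_days}
        && $distro->{oldest_open_ticket_days} > $MAX_OPEN_TICKET_AGE;

    return ($stale_release && $stale_tickets) ? 1 : 0;
}
```

Requiring both signs is a deliberate choice in this sketch: a finished, stable module with no open tickets should not be flagged just because it hasn't needed a release in a while.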
2. There is not enough data on what modules are mature; which ones are the "right ones" to use.
Again, IO::Socket::INET6 vs. IO::Socket::IP is one good example. Others are:
- Mouse vs. Moo (vs. Moose vs. Mo)
- File::Spec vs. Path::Class
- @ISA vs. base vs. parent (hint: It's parent. The base docs point you there unless you need fields, but you have to know to look.)
- Email::Send vs. Email::Simple vs. Email::Sender vs. Email::Abstract vs. Email::*
- Devel::Leak vs. Devel::LeakTrace vs. Devel::Gladiator
Unless you ask the right people, or go the hard route and try several of them, you're going to have problems figuring out exactly which module you should use.
My Solution: Work on better scoring of module relevancy, maturity, etc. Search engines like MetaCPAN could be leveraged to give you more accurate and relevant results, based on a number of pieces of information from the distro. I have an initial set of scoring methods in the planning phase right now.
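As a rough illustration of what such scoring could look like (this is not the set of scoring methods mentioned above), here is a minimal sketch that blends three signals into one number. The signals, the weights, the cutoffs, and every name in the code are assumptions; a real scorer would need to be tuned against real distro data.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical weights; they sum to 1 so the score stays between 0 and 1.
my %WEIGHT = (reverse_deps => 0.5, pass_rate => 0.3, freshness => 0.2);

# Takes a hashref with made-up keys:
#   reverse_deps       - number of distros depending on this one
#   pass_rate          - fraction of passing test reports, 0 to 1
#   days_since_release - days since the last upload
sub maturity_score {
    my ($d) = @_;

    my $deps  = min($d->{reverse_deps} / 100, 1);             # saturates at 100 dependents
    my $fresh = 1 - min($d->{days_since_release} / 1825, 1);  # reaches 0 after five years

    return $WEIGHT{reverse_deps} * $deps
         + $WEIGHT{pass_rate}    * $d->{pass_rate}
         + $WEIGHT{freshness}    * $fresh;
}

# Returns distro names ordered from highest score to lowest.
sub rank_by_maturity {
    my @distros = @_;
    return map { $_->{name} }
           sort { maturity_score($b) <=> maturity_score($a) } @distros;
}
```

A search engine could use a ranking like this to break ties between modules whose names and descriptions match a query equally well, so the widely used, recently released one shows up first.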
3. Many modules are only used for semi-private needs.
These categories would include:
- Testing (Acme::Prereqs)
- Personalized use (Task::*, DZIL Author bundles)
- Training (search for "The great new")
- Deprecated modules (search for "deprecated")
It's a lot of cruft that could be buried (or potentially deleted), and it clutters up search results.
My Solution: Add a "distro_type" variable to CPAN::Meta::Spec, which would then be used by search engines like MetaCPAN. The 'keywords' item could be used as a stopgap, but a more official status variable should be implemented in the long run.
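The current spec already allows custom keys in a distro's metadata as long as they start with x_, so an experiment would not have to wait for a spec change. The sketch below assumes a made-up x_distro_type key in META.json and shows how a search engine might hide the cruft by default. The key name, its values, and both function names are assumptions for illustration, not an existing convention.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical types that a search would hide unless explicitly asked for.
my %HIDDEN_BY_DEFAULT = map { $_ => 1 } qw(testing personal training deprecated);

# Read the made-up x_distro_type key from a META.json string.
# Distros that don't declare a type are treated as 'general'.
sub distro_type {
    my ($meta_json) = @_;
    my $meta = decode_json($meta_json);
    return $meta->{x_distro_type} // 'general';
}

# Keep only the META.json strings that a default search would show.
sub visible_results {
    my @metas = @_;
    return grep { !$HIDDEN_BY_DEFAULT{ distro_type($_) } } @metas;
}
```

Defaulting to 'general' means that the many existing distros, which would never declare a type, keep showing up exactly as they do now.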
4. Modules cannot be renamed or deleted, even with a long-term deprecation process.
Names in CPAN are sort of like domain names with two critical exceptions: it doesn't cost anything to take a name and they last forever. The lucky guy who got the Net::IRC name will continue to have this name forever and ever, despite the fact that the module clearly states "DEAD SINCE 2004" in the title. (Edit: mst: and you wouldn't believe how much chasing around I did to get control of Net::IRC to add that message :) Most modules like that don't give you that kind of warning, though. So, people who are used to the Net::* namespace will think "Hey, if Net::Telnet is the Telnet module and Net::DNS is the DNS module, then I should try Net::IRC for my IRC needs".
In the world of the capitalistic domain name system, it's still somewhat of a problem, but not nearly as bad. Google.com still takes you to Google, Perl.org still takes you to the Perl Foundation, and CPAN.org takes you to CPAN. Even if you can't quite find it from a straight domain entry, Google's search engine is powerful enough to find you exactly what you are looking for (almost) every time.
(And I won't even get into the modules called ::ButActuallyWorksThisTime...)
My Solution: Implement a deprecation process that would eventually remove the module from PAUSE. This could be tied into the "distro_type" variable above. Yes, this would be voluntary, but other processes (such as the orphanage) could enhance the automated processes.
Caveat: Modules and distros are not the same thing. Would this only apply to full distros, or is there a way to remove indexing from modules?