Categorizing CPAN modules

Warning: no polished content ahead (as if my writing is ever polished?). It's all a brain dump.

CPAN has categories, but they have long been unmaintained and are not very deep or specific: http://www.cpan.org/modules/by-category/

Should there be a new category/dmoz-like-directory project?

Creating tasks like Task::Topic::DataValidation or Task::{BeLike::SHARYANTO,}::Topic::{DataValidation,Logging,...}? Cute? Maintenance nightmare? Pointless? Probably all of them.

Should CPAN META contain tags, to let authors categorize their modules themselves? The trend nowadays is to give modules cute Ruby/Python/npm-style names, so the module names themselves no longer indicate what the modules do.

Should MetaCPAN or another project let people crowdsource this? People can already comment on, rate, and star/favorite modules. Adding tags is just one more piece of "social stuff" to do.

14 Comments

Should there be a new category/dmoz-like-directory project?

You're proposing it, so you tell us.

What do you see this directory being? What would the benefits be? Who would use it? How would it help them? How would it help the community?

Module tagging is on the roadmap for MetaCPAN, but at this point it's still just a project in search of a volunteer.

I don't like the idea of Task::Topic::, which I'd see as yet another "abuse" of CPAN.

Previously I suggested that we should allow tagging of modules, for the same reasons you give. David Golden pointed out that there is already a keywords field in the metadata model which (a) isn't currently (widely) used, and (b) could be used for this.
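For readers who have not seen the field: the CPAN META spec defines `keywords` as an optional list of strings that must not contain whitespace. Below is a minimal sketch of reading it from a META.json document. The distribution name, version, and keywords in the sample are made up for illustration, and `keywords_from_meta` is a hypothetical helper, not part of any existing tool.

```python
import json

# Hypothetical META.json excerpt; only the "keywords" field matters here.
META_JSON = """
{
  "name": "Example-Validator",
  "version": "0.01",
  "keywords": ["data-validation", "schema"]
}
"""


def keywords_from_meta(meta_text):
    """Return the keywords list from META.json text, or [] if it is absent."""
    meta = json.loads(meta_text)
    keywords = meta.get("keywords", [])
    # The spec says keywords must not include whitespace, so skip any that do.
    return [
        k for k in keywords
        if isinstance(k, str) and k and not any(c.isspace() for c in k)
    ]


print(keywords_from_meta(META_JSON))
```

A distribution that omits the field simply yields an empty list, which matches the field being optional.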

Take-up would initially be quite slow, as people wouldn't start using it until they see value in it. For example, MetaCPAN could show keywords for a module, and you could click on a keyword to see all modules / dists tagged with the keyword. To bootstrap things, MetaCPAN could also let logged-in users tag modules with keywords: it would merge user-specified and author-specified keywords. That way a few people could tag a load of modules to demonstrate the utility.
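A rough sketch of that merge-then-browse idea, assuming the keywords arrive as plain dictionaries keyed by distribution name. This is not how MetaCPAN stores or serves anything; the function names and the sample distributions and tags are hypothetical.

```python
from collections import defaultdict


def merge_keywords(author_keywords, user_keywords):
    """Combine author-supplied and user-supplied keywords per distribution."""
    merged = defaultdict(set)
    for source in (author_keywords, user_keywords):
        for dist, keywords in source.items():
            # Lower-case so "Logging" and "logging" count as the same tag.
            merged[dist].update(k.strip().lower() for k in keywords if k.strip())
    return dict(merged)


def build_keyword_index(dist_keywords):
    """Invert dist -> keywords into keyword -> sorted list of dists."""
    index = defaultdict(set)
    for dist, keywords in dist_keywords.items():
        for keyword in keywords:
            index[keyword].add(dist)
    return {keyword: sorted(dists) for keyword, dists in index.items()}


# Hypothetical sample data: tags from META files and from logged-in users.
author_keywords = {
    "Example-Validator": ["data-validation"],
    "Example-Logger": ["logging"],
}
user_keywords = {
    "Example-Validator": ["schema", "Data-Validation"],
    "Example-Schema": ["schema"],
}

merged = merge_keywords(author_keywords, user_keywords)
index = build_keyword_index(merged)
print(index["schema"])
```

Clicking a keyword on a dist page would then amount to a lookup in `index`. Keeping the two sources separate until display time would also work, and would let the site show which tags came from the author.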

I've talked to Olaf Alders (MetaCPAN) about this. He liked the idea, but pointed out there's quite a lot of work here.

I've often thought it would be nice if the existing documentation on the CPAN did most of this for us.

As an example, it would be great if the Moose documentation had more examples of using Moose in conjunction with related and complementary modules from the CPAN. Why, for example, isn't there an example of a small Web application written using Catalyst and Moose?

The Moose documentation links to MooseX::SemiAffordanceAccessor, but not to Moo, KiokuDB or XML::Rabbit.

I don't want to pick on Moose. Moose is actually better than most at this sort of thing - it has Moose::Manual::MooseX, but that document hasn't been meaningfully updated for over 3 years.

If the top 20 CPAN distributions had this kind of documentation, it would be much easier to find the module you need. You're using Moose, you need XML; you remember seeing an XML processing example in the Moose manual, so you look that up and see how they did it; oh they're using XML::Rabbit and it looks so easy...

Stop making the SEE ALSO section in documentation a list of links. Make it an essay. Compare your module with competitors. Where does it do a better job; and be honest, when might people be better off using another module? Give examples using your module in conjunction with other complementary modules. Give examples using your module in conjunction with other unrelated, but widely used modules. I don't always eat my own dog food, but I'll offer Scalar::Does as an example of a good SEE ALSO section.

I was hoping that something along these lines might emerge out of perldoc plus plus, but there didn't seem to be enough interest to get that ball rolling.

I think using Pinboard or a similar service would be great.

I am sure we could populate the tags in very creative ways.

It even uses Perl.

While that is a nice concept, it does put the burden of publicizing your module on other module authors. The concept of adding a tag/keyword to your own module and letting the aggregator do the work of collecting them all makes the most sense in the "how can this be sustainably maintained" kind of way.

In my LPW talk last year I proposed that we add the ability to tag modules, and then that published reviews could be associated with a tag. It was in follow-up to this thought that DAGOLDEN pointed out meta keywords.

This way one review would be "shared" across all modules, without having to replicate potentially large amounts of text across them.

Plus, if you consider a module like Data::Random, it might have two keywords: password-generation and synthetic-data. They would link to two different module groups, with some overlap, and potentially different reviews.

We shouldn't rely on module authors providing this, because most won't. But if there's a way to provide the relevant information to MetaCPAN and co, then it can provide an interface that brings all these things together.

Should CPAN META contain tags, to let authors categorize themselves?

Yes! Yes! Yes! The old taxonomy of modules has been irrelevant for years. Most authors don't even bother to register and classify their work. We should absolutely be using the keywords provided by the META spec.

Should metacpan or other project let people crowdsource this?

That would be nice. But I think the first step is to just make the keywords visible on each distro page, and then spread the word that META supports keywords. In time, keyword extensions for Dist::Zilla, Module::Install, and all their friends will emerge (if they don't exist already). Crowdsourcing keywords and making them searchable can come later.

For starters, those who use (or used) CPAN's categories.

I'm guessing that very few people have even looked at the categories in at least seven or eight years. Nobody knows they're there. I think the days of browsing lists of modules by category are long gone, and we're now in a world where everything is keyword searching and links of tags.

Who's the user that you're trying to serve? I think there are two key users to be served: the experienced user who wants to quickly find a module that matches given search criteria, and the novice user who doesn't realize what's in the CPAN that might do him good. It's this latter one that I could potentially see being served by categorized lists of modules, if he knew what the category hierarchy was.

Why not skunk up a couple of pages as a prototype? Just sample HTML that gives people something to look at that shows your ideas, and they'll have something to build on with their ideas as well.

You mention dmoz, and I went and looked at it for the first time in forever, and it just seems so horrible using that old Yahoo-like hierarchy for browsing. I think anything you're going to be doing is going to have to be based on tags. StackOverflow has done marvelously well with tags.

Adding more links to other modules in SEE ALSO sections is also a way to link modules.
This is currently about the only solution to discover other related modules while browsing on MetaCPAN.

If you see such links missing for a distribution, it is always a good thing to report this as a bug at rt.cpan.org. All the authors to whom I've reported such bugs have responded favorably.

Maybe we need some author tools that would automatically extract links from SEE ALSO and add them to a new section in META.json to make this information easier to process...
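As a rough illustration of what such a tool might do, here is a sketch that pulls `L<...>` link targets out of the SEE ALSO section of a POD string and emits them under a custom key. Several things here are assumptions: the key name `x_see_also` is invented (the META spec only requires custom keys to start with `x_` or `X_`), the regex handling is deliberately simplistic and skips links containing nested formatting codes, and the sample POD is made up.

```python
import json
import re

HEAD1 = re.compile(r"^=head1[ \t]+(.*?)[ \t]*$", re.MULTILINE)
LINK = re.compile(r"L<([^<>]+)>")
MODULE_NAME = re.compile(r"[A-Za-z_]\w*(?:::\w+)*")


def see_also_modules(pod):
    """Return module names linked from the SEE ALSO section of a POD string."""
    # Splitting on a pattern with one group yields
    # [preamble, heading1, body1, heading2, body2, ...].
    parts = HEAD1.split(pod)
    modules = []
    for heading, body in zip(parts[1::2], parts[2::2]):
        if heading.upper() != "SEE ALSO":
            continue
        for target in LINK.findall(body):
            # Drop an optional "text|" prefix and "/section" suffix.
            name = target.split("|")[-1].split("/")[0].strip()
            # URLs and man pages such as "crontab(5)" fail this check.
            if MODULE_NAME.fullmatch(name) and name not in modules:
                modules.append(name)
    return modules


# Hypothetical POD for illustration.
SAMPLE_POD = """=head1 NAME

Example::Module - demo

=head1 SEE ALSO

L<Moose>, L<the Moo docs|Moo>, L<XML::Rabbit/SYNOPSIS>, L<http://example.com/>

=head1 AUTHOR

L<Some::Author>
"""

links = see_also_modules(SAMPLE_POD)
# "x_see_also" is an invented custom key, not part of the META spec.
print(json.dumps({"x_see_also": links}, indent=2))
```

A real tool would want a proper POD parser rather than regexes, but the output shape would be similar.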

I like the idea of adding tags, but would like the public to be able to tag modules, not just module authors.

For example, I have some friends on Flickr who take great photos, but don't tag them. I follow up and tag on their behalf.

Another "Discovery" feature I would like for metacpan.org is "what's new by keyword". For example, I'd like to know:

"What new modules have been published in the MooseX namespace recently?"

and

"What new modules have been published that use Moose recently?"

Thanks for raising the issue, Steven! Your post made me unbox and polish a toy project of mine which involves tapping into the collective wisdom of the Perl community:

CPAN::U is a collaborative-filtering-based recommendation system for Perl developers. It gathers data from "++" button counts across MetaCPAN and builds a predictive model of modules often used together by CPAN authors.

Not quite ready yet, consider as a pre-release: http://sysd.org/cpan-u/
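For readers unfamiliar with the approach, here is a generic item-to-item co-occurrence sketch of the kind of collaborative filtering described. It is not CPAN::U's actual code or algorithm. It assumes the "++" data is available as a per-user list of favorited distributions, and the users and their favorites below are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_scores(favorites):
    """Score pairs of distributions by how often the same user ++'d both."""
    popularity = Counter()
    together = defaultdict(Counter)
    for dists in favorites.values():
        unique = sorted(set(dists))
        popularity.update(unique)
        for a, b in combinations(unique, 2):
            together[a][b] += 1
            together[b][a] += 1
    # Cosine-style normalisation so very popular dists do not dominate.
    return {
        a: {b: n / sqrt(popularity[a] * popularity[b]) for b, n in row.items()}
        for a, row in together.items()
    }


def recommend(favorites, user, top_n=3):
    """Suggest dists the user has not ++'d, ranked by summed similarity."""
    scores = cooccurrence_scores(favorites)
    seen = set(favorites.get(user, ()))
    totals = Counter()
    for dist in seen:
        for other, score in scores.get(dist, {}).items():
            if other not in seen:
                totals[other] += score
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dist for dist, _ in ranked[:top_n]]


# Invented users and favorites, for illustration only.
favorites = {
    "alice": ["Moose", "Catalyst", "KiokuDB"],
    "bob": ["Moose", "Catalyst"],
    "carol": ["Moose", "XML-Rabbit"],
    "dave": ["Moose"],
}

print(recommend(favorites, "dave"))
```

With this toy data, Catalyst ranks first for "dave" because two of the other three Moose fans also favorited it.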


About Steven Haryanto

A programmer (mostly Perl 5 nowadays). My CPAN ID: SHARYANTO. I'm sedusedan on perlmonks. My twitter is stevenharyanto (but I don't tweet much). Follow me on github: sharyanto.