What should be in core?

How should we figure out what should be in core? Florian floated some ideas for removing some crufty modules from Perl 5.16. Some of these I don't care about, some of these I never knew about, and at least one I'd like to see in the standard distribution:

  • Text::Soundex
  • File::CheckTree
  • Text::Abbrev
  • Dumpvalue
  • autouse
  • NEXT

There are several camps in this little battle, and people tend to stay in their camp. I think all of them have at least threads of validity.

  • We're stuck with everything already in there because we don't know who depends on it
  • Make everything an additional installation, like Task::Kensho
  • Have the minimum you need to install modules from CPAN. (This is getting much leaner).
  • Include a superset of the Standard Library distribution, like Strawberry Perl or ActivePerl do.
  • Pragmas (like autouse) are part of the language, different from a module (such as Text::Soundex).

When people say that we suck at marketing, this is what they should be talking about. Perl is not one thing to all people, and there are interesting use cases for each of those camps. Instead of calling them camps, though, let's call them market segments. But we have no idea how important any of those market segments are. I'm not even going to guess. You shouldn't guess. We should identify them.

Consider another question we probably can't answer, as a demonstration of our ignorance about the grand scheme of things. We know what's important to us, but we don't know what's important overall. Ask a techie "Which computer sells the most units?". Most can't actually give you an evidential answer. Some will confuse it with the operating system, and even then won't get the right answer. Some will limit themselves to personal computers, the sort that we expect to see with a separate display and a separate keyboard. What about smart (or dumb) phones? I'd be willing to bet that more people have cell phones than computers. In 2009, the Guardian reported that there are over 4 billion cell phone subscriptions. That was just active subscriptions.

Knowing that you don't know (the known unknowns), what do you do about it? Can you do anything about it? Let's say we knew the market sizes: 40% are sysadmins, 20% are web programmers, and so on. Can we make any decision based on that? What if there was a 1% market segment whose disappearance or failure would be catastrophic?

What if we made Perl so bad that Booking.com or cPanel, both of whom just gave $10,000 to the Perl Core Maintenance Fund, decided to switch? Besides making it much easier for everyone else to hire a Perl programmer all of a sudden, what else might happen, or not happen? Or, what if the next perls break the sysadmin tools to the point where the Linux distros rewrite everything in Python and stop shipping perl? That doesn't matter to someone like me who compiles perls from source. Or does it? I make most of my money from people who don't compile perl from source, so maybe I'd have to become a cranky barista.

I don't know what should be in core. Putting philosophy and marketing aside, there are some things we can do (and you can help):

  • What do the various OS distributions actually need in core? Which modules (and which versions) are they using for their hidden scripts and OS maintenance? Who wants to find out? Along with that, how many are replacing core modules with local versions?
  • What extra modules do other distributions include?
  • What's the reason anything in core now is in the core? Some are there because they are useful and some because they are dependencies.
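
A first pass at the inventory side of those questions can be scripted. The following is only a sketch, assuming the Module::CoreList module is available (recent perls ship it); it reports when each of the modules Florian mentioned first appeared in core and whether it has since been removed. It says nothing about why a module is there, which still takes some digging through history.

```perl
use strict;
use warnings;
use Module::CoreList;

# The modules named at the top of this post.
my @candidates = qw(Text::Soundex File::CheckTree Text::Abbrev Dumpvalue autouse NEXT);

for my $module (@candidates) {
    # first_release returns the first perl version that bundled the module,
    # or undef if it was never in core.
    my $first = Module::CoreList->first_release($module);

    # removed_from returns the first perl version without the module,
    # or undef if it is still in core (or never was).
    my $removed = Module::CoreList->removed_from($module);

    printf "%-16s first in core: %-10s removed in: %s\n",
        $module,
        $first   // 'never',
        $removed // 'n/a';
}
```

The answers depend on the version of Module::CoreList you have installed, since it only knows about perls released before it was.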


I think one of the problems that contributes to this debate is that if Perl did remove some modules from core and some script somewhere stopped working, its users would get a fairly cryptic error message:

Can't locate Something/Random.pm in @INC

People who don't know Perl very well can't tell what that message means or why they got it. Sure, Google could help, but wouldn't it be nicer if the error message were more helpful?

Couldn't load Something::Random. Is that a typo? Or maybe you need to install Something::Random from CPAN?

Or something like that.
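
A message like that can be prototyped without changing perl itself. This is a minimal sketch, not a proposal for how core should do it: it relies on the documented behaviour that a code reference in @INC is called with the path being required, and it puts the hook last so it only fires once every real directory has been searched. The helper name module_name_from_path is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Something/Random.pm" into "Something::Random".
sub module_name_from_path {
    my ($path) = @_;
    (my $name = $path) =~ s{/}{::}g;
    $name =~ s{\.pm\z}{};
    return $name;
}

# require calls a code reference found in @INC as $hook->($hook, $path).
# Pushed onto the end, it runs only after the normal search has failed.
push @INC, sub {
    my (undef, $path) = @_;
    return unless $path =~ /\.pm\z/;    # leave things like require "file.pl" alone
    my $module = module_name_from_path($path);
    die "Couldn't load $module. Is that a typo? "
      . "Or maybe you need to install $module from CPAN?\n";
};

# Try it with the made-up module name from the message above.
my $ok      = eval { require Something::Random; 1 };
my $message = $ok ? '' : $@;
print $message;
```

One caveat: code that checks for optional modules by matching on the text "Can't locate" would no longer see that text, so a real change would need more care than this.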

I'm going to disobey orders and offer a tentative guess at the size of a "market segment": I think there's a large segment of perl programmers who effectively can't use a module unless it's included in core. To cater to these people, we should be looking for ways to expand what's in core. I think that the people on the inside of this debate don't seem to realize the kind of locked-down conditions a lot of programmers have to cope with.

Anyway, interesting post. It certainly would be good to have something like real "marketing research" on these points.

Why a module like, for example, URI is not already in the core in this day and age could be worrisome for some, but I think that managing the cruft worries p5p much more than naming one library or another for inclusion. Besides, the core already carries all these built-ins, like netdb (getpw*, gethost*), formats, most of the low-level Unix interfaces and whatnot. IMHO these should be moved to modules (Socket::, Format::, to fit the examples above) and only loaded somehow lazily. Also, the "use feature" mechanism could be used for much more aggressive changes, if we are to imagine a possible "use feature qw(compat)" or better still "use feature qw(nocompatible)"... following, for example, the vi/vim dichotomy (set nocompatible, anyone?) :)

I am not dreaming of Moose in the core, btw. Since you are asking, I, for one, am in the camp with the smallest core possible (which also would imply - for me, at least - the "most performant/bugfree/etc" of these worlds) with the awesome, unmatched CPAN one "call" away. I'd be more interested in the devel and deployment tools for custom-compiled bundles of perl+libs. The vendor-supplied perls (debian/redhat/apple/etc) aren't exactly stellar for big apps IMHO :) They are often already obsolete, or ship a .0 release (Debian Lenny shipped 5.10.0, for example, while 5.10.1 had some incompatibilities but many fixes), or contain bugs that are already fixed, or never existed, upstream.

As a contractor I often find myself in a situation where I'm not allowed to install stuff from CPAN or I cannot guarantee that the system that my script will run on has had CPAN module X installed. A lot of shops do not bother building a "networked Perl" or having a common repository for modules. Often you can't get root or you have to deal with Windows vs. Linux, etc.

Having a module in core is ideal, as you can use it without worrying that it'll be missing or otherwise unavailable. Case in point: while Term::ReadLine is in core, Term::ReadLine::Gnu isn't, and without Term::ReadLine::Gnu perl's debugger is much less useful. Another example is Term::ReadKey: without it, there is no easy way to quickly and reliably read a single y/n keystroke. I'm sure there are many other issues.

Let me ask you, what's bad about putting things in core? Just a larger distribution?

I'm all for more useful error messages. That the failure-to-load message does not even include the exact name of the module that failed to load is quite a problem. Remembering back to my phone company days, error messages should tell you three things: what went wrong, why it went wrong, and what you can do about it.

To my way of thinking this is the big problem with the darkpan. Not only do we not know what's in there but we're too scared to help those who have to live there.

Let's break things for the better. But at least let's let people know why things broke and how they can help fix them.

One approach might be to keep the smallest share of core modules that satisfy the highest share of dependencies for other commonly useful modules.
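
As a rough illustration of how that ranking might be computed: count, for each module, how many distributions depend on it, then sort by that count. The dependency data below is entirely made up (the App::* and Lib::* names are placeholders); a real attempt would have to pull the prerequisites from CPAN metadata.

```perl
use strict;
use warnings;

# Made-up dependency data, for illustration only.
my %depends_on = (
    'App::One'   => [qw(Lib::A Lib::B)],
    'App::Two'   => [qw(Lib::A)],
    'App::Three' => [qw(Lib::A Lib::C)],
    'App::Four'  => [qw(Lib::B)],
);

# Count how many distributions list each module as a dependency.
my %dependents;
for my $dist (keys %depends_on) {
    $dependents{$_}++ for @{ $depends_on{$dist} };
}

# Rank by number of dependents, breaking ties by name for stable output.
my @ranked = sort { $dependents{$b} <=> $dependents{$a} or $a cmp $b }
             keys %dependents;

printf "%-8s %d\n", $_, $dependents{$_} for @ranked;
```

A cut-off on that list (the top N, or everything above some count) would then be the candidate core set. This simple count ignores indirect dependencies, which a serious version would need to follow.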

I am a new Perl programmer and Perl is my first language. Even though in some ways it was instructive to fail installation of module after module, it was also like super irritating and stressful. I started manually installing modules from the dependencies up so that if I failed early on I at least knew who the culprit was and didn't load up my system with a bunch of other useless junk that I was only going to need for that top module anyway.

This was probably the best idea I ever had. A side effect of this self-flagellation is that you tend toward solutions that rely less on vagrant, possibly unmaintained heaps of derelict code than you would had you just been gunning the dependencies down with CPAN.

I no longer keep this regimen of manually walking down the dependency tree and installing everything by hand but it left a lasting impression on me: solutions to new and unique problems that can be manufactured from good ol' core dankness are probably more robust and well-thought-out than those that rely on a cocktail of other oddities that no one's ever heard of.

There are no doubt numerous exceptions to this rule, probably more than my small mind can even begin to comprehend. And perhaps those are also very good candidates to be retained or added to the core distribution. For example, a module like Chart::Clicker. I've never been able to install all its weird dependencies (Cairo, etc.) and yet every few weeks I try again because those bar graphs look so damn sexy. I gotta have it! These, like the workhorse modules I'm talking about above, should also be included (if the decision-makers are feeling generous that day).

In their fight to reduce the size of the initial installation, Linux distributions have already split up core perl into several packages. At minimum, most distributions have a separate package containing perldoc and the perl documentation.

I can imagine there will be further pressure to reduce the size of their default perl installation to the bare minimum they need to run the system.

That means during a training class I need to teach students how to install those packages (or how to ask their sysadmin to do that).

Then for the most basic things they want to do they already need to download and install modules. OOP is not even mentioned at that point.

In recent years we have seen quite a lot of improvement in how we can install modules and even brew our own perl. The technical part has improved a lot.

IMHO we should invest more energy in educating perl developers and their managers (!) around the world on how to make the best use of these improvements.

perl-ctypes should have been there a year ago, though this project looks abandoned now.

I took a while to respond because I wanted to see what others came up with.

I'll definitely have to side with Gabor.

From a Linux distribution's perspective, it doesn't matter so much what is and what isn't in the core. If it's not "core", the Linux distribution can ship it as a separate package, and possibly can continue making things "part of the standard installation" anyway.

Personally, I'd prefer the lightweight core approach, but there is in my mind no good reason to force Perl to decide whether people have a "light" core or a "full" core.

An approach we could take is to have various "sets" of things that are part of the standard perl installation.

  1. Have a "minimal" set which contains only things that are essential to have a working Perl installation
  2. Have a "lightweight" set which predominantly contains ( for now ) things that are not available/plausible to install via a CPAN client
  3. Have a "standard" set that contains what Perl already contains

And then only officially support one of those sets completely.

This makes it easier for users/distributions to decide how they'll package things.

People perl-brewing are going to be more likely to want the "standard" set, and Linux distributions are going to be more likely to want to use the "lightweight" set and provide the difference with the standard set via their package management.

This borrows somewhat from the "extended standard library" idea, but eliminates some of the distribution/install/politics involved.

Thus we can say "A standard Perl5 installation" and mean "all things that are provided as part of the standard set", but it doesn't mean you have to have the same versions of those things, or the same way of getting them in your system.

The implementation details, however, are the important part.

  • There should be a "sets" folder of sorts in the perl tree which contains configuration/files for building a given set
  • It should be reasonably easy for a given set definition to toggle whether various packages are part of that "set"
  • ./configure should accept a 'set=$name' flag ( or similar ) to choose which set is built/installed

The hope is that this makes it easier for Linux distributions to add their own custom derivations of the sets.

It allows users to more easily see what a set contains and possibly create their own custom preferred set.

More importantly, it allows for modular extension of the perl installer.

It would be nice to do something such as:
wget perl-5.14.1.tar.bz2
tar -jxf perl-5.14.1.tar.bz2
cd perl-5.14.1/
cd sets/
mkdir kensho
cd kensho/
wget http://kens.ho/config-1.12.tar.bz2
tar -jxf config-1.12.tar.bz2
wget http://kens.ho/sources-1.12.tar.bz2
tar -jxf sources-1.12.tar.bz2
cd ../../
./configure set=kensho

and build and install a full Perl installation with all of the bits of Task::Kensho in it.

Downstream can then package that up nicely for people as "perl-5.14.1-kensho" or so forth in their package manager and just be happy about it.

Hope something in this proves to be useful.

A while ago on p5p I suggested what I would like to see eventually happen and found I wasn’t the only one thinking along those lines. Namely, I’d like for the structure of the perl source to get to the point where the dual-life modules exist in some directory in the tree simply as tarballs identical to those on CPAN. The build process would use the regular module infrastructure to install them inside the build tree.

The job for making other, batteries-included (or even -excluded) distributions would then be very simple: just dump more tarballs from CPAN into that directory, prior to building.

The dual-life model could then in fact migrate out of p5p itself and become an independent project, the community-supported Official perl. (That’s similar to the structure that Rakudo (and Perl 6 at large) has inched towards, now that I think about it.) That project would also be far more accessible for casual contributors than the perl core. (The latter is not that complicated either, but (aside from the way dual-life modules are managed in fact requiring some work) is certainly a lot more intimidating. (An ulterior motive here is that working on that standard library project may help acclimatise contributors to the core as well.))

Excuse the parentheticals.


About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).