What should be in core?

By brian d foy on July 20, 2011 9:38 PM

How should we figure out what should be in core? Florian floated some ideas for removing some crufty modules from Perl 5.16. Some of these I don't care about, some of these I never knew about, and at least one I'd like to see in the standard distribution:

Text::Soundex
File::CheckTree
Text::Abbrev
Dumpvalue
autouse
NEXT

There are several camps in this little battle, and people tend to stay in their camp. I think all of them have at least threads of validity.

We're stuck with everything already in there because we don't know who depends on it
Make everything an additional installation, like Task::Kensho
Have the minimum you need to install modules from CPAN. (This is getting much leaner).
Include a superset of the Standard Library distribution, as Strawberry Perl or ActivePerl do.
Pragmas (like autouse) are part of the language, different from a module (such as Text::Soundex).

When people say that we suck at marketing, this is what they should be talking about. Perl is not one thing to all people, and there are interesting use cases for each of those camps. Instead of calling each one a camp, though, let's call it a market segment. But we have no idea how important any of those market segments are. I'm not even going to guess. You shouldn't guess. We should identify them.

Consider another question we probably can't answer, as a demonstration of our ignorance about the grand scheme of things. We know what's important to us, but we don't know what's important overall. Ask a techie "Which computer sells the most units?". Most can't actually give you an evidential answer. Some will confuse it with the operating system, and even then won't get the right answer. Some will limit themselves to personal computers, the sort that we expect to see with a separate display and a separate keyboard. What about smart (or dumb) phones? I'd be willing to bet that more people have cell phones than computers. In 2009, the Guardian reported that there are over 4 billion cell phone subscriptions. That was just active subscriptions.

Knowing that you don't know (the known unknowns), what do you do about it? Can you do anything about it? Let's say we knew the market sizes, 40% are sysadmins, 20% web programmers, and so on. Can we make any decision based on that? What if there was a 1% market segment whose disappearance or failure would be catastrophic?

What if we made Perl so bad that Booking.com or cPanel, both of whom just gave $10,000 to the Perl Core Maintenance Fund, decided to switch? Besides making it much easier for everyone else to hire a Perl programmer all of a sudden, what else might happen, or not happen? Or, what if the next perls break the sysadmin tools to the point where the Linux distros rewrite everything in Python and stop shipping perl? That doesn't matter to someone like me who compiles perls from source. Or does it? I make most of my money from people who don't compile perl from source, so maybe I'd have to become a cranky barista.

I don't know what should be in core. Putting philosophy and marketing aside, there are some things we can do (and you can help):

What do the various OS distributions actually need in core? What modules are they using for their hidden scripts and OS maintenance, and which versions? Who wants to find out? Along with that, how many are replacing core modules with local versions?
What extra modules do other distributions include?
What's the reason anything in core now is in the core? Some are there because they are useful and some because they are dependencies.
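
A rough first pass at the first question could be as simple as tallying the modules that a directory of scripts loads. This is only a sketch: `tally_modules` is a made-up name, it assumes a GNU or BSD `grep` and `sed` (for `-r` and `-E`), and it only sees literal `use` and `require` lines, so anything loaded dynamically goes uncounted.

```shell
#!/bin/sh
# tally_modules DIR: count how often each module or pragma is loaded by
# the files under DIR, most frequent first. Hypothetical helper.
tally_modules() {
    dir=$1
    # Find lines that start with "use Name" or "require Name".
    # -r: recurse, -h: omit file names, -E: extended regular expressions.
    grep -rhE '^[[:space:]]*(use|require)[[:space:]]+[A-Za-z_][A-Za-z0-9_:]*' "$dir" 2>/dev/null |
        # Reduce each line to the bare module name.
        sed -E 's/^[[:space:]]*(use|require)[[:space:]]+([A-Za-z_][A-Za-z0-9_:]*).*/\2/' |
        # Drop version requirements such as "use v5.10", which leave "v5".
        grep -vE '^v[0-9]+$' |
        sort | uniq -c | sort -rn
}
```

Running `tally_modules` over each distribution's script directories would give a starting list to compare against what is actually in core.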

12 Comments

mpeters.myopenid.com | July 21, 2011 3:30 PM

I think one of the problems that contributes to this debate is that if Perl did remove some modules from core and some script somewhere stops working they get a fairly cryptic error message:

Can't locate Something/Random.pm in @INC

People who don't know Perl very well can't tell what that message means or why they got it. Sure, Google could help, but wouldn't it be nicer if the error message was something more helpful?

Couldn't load Something::Random. Is that a typo? Or maybe you need to install Something::Random from CPAN?

Or something like that.
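
Until perl itself says something friendlier, a wrapper can approximate the idea. A minimal sketch, assuming a `perl` on the PATH; the name `check_module` and the `PERL` override are invented for illustration. Note that it cannot tell a missing module from one that is installed but fails to compile.

```shell
#!/bin/sh
# check_module NAME: try to load a Perl module and, on failure, print a
# friendlier hint than the stock "Can't locate ... in @INC" message.
# Set PERL to test against a specific interpreter (hypothetical knob).
check_module() {
    module=$1
    # "perl -MName -e 1" loads the module and then runs a no-op program.
    if "${PERL:-perl}" -M"$module" -e 1 2>/dev/null; then
        echo "$module is available"
    else
        echo "Couldn't load $module. Is that a typo? Or maybe you need to install $module from CPAN?" >&2
        return 1
    fi
}
```

A deployment script could call `check_module Something::Random || exit 1` before starting the real program.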

doomvox.myopenid.com | July 21, 2011 7:05 PM

I'm going to disobey orders and offer a tentative guess at the size of a "market segment": I think there's a large segment of perl programmers who effectively can't use a module unless it's included in core. To cater to these people, we should be looking for ways to expand what's in core. I think that the people on the inside of this debate don't seem to realize the kind of locked-down conditions a lot of programmers have to cope with.

Anyway, interesting post. It certainly would be good to have something like real "marketing research" on these points.

zgrim | July 21, 2011 11:35 PM

Why a module like, for example, URI is not already in the core in this day and age could be worrisome for some, but I think that managing the cruft worries p5p much more than naming one library or another for inclusion. Besides, the core already carries all these built-ins, like netdb (getpw*, gethost*), formats, most of the low-level Unix interfaces and whatnot. IMHO these should be moved to modules (Socket::, Format::, to slot the examples above) and only loaded somehow lazily. Also, the "use feature" mechanism could be used for much more aggressive changes, if we are to imagine a possible "use feature qw(compat)" or better still "use feature qw(nocompatible)"... following, for example, the vi/vim dichotomy (set nocompatible, anyone?) :)

I am not dreaming of Moose in the core, btw. Since you are asking, I, for one, am in the camp with the smallest core possible (which also would imply, for me at least, the "most performant/bugfree/etc" of these worlds) with the awesome, unmatched CPAN one "call" away. I'd be more interested in the devel and deployment tools for custom-compiled bundles of perl+libs. The vendor-supplied perls (debian/redhat/apple/etc) aren't exactly stellar for big apps IMHO :) when and if they aren't already obsolete, or ship with a .0 release (Debian Lenny shipped 5.10.0, for example; 5.10.1 already had incompatibilities but many fixes), or contain bugs already fixed or simply nonexistent upstream.

Andrew DeFaria | July 22, 2011 2:38 AM

As a contractor I often find myself in a situation where I'm not allowed to install stuff from CPAN or I cannot guarantee that the system that my script will run on has had CPAN module X installed. A lot of shops do not bother building a "networked Perl" or having a common repository for modules. Often you can't get root or you have to deal with Windows vs. Linux, etc.

Having a module in core is ideal, as you can use it without worrying that it'll be missing or otherwise unavailable. Case in point: while Term::ReadLine is in core, Term::ReadLine::Gnu isn't. Without Term::ReadLine::Gnu, perl's debugger is much less useful. Another example is Term::ReadKey: without it, a quick and reliable way to read a single y/n keystroke is not easily available. I'm sure there are many other issues.

Let me ask you, what's bad about putting things in core? Just a larger distribution?

chris fedde replied to comment from mpeters.myopenid.com | July 22, 2011 4:19 AM

I'm all for more useful error messages. That the failure-to-load message does not even include the exact name of the module that failed to load is quite a problem. Remembering back to my phone company days, error messages should tell you three things: what went wrong, why it went wrong, and what you can do about it.

To my way of thinking this is the big problem with the darkpan. Not only do we not know what's in there but we're too scared to help those who have to live there.

Let's break things for the better. But at least let's let people know why things broke and how they can help fix them.

brian d foy replied to comment from chris fedde | July 22, 2011 4:49 AM

I don't think we're too scared to help those who live there, but I'm certainly not going to go to every individual Perl user and explain to them what went wrong.

One of the problems with the "Can't locate" message is that it comes a few steps below the user level, so the "what you can do" parts are trickier. Is it because you misspelled the module name, you didn't install it, you installed it incorrectly, you didn't update PERL5LIB, or something else? There's a pretty big chunk of the world to explain there. I know that's at least two chapters of Intermediate Perl.
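
Of those causes, a stale PERL5LIB is at least easy to check mechanically. A small sketch, not a full diagnosis: `check_perl5lib` is an invented name, and it assumes the Unix convention of `:` as the separator.

```shell
#!/bin/sh
# check_perl5lib: report each PERL5LIB entry and whether it is an existing
# directory. Exits non-zero if any entry is missing. The body runs in a
# subshell, so the IFS and globbing changes do not leak to the caller.
check_perl5lib() (
    status=0
    IFS=:       # split PERL5LIB on colons
    set -f      # do not expand wildcards in the entries
    for dir in ${PERL5LIB:-}; do
        [ -n "$dir" ] || continue      # skip empty entries such as "a::b"
        if [ -d "$dir" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
            status=1
        fi
    done
    exit "$status"
)
```

A directory that exists but holds the wrong version of a module will still pass, which is part of why the general case takes chapters rather than a one-line message.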

That's not to say that it can't be better, but it's also not hard to figure out that error message given Google, Perl beginners mailing lists, Stack Overflow, and countless other very helpful resources out there. People are always standing by waiting to help, but I've found that this is not a problem people have with Perl: it's a general problem they have with analyzing any problem given to them, computers or not. There's not much we can do to fix that.

And, if people learned Perl from my book, they'd know about diagnostics already, although the message there isn't that much better. :)

At some point, people need to learn something about how the world works. A good error message, in your phone company days, isn't going to help someone with no experience in the domain.

brian d foy replied to comment from Andrew DeFaria | July 22, 2011 4:51 AM

One of the arguments for removing stuff from core is the maintenance burden. There's a limited number of man hours to make a perl release, and if the vanishingly small group of people who handle the work are distracted by updating core modules, other things don't get done.

A lot of people would like to have more things in core, but they are expecting other people to do all the work for them. I don't see more than a couple people stepping up to handle the upkeep of stuff already in core.

A secondary argument is that the OS distributions don't like the big distributions. They want to ship minimal systems with small footprints and the least hassle for them to package and verify.

jimmy | July 22, 2011 8:13 AM

One approach might be to keep the smallest share of core modules that satisfy the highest share of dependencies for other commonly useful modules.

I am a new Perl programmer and Perl is my first language. Even though in some ways it was instructive to fail installation of module after module, it was also like super irritating and stressful. I started manually installing modules from the dependencies up so that if I failed early on I at least knew who the culprit was and didn't load up my system with a bunch of other useless junk that I was only going to need for that top module anyway.

This was probably the best idea I ever had. A side effect of this self-flagellation is that you tend toward solutions that rely less on vagrant, possibly unmaintained heaps of derelict code than you would had you just been gunning the dependencies down with CPAN.

I no longer keep this regimen of manually walking down the dependency tree and installing everything by hand but it left a lasting impression on me: solutions to new and unique problems that can be manufactured from good ol' core dankness are probably more robust and well-thought-out than those that rely on a cocktail of other oddities that no one's ever heard of.

There are no doubt numerous exceptions to this rule, probably more than my small mind can even begin to comprehend. And perhaps those are also very good candidates to be retained or added to the core distribution. For example, a module like Chart::Clicker. I've never been able to install all its weird dependencies (Cairo, etc.) and yet every few weeks I try again because those bar graphs look so damn sexy. I gotta have it! These, like the workhorse modules I'm talking about above, should also be included (if the decision-makers are feeling generous that day).

Gábor Szabó - גאבור סבו | July 22, 2011 12:57 PM

In their fight to reduce the size of the initial installation requirement Linux distributions have already split up core perl into several packages. At minimum, most distributions have a separate package including perldoc and the documentation of perl.

I can imagine there will be further pressure to reduce the size of their default perl installation to the bare minimum they need to run the system.

That means during a training class I need to teach students how to install those packages (or how to ask their sysadmin to do that).

Then for the most basic things they want to do they already need to download and install modules. OOP is not even mentioned at that point.

In recent years we saw quite a lot of improvement in how we can install modules and even brew our own perl. The technical part has improved a lot.

IMHO we should invest more energy in educating the perl developers and their managers (!) around the world how to make the best use of these improvements.

bugmenot | July 22, 2011 3:04 PM

perl-ctypes should have been there a year ago, though this project looks abandoned now

http://oid.fox.geek.nz/kent.fredric | July 24, 2011 10:35 AM

I took a while to respond because I wanted to see what others came up with.

I'll definitely have to side with Gabor.

From a Linux distribution's perspective, it doesn't matter so much what is and what isn't in the core. If it's not "core", the distribution can ship it as a separate package anyway, and possibly continue making things "part of the standard installation".

Personally, I'd prefer the lightweight core approach, but there is in my mind no good reason to force Perl to decide whether people have a "light" core or a "full" core.

An approach we could take is to have various "sets" of things that are part of the standard perl installation.

Have a "minimal" set which contains only things that are essential to have a working Perl installation
Have a "lightweight" set which predominantly contains ( for now ) things that are not available/plausible to install via a CPAN client
Have a "standard" set that contains what Perl already contains

And then only officially support one of those sets completely.

This makes it easier for users/distributions to decide how they'll package things.

People perl-brewing are going to be more likely to want the "standard" set, and Linux Distributions are going to be more likely to want to use the "Lightweight" set and provide the difference with the standard set via their package management.

This borrows somewhat from the "extended standard library" idea, but eliminates some of the distribution/install/politics involved.

Thus we can say "A standard Perl5 installation" and mean "all things that are provided as part of the standard set", but it doesn't mean you have to have the same versions of those things, or the same way of getting them in your system.

The implementation details, however, are the important part.

There should be a "sets" folder of sorts in the perl tree which contains configuration/files for building a given set
It should be reasonably easy for a set definition to toggle whether a given package is part of that "set"
./configure should accept a 'set=$name' flag ( or similar ) to choose which set is built/installed

The hope being this makes it easier for Linux distributions to add their own custom derivations of the sets easily.

It allows users to more easily see what a set contains and possibly create their own custom preferred set.
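
To make that concrete, here is one way inspecting a set might look. Everything in it is an assumption for illustration: the `sets/<name>/modules.txt` layout, the one-module-per-line format, and the function names. Nothing like this exists in the perl source tree.

```shell
#!/bin/sh
# Hypothetical layout: sets/<name>/modules.txt, one module name per line,
# with blank lines and '#' comments allowed.
SETS_DIR=${SETS_DIR:-sets}

# set_contents NAME: print the sorted, de-duplicated modules in a set.
set_contents() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' \
        "$SETS_DIR/$1/modules.txt" | sort -u
}

# set_diff BIG SMALL: print modules that are in BIG but not in SMALL,
# i.e. what a distribution shipping SMALL would package separately.
set_diff() {
    tmp_big=$(mktemp) && tmp_small=$(mktemp) || return 1
    set_contents "$1" > "$tmp_big"
    set_contents "$2" > "$tmp_small"
    # comm -23 keeps only the lines unique to the first sorted file.
    comm -23 "$tmp_big" "$tmp_small"
    rm -f "$tmp_big" "$tmp_small"
}
```

With files like these in place, `set_diff standard lightweight` would list exactly the modules a distribution has to provide through its own package manager.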

More importantly, it allows for modular extension of the perl installer.

It would be nice to do something such as

# fetch and unpack the perl source
wget perl-5.14.1.tar.bz2
tar -jxf perl-5.14.1.tar.bz2
cd perl-5.14.1/
# add a third-party set definition and its sources
cd sets/
mkdir kensho
cd kensho/
wget http://kens.ho/config-1.12.tar.bz2
tar -jxf config-1.12.tar.bz2
wget http://kens.ho/sources-1.12.tar.bz2
tar -jxf sources-1.12.tar.bz2
# back to the top of the source tree
cd ../../
./configure set=kensho

and build and install a full Perl installation with all of the bits of Task::Kensho in it.

Downstream can then package that up nicely for people as "perl-5.14.1-kensho" or so forth in their package manager and just be happy about it.

Hope something in this proves to be useful.

Aristotle | July 29, 2011 7:34 AM

A while ago on p5p I suggested what I would like to see eventually happen and found I wasn’t the only one thinking along those lines. Namely, I’d like for the structure of the perl source to get to the point where the dual-life modules exist in some directory in the tree simply as tarballs identical to those on CPAN. The build process would use the regular module infrastructure to install them inside the build tree.

The job for making other, batteries-included (or even -excluded) distributions would then be very simple: just dump more tarballs from CPAN into that directory, prior to building.

The dual-life model could then in fact migrate out of p5p itself and become an independent project, the community-supported Official perl. (That’s similar to the structure that Rakudo (and Perl 6 at large) has inched towards, now that I think about it.) That project would also be far more accessible for casual contributors than the perl core. (The latter is not that complicated either, but (aside from the way dual-life modules are managed in fact requiring some work) is certainly a lot more intimidating. (An ulterior motive here is that working on that standard library project may help acclimatise contributors to the core as well.))

Excuse the parentheticals.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).