Essay Archives

Perl 5’s list-flattening and reference-taking design choices


Perl has the strange property that its data structures try very hard to spill their contents all over the place. Despite having dedicated syntax for arrays – @foo is an array variable, distinct from the single scalar variable $foo – it’s actually impossible to nest arrays.

my @foo = (1, 2, 3, 4);
my @bar = (@foo, @foo);
# @bar is now a flat list of eight items: 1, 2, 3, 4, 1, 2, 3, 4

The idea, I guess, is that an array is not one thing. It’s not a container, which happens to hold multiple things; it is multiple things. Anywhere that expects a single value, such as an array element, cannot contain an array, because an array fundamentally is not a single value.

And so we have “references”, which are a form of indirection, but also have the nice property that they’re single values.

This is a common thing for people to find weird about Perl. Really though it’s just a different default.

Perl’s reference-taking operator is simply dual with the splat operator in Ruby and recent Javascript.

The Coro situation

Since my recent participation at the QA Hackathon I have become aware that rather more people than I expected do not know the specifics of this situation. Fewer than I expected have heard of it at all, even, although there appears to be some general awareness at the “something happened with that” level at least.

However, the situation is being used to characterise Marc Lehmann whenever his name comes up (and it comes up rather more often than I would expect or consider necessary).

To give a clear picture of the facts and to avoid repeating that exercise every time I have a related conversation, here is an outline of where we are and how we got here.

(Thanks to Andreas König, Graham Knop, and Peter Rabbitson for proofreading drafts of this article and verifying the stated facts.)

Serializers for Perl: when to use what

This is a moderately edited (primarily rearranged) version of a comment on the Perl 5 issue tracker written by Yves Orton (demerphq). I thought it would be useful to a wider audience so I am reposting here with permission.

Note: this was written off the cuff and is not comprehensive. In correspondence, Yves noted that such an article could/should cover more formats, e.g. MsgPack.

I [feel strongly] that Data::Dumper is generally unsuitable as a serialization tool. The reasons are as follows:

The perversity of traditional Perl 5 dereferencing syntax

I wrote this article almost a year ago as part of an omnibus reply to a bunch of different posts from a perl5-porters thread. I never finished all parts of the reply and thus never sent this part either, but in contrast to the other parts of this stillborn mail, I think this one is worth reading. So asked Johan Vromans:

It still escapes me why @* was chosen instead of the much more logical []:


The reason is that there are a number of problems to solve with any new deref syntax:

Backcompat is holding us back!

“Let’s free ourselves from the shackles and do something bold!”

I always cringe when I hear this battle cry. Isn’t that sentiment exactly what set the trajectory for the Perl 6 effort? Maybe it’s just been so long that people have forgotten.

But that is precisely how Perl 6 became such an amazingly long trek: once you remove the constraint of staying compatible, everything is suddenly, potentially, up for reconsideration. Then when you start changing things, you discover that changes in one part of the language also affect several other, remote parts of the language. So it starts with the simple desire to fix a handful of obvious problems in obvious ways… and spirals out as you make changes, and further still as you make changes in response to your changes, ever further and further.

At that point, it is exceedingly likely that the project will fizzle out before it ever comes to any fruition. But even if you have the perseverance, you face an uphill battle: unless your project has the community’s implicit blessing as the successor (as Perl 6 does, due to Larry’s presence), it is likely to simply slip into oblivion… the way Kurila did.

So yes: backcompat is holding us back… the same way that gravity is. It keeps us from floating away untethered.

Note that I’m not saying it doesn’t really hold us back. I’d love to travel to space easily, too! I still await Perl 6, as well.

But what I think, every time someone proposes to throw off the shackles of backcompat and go for it, is that we already have one Perl 6 – we don’t need another.

A decade in CPAN toolchain

Dave Cross:

I’m not going to object to Module::Build leaving the core. I’m sure there are good reasons, I just wish I knew what they are. I am, however, slightly disappointed to find that Schwern was wrong ten years ago and that ExtUtils::MakeMaker wasn’t doomed.

Schwern wasn’t wrong and MakeMaker remains doomed all these years later. It’s still around only because there hasn’t been anything to take its place. Module::Build looked like it was going to be that usurper – but didn’t work out.

Note that the reason that, between EUMM and M::B, M::B is the one leaving the core, is that EUMM is necessary to build the core and M::B is not. The reason for that is that no one bothered to port the existing MakeMaker-dependent infrastructure to Module::Build. And that never happened because M::B never gained the necessary features (XS support, mainly) fast enough for anyone to want to – because it wasn’t sufficiently much better than EUMM for anyone to want it enough to add the features.

However, EUMM is about as marginally maintained nowadays as M::B. Both are doomed, though their type of doomedness is one that’s accompanied by remarkable staying power. (Break-the-CPAN status tends to have that effect.) RJBS is on record that, should EUMM ever become unnecessary to building the core, it will make its exit stage left much the same as M::B is making now.

So… what happened?

The UTF8 flag and you

If you are looking at the UTF8 flag in application code then your code is broken. Period.

The flag does NOT mean that the string contains character data. It ONLY means that the string contains a character > 255 either now or could have, at some point in the past.

For strings that contain characters > 127 < 256, you have no idea whatsoever about whether the string contains characters or bytes regardless of what the UTF-8 flag says. (For strings with only characters < 128 there is no difference whether they are ASCII or binary, …

Stop using Locale::Maketext

The following is a translation of a talk given by Steffen Winkler at German Perl Workshop 10, Internationalisierungs-Framework auswählen, in 2010. You can find my translation of the POD on Github. (In fact you are better off reading it there, because the CSS formatting on sucks, at least for now. If you’re seeing this in a feed reader, feel free to stay.)

Aside from Steffen’s talk, there is also Nikolai Prokoschenko’s rant On the state of i18n in Perl, which mentions to some extent the fact that gettext has established workflows as well as supporting software (such as graphical tools) known in many software communities and even outside the open source world, whereas Maketext has… none of that, and is known only to Perl folk.

Be smart. Don’t use Maketext.

About Aristotle

user-pic Waxing philosophical