Perl 5 Porters Weekly: September 3-September 9, 2012

[ Cross posted from its original blog ]

Welcome again to Perl 5 Porters Weekly, a summary of the email traffic on the perl5-porters email list. Are you tired of talking and thinking about smartmatch? P5P was dominated this week by talk of named prototypes (again.)

Since the named prototypes discussion had so many responses, they'll be put at the end of the summary. I also decided this week to start a "dusty" thread feature - some issue that's been raised on p5p but without any subsequent response on the public list traffic.

This week's dusty thread is proposed/drafted new perl docs which were part of the p5p summary in July. These docs cover metadoc, perlblurb, perladvantages, and perlresources. They're intended for newbies and language marketing purposes. You can find the docs in this git repo. If you're interested in working on them, contact Uri Guttman.

Topics this week include:

  • Swapping SV bodies between two SV heads
  • UTF-8 just turned 20 years old
  • optimising JRuby by avoiding hashes
  • :utf8 status
  • Lexical subs are ready
  • Named prototypes (again)

Swapping SV bodies between two SV heads

Yves Orton posted an interesting question where he needed to compile a regex into a specific empty SV but the API does not support supplying the SV the regex will be compiled into. He developed some code to swap the bodies of two SV heads and wanted a sanity check. Based on list feedback he modified the code to leave the refcounts attached to their original SV heads.

Read the thread

UTF-8 just turned 20 years old

Karl Williamson shared Rob Pike's remembrance of the origin of UTF-8 on the list. Quoting Rob Pike:

The diner was the Corner Café in New Providence, New Jersey. We 
just called it Mom's, to honor the previous proprietor. I don't 
know if it's still the same, but we went there for dinner often, 
it being the closest place to the Murray Hill offices. Being a proper 
diner, it had paper placemats, and it was on one of those placemats 
that Ken sketched out the bit-packing for UTF-8. It was so easy once 
we saw it that there was no reason to keep the placemat for notes, and 
we left it behind. Or maybe we did bring it back to the lab; I'm not 
sure. But it's gone now. 

I'll always regret that.

optimising JRuby by avoiding hashes

Nicholas Clark writes that an article on speeding up JRuby used the technique of avoiding hash lookups in hot code paths, such as for method dispatch. Nicholas says that much of the infrastructure for doing something like this in Perl already exists (method caching, ISA caching) and wondered if this was a task someone wanted to work on. He later wrote he had experimentally implemented a suggestion of Chip Salzenburg's along these lines but that (in tl;dr summary):

[I]t's interesting and frustrating that it seems that (at least) 
a simple implementation of Chip's suggestion turns out not to be 
any sort of win.

Read the thread

:utf8 status

Leon Timmermans updates progress on a new :utf8 PerlIO layer he is writing with Christian Hansen. The current layer is just a flag that Perl should assume the bytestream is valid utf8 but doesn't actually check if that is true, or enforce that restriction.

It seems like this is very close to being finished but blocked by two bugs. The first is that the the :bytes layer is broken too. But Leon has a patch for that. The second bug is that :stdio + any other layer hangs. Leon would like to drop :stdio but there is code which depends on it in buggy ways. A fix would involve refactoring Perl's readline support into PerlIO (which Leon points out would probably be a good thing anyway overall.)

He finishes his email:

As usual, this whole thing turns out to be much more complicated 
than it should have been :-/

Read the thread

Lexical subs are ready

Father Chrysostomos wrote that a branch with an implementation of lexical subs is now available for people to play around with. BTW, what is a lexical sub? It's a subroutine that has a scope defined by its current block.

{
    my sub bar { say "hoge" };

    bar();
}

# can't call bar() here; doesn't exist

FC says you can even redefine lexical subs at runtime using eval as in

my sub foo { ... }

eval 'my sub foo {' .$stuff. '}';

Seriously cool stuff that should be in blead soon and may make it into 5.18! The big debate at the moment is how to enable this forward compatibily while still keeping it an "experimental" feature that might be removed or modified in the future. It sounds like there is going to be some kind of pragma to explicitly activate it above and beyond the use 5.018 version pragma bundle.

Read the thread

Named prototypes (again)

At the end of June I noted in one of my earliest summaries that Peter Martini had volunteered to implement code that allowed named arguments to appear as part of a subroutine prototype, as in

sub foo($a, $b, $c) {
    ...;
}

This week he posted an update on his progress with this effort. And then all hell broke loose. (Smartmatch? What smartmatch?) Here is what Peter wrote:

What I'd proposed, and mostly implemented so far, is:

1. When parsing a prototype, if an alphanumeric is detected, restart 
   the parsing as a list of named parameters.

2. The named parameter list would be what the various modules seem 
   to have converged on:
     a comma separated list of [qualifier list] [white space] 
     [sigil] [name], eg 

     my $a, my $b, my $c

     And if no qualifier is specified, assume 'my'

3. The last parameter can be greedy, @ or %; otherwise everything 
   must start with $

   A % would die if an odd number of parameters are listed to 
   construct it

4. All parameters are optional, and will be declared but undefined 
   if not passed in.

5. (The syntax would means in the future we could allow 
     my $var = 5, 
   to set a default, but that's a can of worms I don't want to think 
   about now)

6. An additional sub attribute, proto, which can be used to specify 
   the traditional proto definition:

     sub something(my $a, my $b, my $c) : proto($$$) { }

7. @_ would not be modified in anyway, so $_[0] accesses or even 
   my ($a2, $b2, $c2) = @_ would still work, if someone wanted to.

Anyway, in the simplest (and I think default) case of 
sub ($a, $b, $c) { }, this would be exactly equivalent to the 
sub { my ($a, $b, $c) = @_; } case mentioned, independent of any 
additions to the optimizer or re-writing ops - everything would be 
handled by a flag on the CV and assignment in pp_entersub

Steffen Mueller really wanted to see something like this in Perl for a long time. He even proposed it a while ago and it got derailed. But he wrote he saw 3 possible ways to interpret sub f($a, $b, $c):

rw aliasing: Makes those named parameters just a different way of 
writing $_[$index]. This rw aliasing would mean that we entirely cut 
away the overhead of ($a, $b, $c) = @_ copying. Nice! But it would 
probably mean that people shoot themselves in the foot all the time, so 
I heavy-heartedly discount this optimization opportunity.

ro aliasing: Requires binds. Did Chip's bind patch ever make it in? In 
theory, ro aliases should be well optimizable. In practice, I'm not sure 
how I'd do that. Seems the safest option to me, since its also a very 
common way to use your function parameters. How does this combine with 
defaults, though?

copying: Least surprise. Potentially least optimizable down the road.

Later Reini Urban wrote (among other things in tone that aren't worth echoing here) that p5p should adopt Perl 6 syntax and style because they're "Larry decisions" which are much better than p5p decisions.

Ricardo Signes wrote back that:

Perl 6 is not Perl 5.  It's just another language that's out there that
happens to look a lot like Perl 5.  Like Awk.  It has some really cool
ideas that we should look at and figure out whether we can learn from
them and build our own features into Perl 5 that steal from them, but we
should actually design them to make sense in *our* language.  I mean,
Rob Pike is a really smart guy, but we don't want to blindly steal bits
of Go and drop them in Perl 5.

Working comprehensively through design ramification of language changes
can take time and be both frustrating and tiring, but *that's* what we
have to do if we want to avoid mistakes.  We *don't* need to (nor should
we) simply import designs from another language wholesale.

Nicholas Clark added:

We *can't* blindly pick and choose parts of the Perl 6 design and bring
them back verbatim. Because the Perl 6 deign is a whole, diverged from
Perl 5.  Not a menu of "Perl 5"++ features that can be adopted
independently.

Larry's overriding design decision for Perl 6 seems to be "I can't fix
this in Perl 5".

Back to the technical discussion, Vincent Pit wrote

To make this more clear : I see no value in

    sub foo ($x, $y) {
        ...
    }

being equivalent to

    sub foo {
        my ($x, $y) = @_;
        ...
    }

That adds literally zero to Perl : if I want these semantics, I can 
write the assignment myself. A couple of saved keystrokes won't be 
enough to make me add a hard dependency on a newer version of perl. But 
making it default to read-*write* aliases (there :) actually makes that 
syntax useful.

There were several responses to this, but they came after September 9. (If you like to read ahead, feel free to find out what they say!)

Read the thread

2 Comments

At this point in Perl5's history, is there really much use for prototypes? Everyone has more or less settled on using a hash(ref) for named parameters, and this proposal seems to only cover positional parameters. It's at least ten years too late.

Looking at some of the more popular CPAN modules (Test::More, Scalar::Util, List::Util, Try::Tiny, Carp, Encode, Digest, URI) there are plenty of subs out there that don't just take a single hash(ref).

Thanks to Moose and friends, the hash is becoming the defacto standard for object constructor methods, but apart from that one particular type of sub, there's still a lot of variance.

About Perl 5 Porters Summaries

user-pic Weekly p5p summaries