Perl 5 Porters Weekly: September 3-September 9, 2012
[Cross-posted from its original blog]
Welcome again to Perl 5 Porters Weekly, a summary of the traffic on the perl5-porters mailing list. Are you tired of talking and thinking about smartmatch? This week P5P was instead dominated by talk of named prototypes (again).
Since the named prototypes discussion drew so many responses, it is covered at the end of this summary. I've also decided to start a "dusty thread" feature this week: an issue that was raised on p5p but has not received any response on the public list.
This week's dusty thread is a set of proposed new Perl docs, in draft form, that appeared in the p5p summary in July. They cover metadoc, perlblurb, perladvantages and perlresources, and are aimed at newcomers and at marketing the language. You can find the docs in this git repo. If you're interested in working on them, contact Uri Guttman.
Topics this week include:
- Swapping SV bodies between two SV heads
- UTF-8 just turned 20 years old
- optimising JRuby by avoiding hashes
- :utf8 status
- Lexical subs are ready
- Named prototypes (again)
Swapping SV bodies between two SV heads
Yves Orton asked the list for a sanity check on some unusual code. He needed to compile a regex into a specific, already existing empty SV, but the API offers no way to supply the SV that the regex is compiled into. His workaround swaps the bodies of two SV heads. Based on feedback from the list, he changed the code to leave the refcounts attached to their original SV heads.
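A toy model may help picture what "swap the bodies but leave the refcounts" means. In perl's C internals an SV head carries the reference count and flags plus a pointer (sv_any) to a type-specific body. The sketch below is only a model under that assumption: heads are plain Perl hashes with made-up field names (refcnt, type, body), and nothing here touches real SVs or Yves's actual code. The refcount stays with its head because it counts the references pointing at that head, and those references do not move; the type travels with the body because it describes the body.

```perl
use strict;
use warnings;

# Toy model only: a hashref stands in for an SV head.
#   refcnt - how many references point at this head (belongs to the head)
#   type   - what kind of body the head points at (belongs to the body)
#   body   - stand-in for the type-specific body
sub make_head {
    my ($type, $body) = @_;
    return { refcnt => 1, type => $type, body => $body };
}

# Swap what the two heads point at, but leave each head's refcount alone:
# whoever referred to a head before the swap still refers to that head.
sub swap_bodies {
    my ($x, $y) = @_;
    ($x->{type}, $y->{type}) = ($y->{type}, $x->{type});
    ($x->{body}, $y->{body}) = ($y->{body}, $x->{body});
    return;
}

my $empty = make_head('NULL', undef);
my $regex = make_head('REGEXP', 'compiled regex');
$regex->{refcnt} = 3;    # pretend three other places refer to this head

swap_bodies($empty, $regex);
# $empty now holds the regex body but keeps refcnt 1;
# $regex holds the empty body and keeps refcnt 3.
```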
UTF-8 just turned 20 years old
Karl Williamson shared Rob Pike's remembrance of the origin of UTF-8 on the list. Quoting Rob Pike:
The diner was the Corner Café in New Providence, New Jersey. We
just called it Mom's, to honor the previous proprietor. I don't
know if it's still the same, but we went there for dinner often,
it being the closest place to the Murray Hill offices. Being a proper
diner, it had paper placemats, and it was on one of those placemats
that Ken sketched out the bit-packing for UTF-8. It was so easy once
we saw it that there was no reason to keep the placemat for notes, and
we left it behind. Or maybe we did bring it back to the lab; I'm not
sure. But it's gone now.
I'll always regret that.
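For readers curious about what fit on that placemat, here is a minimal sketch of the UTF-8 bit-packing in Perl: the high bits of the lead byte give the sequence length, and every continuation byte starts with the bits 10. The helper name utf8_bytes is hypothetical. It stops at today's 4-byte limit (RFC 3629; the original 1992 design allowed longer sequences), and it does not reject surrogates or code points above U+10FFFF. The core Encode module is used only as a cross-check.

```perl
use strict;
use warnings;
use Encode ();

# Hand-rolled UTF-8 encoder for one code point.
# e.g. utf8_bytes(0x20AC) is the three bytes E2 82 AC.
sub utf8_bytes {
    my ($cp) = @_;
    if ($cp < 0x80) {                     # 1 byte:  0xxxxxxx
        return pack 'C', $cp;
    }
    elsif ($cp < 0x800) {                 # 2 bytes: 110xxxxx 10xxxxxx
        return pack 'C2', 0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F);
    }
    elsif ($cp < 0x10000) {               # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return pack 'C3', 0xE0 | ($cp >> 12),
                          0x80 | (($cp >> 6) & 0x3F),
                          0x80 | ($cp & 0x3F);
    }
    else {                                # 4 bytes: 11110xxx then three 10xxxxxx
        return pack 'C4', 0xF0 | ($cp >> 18),
                          0x80 | (($cp >> 12) & 0x3F),
                          0x80 | (($cp >> 6) & 0x3F),
                          0x80 | ($cp & 0x3F);
    }
}
```

Because the lead byte announces the length and continuation bytes are always 10xxxxxx, a decoder can resynchronize from any position in a stream, which is a large part of why the scheme seemed obvious once sketched.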
optimising JRuby by avoiding hashes
Nicholas Clark pointed to an article on speeding up JRuby by avoiding hash lookups in hot code paths such as method dispatch. Nicholas says that much of the infrastructure for doing something similar in Perl already exists (method caching, ISA caching) and asked whether anyone wanted to take this on. He later reported that he had experimentally implemented a suggestion along these lines from Chip Salzenberg, but that (tl;dr):
[I]t's interesting and frustrating that it seems that (at least)
a simple implementation of Chip's suggestion turns out not to be
any sort of win.
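At user level, the same idea looks like hoisting a method lookup out of a hot loop with can() and calling the resulting code ref directly. This is only an analogue: it is not Chip's suggestion or Nicholas's patch (neither is described here), the Counter class and helper names are hypothetical, and given Nicholas's result any speed-up should be measured rather than assumed.

```perl
use strict;
use warnings;

package Counter;
sub new  { my ($class) = @_; return bless { n => 0 }, $class }
sub bump { my ($self) = @_; return ++$self->{n} }

package main;

# Normal dispatch: each $obj->bump call resolves 'bump' through the class.
# perl's own method cache keeps this cheap, but it is still a lookup.
sub bump_by_name {
    my ($obj, $times) = @_;
    $obj->bump for 1 .. $times;
    return $obj->{n};
}

# Hoisted dispatch: resolve the method once, then call the code ref
# directly inside the hot loop.
sub bump_cached {
    my ($obj, $times) = @_;
    my $code = $obj->can('bump') or die "no bump method\n";
    $obj->$code() for 1 .. $times;
    return $obj->{n};
}
```

The trade-off is the usual one for caches: the hoisted code ref will not notice if bump is redefined or @ISA changes while the loop runs, which is exactly the invalidation problem perl's internal method and ISA caches already handle.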
:utf8 status
Leon Timmermans posted a progress update on a new :utf8 PerlIO layer he is writing with Christian Hansen. The current layer is merely a flag: Perl assumes the byte stream is valid UTF-8 but never checks or enforces that.
The work appears to be very close to finished but is blocked by two bugs. The first is that the :bytes layer is broken too, but Leon has a patch for that. The second is that :stdio combined with any other layer hangs. Leon would like to drop :stdio, but some code depends on it in buggy ways. A fix would involve refactoring Perl's readline support into PerlIO (which Leon points out would probably be a good thing overall anyway).
He finishes his email:
As usual, this whole thing turns out to be much more complicated
than it should have been :-/
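The missing check is easy to demonstrate outside of PerlIO. A minimal sketch using the core Encode module rather than the new layer (whose interface isn't described here): strict decoding rejects malformed bytes, which is the validation the current :utf8 layer skips. The helper name decode_strict is hypothetical. For filehandles, :encoding(UTF-8) is the layer usually recommended when input must be validated.

```perl
use strict;
use warnings;
use Encode ();

# Returns the decoded string, or undef if $bytes is not valid UTF-8.
# FB_CROAK makes decode die on malformed input; LEAVE_SRC keeps decode
# from modifying its input buffer.
sub decode_strict {
    my ($bytes) = @_;
    my $text = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text;
}
```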
Lexical subs are ready
Father Chrysostomos wrote that a branch with an implementation of lexical subs is now available for people to try out. What is a lexical sub? It's a subroutine whose visibility is limited to the block in which it is declared, just like a my variable.
{
my sub bar { say "hoge" };
bar();
}
# can't call bar() here; doesn't exist
FC says you can even redefine lexical subs at runtime using eval, as in:
my sub foo { ... }
eval 'my sub foo {' .$stuff. '}';
Seriously cool stuff that should be in blead soon and may make it into 5.18!
The big debate at the moment is how to enable this in a forward-compatible way while still keeping it an "experimental" feature that might be removed or modified in the future. It sounds like there will be some kind of pragma to activate it explicitly, above and beyond the feature bundle enabled by use 5.018.
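For readers on a current perl, here is a runnable version of the scoping example. As it turned out, perl 5.18 shipped lexical subs behind use feature 'lexical_subs' with an experimental::lexical_subs warning; from 5.26 they are no longer experimental and the pragma has no effect, so the two pragma lines below are only needed on 5.18 to 5.24. The wrapper name call_bar is hypothetical.

```perl
use strict;
use warnings;
use feature 'lexical_subs';                # required on 5.18-5.24, a no-op from 5.26
no warnings 'experimental::lexical_subs';  # silences the 5.18-5.24 warning

sub call_bar {
    my sub bar { return "hoge" }           # visible only inside this block
    return bar();
}

# Outside call_bar there is no bar: a lexical sub is never installed
# in the package symbol table, so main->can('bar') finds nothing.
```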
Named prototypes (again)
At the end of June I noted in one of my earliest summaries that Peter Martini had volunteered to implement named parameters as part of a subroutine prototype, as in
sub foo($a, $b, $c) {
...;
}
This week he posted an update on his progress with this effort. And then all hell broke loose. (Smartmatch? What smartmatch?) Here is what Peter wrote:
What I'd proposed, and mostly implemented so far, is:
1. When parsing a prototype, if an alphanumeric is detected, restart
the parsing as a list of named parameters.
2. The named parameter list would be what the various modules seem
to have converged on:
a comma separated list of [qualifier list] [white space]
[sigil] [name], eg
my $a, my $b, my $c
And if no qualifier is specified, assume 'my'
3. The last parameter can be greedy, @ or %; otherwise everything
must start with $
A % would die if an odd number of parameters are listed to
construct it
4. All parameters are optional, and will be declared but undefined
if not passed in.
5. (The syntax would mean in the future we could allow
my $var = 5,
to set a default, but that's a can of worms I don't want to think
about now)
6. An additional sub attribute, proto, which can be used to specify
the traditional proto definition:
sub something(my $a, my $b, my $c) : proto($$$) { }
7. @_ would not be modified in any way, so $_[0] accesses or even
my ($a2, $b2, $c2) = @_ would still work, if someone wanted to.
Anyway, in the simplest (and I think default) case of
sub ($a, $b, $c) { }, this would be exactly equivalent to the
sub { my ($a, $b, $c) = @_; } case mentioned, independent of any
additions to the optimizer or re-writing ops - everything would be
handled by a flag on the CV and assignment in pp_entersub
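Since Peter describes the simple case as exactly equivalent to copying @_ into my variables, the proposal can be read as sugar for a familiar idiom. A minimal sketch of what a hypothetical sub describe($name, $title, %opts) might expand to under points 2 to 4 and 7; the name describe, its parameters and its body are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical desugaring of:   sub describe($name, $title, %opts) { ... }
sub describe {
    my ($name, $title, @rest) = @_;    # copies; missing arguments stay undef
    die "Odd number of elements in hash parameter\n" if @rest % 2;
    my %opts = @rest;                  # the greedy final parameter
    my $label = join ' ', grep { defined } $title, $name;
    $label = uc $label if $opts{shout};
    return ($label, scalar @_);        # @_ still holds every argument
}
```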
Steffen Mueller has wanted something like this in Perl for a long time; he even proposed it a while ago, but that discussion got derailed. He wrote that he saw three possible ways to interpret sub f($a, $b, $c):
rw aliasing: Makes those named parameters just a different way of
writing $_[$index]. This rw aliasing would mean that we entirely cut
away the overhead of ($a, $b, $c) = @_ copying. Nice! But it would
probably mean that people shoot themselves in the foot all the time, so
I heavy-heartedly discount this optimization opportunity.
ro aliasing: Requires binds. Did Chip's bind patch ever make it in? In
theory, ro aliases should be well optimizable. In practice, I'm not sure
how I'd do that. Seems the safest option to me, since it's also a very
common way to use your function parameters. How does this combine with
defaults, though?
copying: Least surprise. Potentially least optimizable down the road.
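The difference between the two ends of Steffen's list is easy to see in today's Perl, since @_ elements are already rw aliases to the caller's arguments. A minimal sketch; the trim_alias and trim_copy names are hypothetical, and read-only aliasing is not shown.

```perl
use strict;
use warnings;

# rw aliasing: $_[0] *is* the caller's variable, so writing to it
# changes the caller's data (the foot-gun Steffen mentions).
sub trim_alias {
    $_[0] =~ s/^\s+|\s+$//g;
    return $_[0];
}

# copying: the my variable is a fresh copy, so the caller is untouched.
sub trim_copy {
    my ($str) = @_;
    $str =~ s/^\s+|\s+$//g;
    return $str;
}
```

Note that trim_alias("  literal  ") dies with a "Modification of a read-only value" error, another way aliasing can surprise callers.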
Later, Reini Urban wrote (among other things, in a tone not worth echoing here) that p5p should adopt Perl 6 syntax and style because those are "Larry decisions", which are much better than p5p decisions.
Ricardo Signes replied:
Perl 6 is not Perl 5. It's just another language that's out there that
happens to look a lot like Perl 5. Like Awk. It has some really cool
ideas that we should look at and figure out whether we can learn from
them and build our own features into Perl 5 that steal from them, but we
should actually design them to make sense in *our* language. I mean,
Rob Pike is a really smart guy, but we don't want to blindly steal bits
of Go and drop them in Perl 5.
Working comprehensively through design ramification of language changes
can take time and be both frustrating and tiring, but *that's* what we
have to do if we want to avoid mistakes. We *don't* need to (nor should
we) simply import designs from another language wholesale.
Nicholas Clark added:
We *can't* blindly pick and choose parts of the Perl 6 design and bring
them back verbatim. Because the Perl 6 design is a whole, diverged from
Perl 5. Not a menu of "Perl 5"++ features that can be adopted
independently.
Larry's overriding design decision for Perl 6 seems to be "I can't fix
this in Perl 5".
Returning to the technical discussion, Vincent Pit wrote:
To make this more clear : I see no value in
sub foo ($x, $y) {
...
}
being equivalent to
sub foo {
my ($x, $y) = @_;
...
}
That adds literally zero to Perl : if I want these semantics, I can
write the assignment myself. A couple of saved keystrokes won't be
enough to make me add a hard dependency on a newer version of perl. But
making it default to read-*write* aliases (there :) actually makes that
syntax useful.
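For readers on a newer perl, the syntax that eventually landed (the signatures feature, experimental from 5.20 and stable from 5.36) chose copy semantics, so parameters behave like Peter's my variables rather than Vincent's rw aliases. Unlike point 4 of Peter's proposal, signatures also check arity, so a missing argument dies instead of silently becoming undef. The sub name bump is hypothetical.

```perl
use strict;
use warnings;
use feature 'signatures';                # enabled by default under use v5.36
no warnings 'experimental::signatures';  # needed on 5.20-5.34

sub bump ($n) {
    $n++;          # changes the copy only, never the caller's variable
    return $n;
}
```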
There were several responses to this, but they came after September 9. (If you'd like to read ahead, feel free to find out what they say!)
At this point in Perl5's history, is there really much use for prototypes? Everyone has more or less settled on using a hash(ref) for named parameters, and this proposal seems to only cover positional parameters. It's at least ten years too late.
Looking at some of the more popular CPAN modules (Test::More, Scalar::Util, List::Util, Try::Tiny, Carp, Encode, Digest, URI) there are plenty of subs out there that don't just take a single hash(ref).
Thanks to Moose and friends, the hash is becoming the de facto standard for object constructor methods, but apart from that one particular type of sub, there's still a lot of variance.