Perl 5 Porters Weekly: September 10-September 16, 2012
[cross posted from its original blog]
Welcome to Perl 5 Porters Weekly, a summary of the email traffic on the perl5-porters email list. Once again subroutine signatures dominated the list. I'll put all of the discussion about them at the end of this summary. I've added an "official" license to these summaries. It's Creative Commons BY-NC 3.0. You can find the summary of rights granted by that license by following that link. The full license text is available in the github repo. Obviously, the content of emails quoted here are owned by their respective authors.
This week's dusty thread is CALL FOR DOCS: how to dual life? from the week of August 6, 2012. Rik was looking for a volunteer to document the process of dual-lifing core code. He specifically called out Tie::Scalar and Time::local. Anyone have any insight into this? Contact Ricardo Signes.
This week's quote of the week is about an epic csh behavior relating to getcwd. As if you needed more reasons to avoid csh.
Csh does not rely on PWD. Instead, it does a getcwd all by itself.
It then prepends 'set cwd =' and evals the result. Thus interesting
things happen if the directory happens to have any csh metacharacters
in its name.
Topics this week:
- EXISTS and SCALAR return values are treated differently
- 5.14.3 approaches
- PATCH: pack()ing long words
- rxres_save etc (was Re: Fix for recursive substitution)
- given/when/~~ "final" thoughts (ha ha ha)
- Changing the Perl error message when a module is not found
- Named prototypes (again)
EXISTS and SCALAR return values are treated differently
I saved this thread because Yves Orton mentioned he had built some code to collect and show detailed hash utilizaton stats. The background for this thread had to do with the return values of EXISTS and SCALAR in tied hashes.
5.14.3 approaches
Dominic Hargreaves announced that he's doing some work to get a 5.14.3 maintenance release shipped, and so was looking for any overlooked patches which hadn't been part of the cherrymaint process. The first RC is due in September.
PATCH: pack()ing long words
Last month, I noted that David Cantrell was proposing a new pack syntax that would take arbitrary length words. This week he submitted a set of patches to reserve the proposed syntax in blead.
rxres_save etc (was Re: Fix for recursive substitution)
Nicholas Clark wrote an email saying that rxres_save() seems to do too much work in the perl core. The extra steps apparently date back to a 1997 patch intended for DBD::Oracle. Both Dave Mitchell and Yves Orton think a rethink and clean up is overdue for the way regex state is saved for potentially re-entrant/recursive calls.
given/when/~~ "final" thoughts (ha ha ha)
Ricardo posted a summary of his view of smartmatch:
$x ~~ undef
$x ~~ $overloaded_object
$x ~~ sub {}
$x ~~ regex
...or fail
...with when:
when ("foo") # str eq
when (12345) # num ==
when ($x) # ~~
when { ... } # block evaluates true
when breaks the enclosing topicalizer
And later, he wrote:
I feel like no matter what, given/when/~~ are really balancing on the
very very thin edge of "can be worth the complexity." I really do not
relish the idea of all the required work being done only to realize that
we've gone from unbearable to just barely bearable.
Which got a couple of +1s and surprisingly few replies given how popular this thread was two weeks ago.
Changing the Perl error message when a module is not found
Michael Stapelberg posted a suggestion to make the "module not found" error message friendlier to newbies. One proposed variation I liked was
Can't locate LWP/UserAgent.pm in @INC (@INC contains: /etc/perl
/usr/local/lib/perl/5.12.4 /usr/local/share/perl/5.12.4 /usr/lib/perl5
/usr/share/perl5 /usr/lib/perl/5.12 /usr/share/perl/5.12
/usr/local/lib/site_perl .) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
The above error is most likely caused by Perl not finding the module
LWP::UserAgent. Try installing LWP::UserAgent from your distribution
or via CPAN. Run 'perldoc perlmodinst' for more information.
The thread veered into a digression about having or adding "real" exception objects into the core (someday.)
Named prototypes (again)
Recall that last week, there was a big technical discussion about what the default behavior for a subroutine signature (aka named prototype) ought to be. Some argued for a read-only alias to @_, and some argued for a read-write copy following the principle of least surprise, and still others argued for read-write aliases which would eliminate the overhead of @_ setup.
Nicholas Clark answered:
Perl 6 specifies that the default is an alias (to avoid a copy, for
speed) and read-only (for the least surprise):
http://perlcabal.org/syn/S06.html#Parameters_and_arguments
In an ideal world these seem sane defaults, because they are made for
sane reasons. We're not *confident* that we are able to implement
read-only aliases (of read-write values). But
a) Chip had proposed a working patch
b) And asked for help measuring its impact
c) No-one helped.
So a good start here would be for SOMEONE TO VOLUNTEER to help get that
restarted. To at least get an answer to "is this approach going to
work?" (and possibly "does it look likely that no approach would work?")
But, Ricardo Signes wrote just a bit later that:
The reasonable default is copy, because decades of Perl programming have
told us that our arguments are copies. That is, as was said elsewhere
in this thread, we'd codifying existing patterns here.
Which seems to be the dispostive word on the matter. Chip Salzenberg added:
I agree here for four reasons:
1. Rik is right
2. Copies happen a lot; we'd be better off making copies faster (and
use less memory) than making them rarer or (as I once thought was a
good idea) introducing readonly aliases
3. I already have a patch half-done, the "minimal copy" patch, to make
copies smaller; I haven't gotten round tuition for them but I have
high expectation of having them ready for 5.18
4. It's possible to keep small strings in bodiless SVs, or making COW
good (if it isn't), which also makes copying cheaper
Later, Jesse Luehrs applied the patch in question (which creates read-only aliases to $_[0], etc) to measure its impact as Nicholas asked above and reported that:
[P]reliminary benchmarks don't look promising:
Current:
./perl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest
-makelist - 14.40s user 0.10s system 99% cpu 14.565 total
With SVt_BIND patch:
./perl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest
-makelist - 16.23s user 0.09s system 99% cpu 16.372 total
Copies might actually turn out to be faster than read-only aliases,
given the way the perl 5 interpreter is implemented.
There was also a serious digression that took on a life of its own, mostly revolving around how P5P interacts with feature suggestions, commentary on the same, and how the Perl community ought to spend more effort on performance enhancements than new shinies.
David Golden wrote:
[M]any people on p5p express dissatisfaction with the status quo
[ed. about performance], yet are unable or unwilling to:
(a) climb up the learning curve to contribute effectively
(b) take on less demanding tasks, freeing up the time of experts
Until more people do (a) or (b), the "hard" problems will continue to
suffer a lack of tuits.
Nicholas Clark added:
There are SIX HUNDRED subscribers to the list. About SIXTY are active in
e-mail discussions. Of the order of SIX people actually commit stuff.
And yet somehow the sixty (plus) seem to assume that if an idea is worth
talking about *to death* then somehow that is enough to steer the six
(busy) people into acting upon it.
If one looks at the archives from (say) 2003 compared with 2012
one notices two things
1: 2003 has many more medium sized threads
2012 has either massive threads, or messages with ZERO or one reply
2: 2003 has far more discussion of code
People cared just as much about perl then. But more people helped do
things.
I'd like us to get back to that. I don't know how.
Peter Rabbitson argued that p5p ought to more carefully weigh the impact of adding features to core since the Perl core has been famously reluctant to remove bits of the language.
[T]he level of scrutiny before inclusion does not seem to match the fact
that perl5 does not have a backtrack mechanism. I do not know what
would constitute "enough" scrutiny. I just know (based on past
achievements) that we do not scrutinize new syntax enough.
To support the claim that Perl's performance has deteriorated over time, Steffen Schwigon posted some summaries of his work comparing Perl version X vs. Perl version Y doing fibonacci sequence generation. He writes:
fib (plain subs) in Perl is ~2500x slower than symmetric
implementation with C
- C comparison not part of Perl::Formance, done separately
* fibOO (selfmade/Moose/Mouse) is ~1.5x slower than plain subs
* fibMXDeclare is about ~500x times slower than plain subs.
- yes, that's 120000 times slower than C, sorry.
- but probably due to some follow-up effects like huge mem triggering
swapping or similar
- anyway, the number itself is not an artifact or mistake, it's
stable over different Perl versions, it's a complete mess,
therefore I don't run it often
Trends, still fibonacci:
* Perl 5.10 to 5.14 improved a runtime of 90s down to 76s.
* Perl 5.16 deteriorated back to ~90s runtime.
* The overhead of "threaded" over "non-threaded"
... was in 5.10 ~5% more time
... got worse in 5.12 to ~15% more time
... improved in 5.14 to nearly zero
... and is now back worse in 5.16 with ~20% more time
* I don't have conclusive 5.17.x numbers due to current issues, but
first experiments seem to indicate a trend towards more slowness.
Dave Mitchell wrote (not specifically in reply to the above):
The idea that p5p doesn't care about, nor does anything related to
performance, is a bit of a myth. There have been lots of optimisations
added over the years. These tend to be low-key affairs: no major new
language syntax, nor porting to shiny new VMs etc (that's what the perl6
effort is for); but in many small, but accumulative ways we've been
fixing things. E.g. regex Tries; reorganising SV structures so they're
smaller and quicker.
I think the net affect of these efforts is that we've managed to
generally avoid perl5 slowing down any further over years as new
features and bug fixes have been added (which would normally have the
accumulative effect of gradually slowing things down).
Minor correction to the named-parameters thread: I don't think anybody really still considers rw aliasing a valid option. The remaining viable and significant optimization is to avoid setting up @_ in case of named parameters.