Perl 5 Porters Weekly: October 29-November 4, 2012
Welcome to Perl 5 Porters Weekly, a summary of the email traffic of the perl5-porters email list.
This week's topics are:
- perl-5.16.2 is now available
- perl-5.12.5 RC1 is now available
- What happened to the whole "small core" idea?
- Eliminating the "rehash" mechanism for 5.18
- benchmarking Murmurhash3 vs One-At-A-Time as perl's hash function
perl-5.16.2 is now available
Ricardo Signes announced that perl 5.16.2 is now available on a CPAN mirror near you.
perl-5.12.5 RC1 is now available
Dominic Hargreaves announced that perl 5.12.5-RC1 is now available on a CPAN mirror near you.
What happened to the whole "small core" idea
Last time, Peter Rabbitson asked what had happened to the idea of making Perl's core smaller. Ricardo Signes, the project's arbiter of taste, finally replied this week. I encourage you to "read the whole thing" as all the cool kids say these days, but here are some good excerpts:
Later on in this thread, you say something like, "people are talking about getting [subroutine signatures] into 5.18." I never thought this was seriously considered. That is, I thought maybe some were hoping to see it become an experimental feature, but not anything we'd be stuck with once we proved it stank. ...but really, I didn't think it was even going to get that far. But did it have to become a discussion about that? I didn't think so. There were two important things to figure out: (a) how can we get perl equipped with hooks for signature systems that share a common underpining so we can figure out what a good "core" one might be and (b) what might that core one be? (a) would be nice to see, at least experimentally in 5.18 (b) seems like it will be an ongoing discussion; why not have some of it on p5p? and if there's an implementation that is built in a branch, without the benefit of the hooks suggested by 'a' that lets us play with the things as discussed... awesome! [...] As you noted, the bikeshedding is largely about design decisions that have *not* been proved on CPAN or become a standard. This is a warning sign that it's maybe not baked enough. I don't think that means that we should stop talking about it, or stop building on the code that implements it. That currently-in-a-branch-of-perl code seems pretty good, and may very well be the code that does evolve into something proven enough to become the beloved reference implementation of signatures. After all, it's from that code that the signature-adding APIs will probably be born, right?
Eliminating the "rehash" mechanism for 5.18
Yves Orton proposed eliminating perl's defensive mechanism against pathological hash insertion attacks and replacing it with a per-run random hash seed. He explains:
Choose a random hash seed at process start up as it effectively prevents the attacker from constructing their colliding keys remotely. However it also means that the order in which a set of keys are stored in a hash becomes random. [...] Besides the advantages of avoiding the costs of the rehash mechanism this change will smoke out any hash order dependency bugs in the modules on CPAN, and in our own code. (I found at least two unrelated bugs in code in core due to hash randomization.) It will also prepare the ground for people to choose alternative hash algorithms for Perl to use, or to make it possible to use specialized hash algorithms in certain contexts such as 64 bit builds. The downside however will be that it almost certainly will make lots of lazily written code to fail. But IMO the cost is worth it.
Most people thought this was a good idea, but some thought there were additional security precautions which should be considered.
benchmarking Murmurhash3 vs One-At-A-Time as perl's hash function
Yves Orton had another interesting hash related email this week. He'd written earlier in the summer that he was looking at replacing perl's hash algorithm with one that could offer some signficant performance benefits especially with longer keys. This week he reported the results:
Conclusion: on my laptop Murmurhash3 is more than 3 times faster at hashing longer strings than one-at-a-time, and roughly the same for shorter strings.