Perl 5 Porters Weekly: October 29-November 4, 2012
Welcome to Perl 5 Porters Weekly, a summary of the traffic on the perl5-porters mailing list.
This week's topics are:
- perl-5.16.2 is now available
- perl-5.12.5 RC1 is now available
- What happened to the whole "small core" idea?
- Eliminating the "rehash" mechanism for 5.18
- benchmarking Murmurhash3 vs One-At-A-Time as perl's hash function
perl-5.16.2 is now available
Ricardo Signes announced that perl 5.16.2 is now available on a CPAN mirror near you.
perl-5.12.5 RC1 is now available
Dominic Hargreaves announced that perl 5.12.5-RC1 is now available on a CPAN mirror near you.
What happened to the whole "small core" idea?
Last time, Peter Rabbitson asked what had happened to the idea of making Perl's core smaller. Ricardo Signes, the project's arbiter of taste, finally replied this week. I encourage you to "read the whole thing" as all the cool kids say these days, but here are some good excerpts:
Later on in this thread, you say something like, "people are talking about
getting [subroutine signatures] into 5.18." I never thought this was
seriously considered. That is, I thought maybe some were hoping to see
it become an experimental feature, but not anything we'd be stuck with
once we proved it stank.
...but really, I didn't think it was even going to get that far. But did
it have to become a discussion about that? I didn't think so. There were two
important things to figure out: (a) how can we get perl equipped with hooks
for signature systems that share a common underpinning so we can figure out what
a good "core" one might be and (b) what might that core one be?
(a) would be nice to see, at least experimentally in 5.18
(b) seems like it will be an ongoing discussion; why not have some of it on
p5p? and if there's an implementation that is built in a branch, without the
benefit of the hooks suggested by 'a' that lets us play with the things as
discussed... awesome!
[...]
As you noted, the bikeshedding is largely about design decisions that have
*not* been proved on CPAN or become a standard. This is a warning sign that
it's maybe not baked enough. I don't think that means that we should stop
talking about it, or stop building on the code that implements it. That
currently-in-a-branch-of-perl code seems pretty good, and may very well be the
code that does evolve into something proven enough to become the beloved
reference implementation of signatures. After all, it's from that code that
the signature-adding APIs will probably be born, right?
Eliminating the "rehash" mechanism for 5.18
Yves Orton proposed eliminating perl's defensive mechanism against pathological hash insertion attacks and replacing it with a per-run random hash seed. He explains:
Choose a random hash seed at process start up as it effectively prevents
the attacker from constructing their colliding keys remotely. However
it also means that the order in which a set of keys are stored in a hash
becomes random.
[...]
Besides the advantages of avoiding the costs of the rehash mechanism
this change will smoke out any hash order dependency bugs in the
modules on CPAN, and in our own code. (I found at least two unrelated
bugs in code in core due to hash randomization.) It will also prepare
the ground for people to choose alternative hash algorithms for Perl
to use, or to make it possible to use specialized hash algorithms in
certain contexts such as 64 bit builds.
The downside however will be that it almost certainly will make lots
of lazily written code to fail. But IMO the cost is worth it.
Most people thought this was a good idea, but some thought there were additional security precautions which should be considered.
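To make the ordering effect concrete, here is a minimal Python sketch. It is not perl's actual implementation: it seeds Bob Jenkins's one-at-a-time function by using the seed as the initial state, then walks a toy bucket array. The `iteration_order` helper, the bucket count of 8, and the seeding scheme are all assumptions made for illustration.

```python
import os

MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def one_at_a_time(key: bytes, seed: int = 0) -> int:
    """Bob Jenkins's one-at-a-time hash (32-bit), starting from `seed`."""
    h = seed & MASK32
    for byte in key:
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h


def iteration_order(keys, seed, buckets=8):
    """Hypothetical helper: the order a simple bucket-array walk visits keys."""
    table = [[] for _ in range(buckets)]
    for k in keys:
        table[one_at_a_time(k.encode(), seed) % buckets].append(k)
    return [k for bucket in table for k in bucket]


keys = ["apple", "banana", "cherry", "date", "elderberry", "fig"]

# Assumed stand-in for a seed chosen once at process start-up.
seed = int.from_bytes(os.urandom(4), "little")

print(iteration_order(keys, seed))          # may differ from run to run
print(sorted(iteration_order(keys, seed)))  # stable: sort before relying on order
```

Code that needs a predictable order should sort the keys explicitly, as the last line does, rather than depend on whatever order the hash happens to yield.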
benchmarking Murmurhash3 vs One-At-A-Time as perl's hash function
Yves Orton had another interesting hash-related email this week. He'd written earlier in the summer that he was looking at replacing perl's hash algorithm with one that could offer some significant performance benefits, especially with longer keys. This week he reported the results:
Conclusion: on my laptop Murmurhash3 is more than 3 times faster at
hashing longer strings than one-at-a-time, and roughly the same for
shorter strings.
xxHash is even faster than Murmurhash3: https://code.google.com/p/xxhash/
Holy crap, is xxHash fast! I made a quick XS module to try it out and it completely blows Digest::CRC::crc32() out of the water.
This test was on 5 paragraphs of Lorem Ipsum text.
I haven't gotten a good test because the disparity is so large: any number of iterations that might adequately test xxHash takes FOREVER for crc32.
It only does 32-bit integer hashing, but it's so fast. Is it worth putting it on CPAN?
Hi
If you do put it on CPAN, I'd like to see the docs include a discussion of when to use it versus when to use a cryptographic hash.
That's my problem. I'm not really the right person to release this, other than that I can do it. I cannot think of a use case, though I'm sure there is one. Hashing at the core level seems like one, but that doesn't need a CPAN module, just the work that p5p is already discussing. My only reason to do it is that it has impressive speed, but I'm not sure that's a good enough reason.
xxxHash is even faster!
It's not so much that xxHash is fast as that Digest::CRC is (curiously) slow as molasses. Digest::MD5 is about 200x faster than Digest::CRC on my PC.
Another Inkster turd...