Perl 5 Porters Weekly: November 26-December 2, 2012

Welcome to Perl 5 Porters Weekly, a summary of the email traffic of the perl5-porters email list.

Topics this week include:

  • rand() on Windows only uses 15 bits of entropy
  • RFC: Removing several undocumented functions from the Perl core
  • 5.18 VT in \s
  • Comment period extended for Unicode's changing some common characters from Punctuation to Symbol

rand() on Windows only uses 15 bits of entropy

If you ran this program on your system, what would you expect the results to be?

my %nums;
my $cnt = 0;
foreach my $i (1 .. 1_000_000) {
    my $num = rand;
    $cnt++ if ($nums{$num});
    $nums{$num} = 1;
}
print "$cnt out of 1,000,000\n";

If you run it on Windows, you might be surprised to discover that it prints 967232 out of 1,000,000 every time. That means rand() can generate only 32,768 (2**15) distinct values between 0 and 1 on that platform. Other platforms don't seem to be affected by this lack of entropy in rand().
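The arithmetic behind that number can be sketched in a few lines. This is only an illustration: it assumes that every one of the 32,768 possible values turns up at least once in a million draws, and it reads randbits from the Config module, whose value depends on how your perl was built.

```perl
use strict;
use warnings;
use Config;

# A generator with 15 bits of output can produce 2**15 distinct values.
my $distinct = 2**15;

# Assumption: all of those values appear at least once in the run,
# so every remaining draw has to be a repeat.
my $draws   = 1_000_000;
my $repeats = $draws - $distinct;

print "distinct values:  $distinct\n";
print "expected repeats: $repeats\n";

# randbits is recorded when perl is configured; it may differ per build.
my $bits = defined $Config{randbits} ? $Config{randbits} : 'unknown';
print "randbits for this perl: $bits\n";
```

If randbits reports 15 on your system, the program above should show the same repeat count as the Windows result described here.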

This comes from an RT ticket opened by Brendan Byrd, which prompted a good deal of discussion. Ricardo Signes indicated that he's very interested in seeing the Mersenne Twister algorithm implemented, an idea that follows some suggestions by Nicholas Clark.

Read the thread

RFC: Removing several undocumented functions from the Perl core

Karl Williamson suggested that several undocumented functions in Perl's API be removed for 5.18. They are:

  • is_uni_idfirst_lc
  • is_utf8_idfirst
  • is_utf8_xidfirst
  • is_utf8_idcont

Nicholas Clark suggested the calls be flagged as deprecated in 5.18 and removed in 5.20.

Read the thread

5.18 VT in \s

Karl Williamson suggested that the vertical tab character, added experimentally to m/\s/ in 5.17, be formally released in 5.18. He notes that this addition caused no perceptible failures in CPAN smoke testing, and he doesn't expect any failures from releasing it into the wild, partly because the vertical tab character is so rarely used.
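If you want to see how your own perl treats the vertical tab, a quick check along these lines should do. It is a minimal sketch; the variable names are made up for illustration, and the \s result depends on which perl version runs it. The POSIX [[:space:]] class is included for comparison because it already matched the vertical tab before this change.

```perl
use strict;
use warnings;

my $vt = "\x0B";    # vertical tab

# Expected to be 'yes' on perls that include the change, 'no' on older ones.
my $in_backslash_s = $vt =~ /\s/          ? 'yes' : 'no';

# The POSIX class matched vertical tab before \s did.
my $in_posix_space = $vt =~ /[[:space:]]/ ? 'yes' : 'no';

printf "perl %vd\n", $^V;
print  "VT matches \\s:          $in_backslash_s\n";
print  "VT matches [[:space:]]: $in_posix_space\n";
```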

For the full context about vertical tab's inclusion, read the thread.

Read the thread

Comment period extended for Unicode's changing some common characters from Punctuation to Symbol

Completing his trifecta in this week's summary, Karl Williamson noted that Unicode has extended its comment period to January 21, 2013 for changing some common characters from the "punctuation" class to the "symbol" class.

You can make comments by visiting here.

In a follow-up, David Golden asked the following questions:

* How can we better document (if we're not) the forward compatibility
  risks inherent in using Unicode character classes?

* How can we let programs introspect the version of Unicode that Perl
  provides?

* Is it possible to make any of this pluggable, so a program could
  specify which version of Unicode classes they want to use?

Karl replied that it's not pluggable. It is possible, however, to compile any perl that supports Unicode against a specific version of the Unicode character database by following the instructions in README.perl inside lib/unicore, if you wanted to downgrade a particular perl for some reason.

He also notes that these proposed changes were included experimentally last summer and broke only one CPAN module. He says that the [[:punct:]] class matches both the ASCII-range Unicode symbols and punctuation, so most Perl programs will not notice any of these changes.
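Here is a small sketch of what that means in practice. The four sample characters are my own choice for illustration and are not taken from the Unicode proposal: '!' and ',' are currently classed as punctuation, while '$' and '+' are classed as symbols.

```perl
use strict;
use warnings;

# Sample characters chosen for illustration only.
my @chars = ('!', ',', '$', '+');

my %result;
for my $ch (@chars) {
    $result{$ch} = {
        posix  => ($ch =~ /[[:punct:]]/ ? 1 : 0),   # POSIX class
        punct  => ($ch =~ /\p{Punct}/   ? 1 : 0),   # General Category P
        symbol => ($ch =~ /\p{Symbol}/  ? 1 : 0),   # General Category S
    };
    printf "%s  [[:punct:]]=%d  \\p{Punct}=%d  \\p{Symbol}=%d\n",
        $ch, @{ $result{$ch} }{qw(posix punct symbol)};
}
```

Code that relies on [[:punct:]] sees all four characters the same way, whereas code that uses \p{Punct} or \p{Symbol} directly is the kind that could notice a reclassification.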

With respect to the second question above, about inspecting the Unicode version programmatically, Karl wrote:

This information has long been available through Unicode::UCD::UnicodeVersion()
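For anyone who hasn't used it, calling that function looks roughly like this. It is a minimal sketch, and the version string you get back depends on the perl you run it with.

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Returns the version of the Unicode database this perl was built with.
my $version = Unicode::UCD::UnicodeVersion();

print "This perl uses Unicode $version\n";
```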

Read the thread

2 Comments

I would love it if we would move rand() to a standard RNG like MT or ISAAC, and then configuration would handle how to do a good default seed (O/S specific). Keep the srand interface. Randbits could be 53 everywhere.

I'd like to see an irand(n) function also, that returned a UV between 0 and n inclusive, where n could be as large as ~0. That's a bit more controversial.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.2849

I agree with a lot of the posters that getting this stuff right is surprisingly full of traps for the unwary.

Dana: you should post to p5p. Commenting here is a good way to ensure no one who actually works on perl will see it (it’s unlikely to be read by more than about 4 people in the first place).
