Alternatives to rand()

By Dana Jacobsen on April 17, 2017 9:34 PM

I recently added some functionality for random number generation to my modules, which led me on a digression about rand(). This is a short look at some modules for getting random floating point values. A later one will look at random ints and bytes.

Modules for rand()
Module Algorithm	Stat Quality	Interval	Bits in output	Auto Seed	Predictable	Speed
CORE::rand drand48	Bad	[0,1)	48	OK	Yes	29000 k/s
Math::Random::MTwist MT XS	OK	[0,1)	53	OK	Yes	14000 k/s
ntheory ChaCha20 XS	Good	[0,1)	53-113	Full	No	12000 k/s
Math::Prime::Util::GMP ISAAC XS	Good	[0,1)	53-64	Good	No	14000 k/s
Crypt::PRNG Fortuna XS	Good	[0,1)	53	Good	No	700 k/s
Math::Random::MT::Auto MT XS	OK	[0,1)	52	Full	Yes	5000 k/s
Math::Random::MT MT XS	OK	[0,1)	32	rand()	Yes	2200 k/s
Math::Random::Secure Math::Random::ISAAC	Good	[0,1)	32	Full	No	460 k/s
Math::Random::ISAAC ISAAC XS or PP	Good	[0,1]	32	None	No	5700 k/s 480 k/s
Math::Random::Xorshift Xorshift XS	OK	[0,1]	32	Time	Yes	16000 k/s

Some things we generally want from our random number generators:

Statistically random. Any good CSPRNG will qualify, as will a much faster modern PRNG such as PCG or Xoroshiro128+. The Mersenne Twister is quite good though inferior to the newer choices in a few ways. Xorshift has been found to not be very good but the latest in its family, Xoroshiro128+ is considered excellent. drand48, used by Perl for CORE::rand since 5.19.4, has numerous issues. It is, however, better than the crap-shoot we had in older Perls. Both are listed as having "serious defects" in Jones' guide.

Consistent semantics. rand() returns a uniform floating point number in the half-open interval [0,1). This means we can get 0 but will not every get a 1. If given an argument, then our range goes up to but not including it -- we will get a random representable real from 0 up to but not including the number. This makes things like int(rand(6)) return 0-5 and not occasionally give 6. This also lets us combine adjacent intervals knowing there is no overlap. Two of the modules get this wrong. This has nothing to do with the underlying generator, but with the method chosen to create the return value.

Quality outputs. An NV in Perl is, by default, a double which can store up to 53 bits in the range [0,1). Some modules only fill the result with 32 bits. While this is fine for rolling dice, it won't give good results in some other scenarios. A Perl build with long double support may have 64 bits of mantissa, and with Quadmath support in 5.22 and later we get a full 113 bits to fill. I decided in my module that the best solution was to fill up the NV, giving them the maximum number of bits. It's no extra work for double and most long double builds. It is a little slower for quadmath, but clearly the customer wants lots of precision so we should honor that. I think anything less than 52 bits is substandard.

Good seeding available. Some modules strive to seed with an O/S entropy source including Windows and other fallbacks if /dev/urandom isn't available. Perl's internal seed function tries to use /dev/urandom as well as a bit of random process state. Its main issue is that it only uses 32 bits. A couple modules use the current time, which is generally not a good method. Math::Random::ISAAC punts the problem to the caller on the theory that no seeding is better than bad seeding, so modules using it need a seeding layer. There are some subtleties in seeding ISAAC that some ignore.

Repeatable. For testing or simulations we need to be able to repeat the result, therefore we have to be able to manually seed. All the modules except Crypt::PRNG support this. For true cryptographic use we really ought to have a forced-system-entropy-seed mode plus periodic reseeding. Crypt::PRNG operates in this way with no ability for user seeding.

Unpredictibility. For many uses, such as simulations, this isn't necessary at all. For others it is absolutely critical (breaking online Poker in 1999). This is one of the things that separates CSPRNGs from their lesser (but faster) brethren. It is analagous to using a cryptographic hash (e.g. SHA2) vs. generic hash (e.g. SipHash). The former has stronger guarantees but is often an order of magnitude or more slower. For drand48 or Xorshift, after seeing a single output we know every number that will come out in the future as well as all the previous outputs. With Mersenne Twister it takes 624 results. Methods like PCG and Xoroshiro128+ are much harder to predict though it is not impossible. This feature is a CSPRNG requirement so ISAAC, Salsa20, ChaCha20, and Fortuna completely meet it.

Performant. All other things being equal, we'd like it to be fast. A vast amount of time can be spent in the Perl / Module interface, making some of the solutions much slower than the underlying algorithm requires. Even when the interface is well optimized, it masks a lot of the difference, allowing algorithms like ChaCha20 and ISAAC to have not-dissimilar speeds from some fast regular PRNGs and substantially outperform much faster underlying algorithms in some modules. With a bulk interface such as random_bytes() things look slightly different. Note that the worst performing solution is still 100,000 calls per second, so this may be moot for your application if you just need a few numbers.

CORE::rand. Very fast. Separate context for each thread. 32-bit seed, 48-bit state. Very poor quality -- for example the 48th bit will always alternate 1,0,1,0,1,0,... while the 47th goes 0,0,1,1,0,0,1,1,... etc. The bottom bits are not at all random, which is a fundamental property of the type of generator using a non-prime modulus (2^48 in this case). It would be very interesting to see some alternatives (e.g. PCG and Xoroshiro128+) in the Perl core, similar to what we do with hash functions.

Math::Random::MTwist. Excellent alternative to core. It's fast and good quality. Random quality is what one would expect from Mersenne Twister. Lots of options. Threading should be done with OO interface.

ntheory. My module, which I added a ChaCha20 CSPRNG to. It was originally intended for speeding up large random prime construction vs. using Bytes::Random::Secure, but it is useful for a standard rand interface as well (srand, rand, irand, irand64, random_bytes, plus some more). ChaCha20 is the CSPRNG which is being used in new arc4random functions in *BSD, is the chosen method for /dev/urandom in Linux 4.18, is used in some new TLS methods, and is increasingly popular as a CSPRNG or stream cipher. The standard implementation isn't as fast as the ISAAC core, but there are AVX2 implementations that are a lot faster. Performance here doesn't suffer much vs. other alternatives, and it's in the top performance tier (albeit at the bottom of the leaders, but 2x faster than many other modules). Seeds at startup using a number of possible sources, and allows easy reseeding with a decent amount of system entropy. It also contains a full ChaCha20 implementation in Perl, as well as matching XS and Perl implementations of ISAAC, though using ISAAC is a build-time configuration option likely to be removed. Uses a single state and mutex lock for threads, which will be replaced by per-thread state in a later release.

Math::Prime::Util::GMP. Uses ISAAC in XS and runs quite a bit faster than the other ISAAC modules. Seeds at startup using /dev/urandom if possible. It wasn't really intended to be a primary module for this work, but it's an interesting comparison with the other ISAAC modules, especially the performance. No threading support.

Crypt::PRNG. An API for LibTomCrypt and part of the large CryptX package. It has a few choices, with the default being Fortuna -- an entropy pool framework with multiple entropy input streams, using AES in counter mode for the actual stream. Very high quality results, well done interface. The only downside is the slow performance, which is typical of the AES-CTR algorithm (when implemented without specific hardware support).

Math::Random::MT::AutoNot a bad module, but it seems to be superceded by MTwist, in that the latter is over 2x faster and doesn't seem to be missing many (any?) features.

Math::Random::MT. An earlier module than the above. Simpler, slower, and seeds from CORE::rand. Only allows 32-bits of state to be set by the user. Only fills returned doubles with 32-bits.

Math::Random::Secure. Gives a (srand,irand,rand) interface to Math::Random::ISAAC, including proper seeding. Large dependency list. Only fills returned doubles with 32-bits. Very slow, and even slower if Math::Random::ISAAC::XS isn't installed.

Math::Random::ISAAC. The first pure Perl implementation of ISAAC, widely copied or used by other modules. There is also an XS back end. Creates double output for rand() by dividing a random 32-bit int by 2^32-1 which makes the non-standard interval [0,1] and only uses 32-bits. No auto seeding is done.

Math::Random::Xorshift. A thin wrapper around Marsaglia's 2003 Xorshift PRNG. As the module author points out, this has some statistical flaws, though the later xorshift* algorithm fixed most of them and xoroshiro128+ is significantly better. Per thread context. Creates double output for rand() by dividing a random 32-bit int by 2^32-1 which makes the non-standard interval [0,1] and only uses 32-bits. Seeds with time. The primary advantage is that it runs faster than the other modules.

7 comments

Tagged as:

drand, drand48, entropy, math, mersenne, rand, random

7 Comments

Peter Rabbitson | April 18, 2017 9:55 AM | Reply

What about actually reading from /dev/urandom when it is available? Shouldn't it theoretically beat all of the above both in terms of quality and speed?

Dana Jacobsen replied to comment from Peter Rabbitson | April 18, 2017 2:50 PM | Reply

Peter, some modules offer this (ntheory, Crypt::Random, Crypt::Urandom) but none of them do a rand() style interface to it. It would be easy enough to create. On my Macbook, bulk reads from /dev/urandom run at 12 MB/s, which would make it quite slow compared to most of these modules. It is very system dependent -- my Fedora 20 desktop runs at a similarly slow 17 MB/s while the Fedora 23 desktop at 185 MB/s. A new CentOS 7 160-core Power 8 machine runs at 5 MB/s.

In raw speed the older /dev/urandom device is about 40x slower than standard ChaCha20. Fast PRNGs like we would expect in the Perl core would be another 10x faster.

In quality it should match the CSPRNG methods for underlying data, with the (big) exception that it's skipping a middle userspace layer. The new Linux and BSD /dev/urandom are nearly identical to the ChaCha20 code I use, though we differ in things like seeding and reseeding. For rand() calls without excessive overhead we'd need to either buffer or use a system call rather than literally opening the file each time.

Peter Rabbitson replied to comment from Dana Jacobsen | April 18, 2017 4:39 PM | Reply

Thanks for the explanation. I did not realize the difference in speed can be that wide: I thought all of the kernel-side PRNGs are at around 20MiB/s which was in line with most of your measurements.

3ee5cf54-f022-4a71-8666-3c2b5ee231dd [openid.stackexchange.com] | April 19, 2017 10:42 PM | Reply

The only downside is the slow performance, which is typical of the AES-CTR algorithm.

It shouldn't be on semi-recent hardware (which has AES support). On my not-so-recent box, openssl speed -evp chacha20 gets 1.6GB/sec with big blocks, but … -evp aes-128-ctr gets 5.1GB/sec. Even aes-256-ctr gets 3.6GB/sec.

I'm not sure those are the exact right EVP arguments to give it to test, but the presence of dedicated AES hardware makes AES the fastest option, almost always.

Dana Jacobsen | April 20, 2017 2:00 AM | Reply

Thanks for the note. You are correct that if run on machines with AES support (which is a *lot*, and OpenSSL supports a huge number of them) and if the software supports it, then indeed it's very fast. I added a parenthetical note.

None of the Perl modules do this. Crypt::Rijndael's AES C code is the standard reference C code. CryptX uses LibTomCrypt which is based on the original reference code. So they are not very fast (relative to other PRNGs). Fortuna also includes running SHA2, which further slows down the output. Crypt::PRNG (Fortuna) runs at 0.14 GB/s on this machine, and a naive use of Crypt::Rijndael runs at 0.07 GB/s. ChaCha20 (standard C) runs at 0.394 GB/s, while an AVX2 version found on github runs at 2.1 GB/s.

The ChaCha20 implementation I wrote is bog standard from the spec. It runs very slightly faster than OpenBSD's implementation, but 5x slower than the 'chacha-opt' AVX2 version. OpenSSL includes asm, SSE2, SSE3, AVX2, and AVS512 (!) accelerations.

Yuki Kimoto | April 20, 2017 8:46 AM | Reply

This comparison is very easy to understand.

lowjoe | September 22, 2020 10:37 PM | Reply

Math::Random::MTwist is now only available from Backpan. It disappeared without any notice.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About Dana Jacobsen

I've been using Perl since 1991 (4.010 or so). At some point I realized that while I have used many languages, I enjoy using Perl more than any others.

More info »

Dana Jacobsen