Random libraries and benchmarks
Benchmarks of random libraries, or how fast can we generate random numbers in Haskell. In this post I will share with you my investigation into performance of libraries that can be used for random number generation. Besides performance, I will provide a brief introduction on how to use those libraries and what are their conceptual differences.
Categories of randomness
When people talk about random values they immediately think of two possible categories:
Cryptographically secure pseudo random number generators, which are provided by such libraries as
crypto-apiand are used for things like encryption. These are NOT the kind of libraries I will be talking about in this post. They do deserve their own separate article and maybe some other Haskeller with an interest in that domain can whip one up.
Random number generators (RNGs) that are used for modeling, simulations, generative testing, gaming, blinking lights, what have you. Such generators are exactly the kind that interest us today.
The latter category employs various algorithms to achieve different degree of randomness. Some produce better values than the others, which might make them differ in unpredictability, longer or shorter periods, or even suitability for certain data types but not the others. They might pass or fail diehard tests and have all sorts of other properties that such algorithms are expected to possess. You can study each individual one at your leisure, this is not gonna be the point of this post. It is important, though, that you do understand the quality of the algorithm being used before jumping into conclusions about the performance of its implementation. Just to give an example a Xorshift algorithm is known to be fast, but unreliable, when compared to others.
One other crucial point I would like to make is that I will omit any subjective remarks about the quality of the code in all libraries discussed as well as the correctness of implementation of claimed algorithms. In other words, even if a package is a total phony, all I care about today is performance, so you be the judge of the rest.
Plethora of libraries
There are quite a few non-cryptographic random generators available on Hackage and Stackage, which implement different algorithms with similar APIs. All of them can be placed into four categories:
This is the kind of generator that produces a new random value together with another instance of the same generator, which can further be used to generate another value (possibly of different type) and another generator… and so on and so forth, until you get your initial generator and an extremely long cycle starts repeating itself.
Great thing about such generators is that not only they can also be used in any pure
functions, but they are also 100% reproducible, granted that the initial generator was
created from the same seed. This great property comes from the fact that they rely on a
small immutable state that can simply be copied with slight changes, thus making them pure
and "stateless". Of course you can thread a generator in a
StateT like monad or store it
IORef, but that is not a requirement.
The most popular package that is equipped with such functionality in Haskell is
random. I think it is still popular only
because it was the first of its kind in Haskell and is the defacto library when people are
learning about pure random number generation. Keep reading further though to realize, if
you haven't yet, that there are much better packages available. Nevertheless let's look at
λ> import System.Random λ> g0 <- newStdGen λ> let (x, g1) = next g0 λ> let (y, _) = next g1 λ> print y 608110287 λ> print x 1021702771 λ> print $ fst $ next g0 1021702771
Above example depicts a critical property of pure generators: applying
always produce the same value.
Splittable pure RNGs
Just as before, we can generate values in a pure setting, but now we can also split a generator into two new distinct ones. This is a great property to have in a pure RNG, because it allows us to produce deterministic sequences of random data in a concurrent or even distributed environment. Let's see a splittable generator in action:
λ> import System.Random λ> import System.Random.SplitMix (mkSMGen) λ> let g0 = mkSMGen 123456789 λ> let (g01, g02) = split g0 λ> :i randomR class Random a where randomR :: RandomGen g => (a, a) -> g -> (a, g) ... -- Defined in ‘System.Random’ λ> fst $ randomR (0, 1000) g01 :: Word 553 λ> fst $ randomR (0, 1000) g02 :: Word 410 λ> fst $ randomR (0, 1000) g01 :: Word 553
split function allows us to create two new distinct generators from
g0, both of which
we could pass to two separate pure functions further down in the call tree. Moreover,
nothing prevents us from passing both of those
g02 generators across the
wire to some other two machines. This would, as with any other RNG, let everyone know the
next random value each of the generators could produce. The power comes from the fact that
when paired with a pure functions, that can use those generators, all three of the
machines (including the sender) will be able to know deterministically the full
sequence of values each of them can produce. Try the above code on your computer and you will
get the same values.
We are going to exploit the splittable property of generators to produce deterministic sequences of random numbers in parallel on multiple cores.
Please note that I used the same
split functions on the
splitmix. That's right, all of the pure generators in Haskell can use the same
random package, more specifically
Random classes. I
think that is another explanation for such popularity of that package.
Not all pure generators are splittable though, for example
package does not fall into that category, despite being pure:
λ> import System.Random.Mersenne.Pure64 λ> import System.Random λ> mtGen <- newPureMT λ> fst $ randomR (0, 1000) mtGen :: Word 939 λ> split mtGen *** Exception: System.Random.Mersenne.Pure: unable to split the mersenne twister
One might argue, that it is always possible to split a generator simply by creating a random value and using it as a seed for the second generator. Such an approach might even be OK if you don't care about how good your random numbers are. In fact, I've personally abused a similar approach a few times during random property testing, but from what I gather, unless the algorithm directly supports splitting, you will not get the same quality of random numbers as it is actually promised by the algorithm.
Generators in the previous section had the luxury of being very small in size, despite being
mathematically complex. This fact allowed them to not rely on mutation to perform
efficiently, which in turn made them pure. RNGs in this section have to carry a large
mutable array in their state around, a portion of which is mutated each time a new random
value is generated. One such popular package is
mwc-random. Because there is mutation
happens behind the scenes, we have to live inside of
or some transformer on top of one of those two that implements a
instance. Here is how it works in ghci (which is inherently
λ> import System.Random.MWC λ> gen <- createSystemRandom λ> uniformR (0, 1000) gen :: IO Word 602 λ> uniformR (0, 1000) gen :: IO Word 963
As you can see, despite that we pass the same generator as an argument to
get different values back. This is because our generator is mutated behind the scenes in
One cool thing is that regardless of the fact that out generator is inherently effectful, we
can still encapsulate it in a pure computation with
ST monad. All we need to do is pass the
seed for a previously created generator and we will get a reproducible result. Here is an
example of a randomly generated list of values within a pure function that uses a safe
mutable interface underneath:
import Control.Monad (replicateM) import Control.Monad.ST (runST) import System.Random.MWC (restore, uniform) randomList :: Variate a => Seed -> Int -> [a] randomList seed n = runST $ do gen <- restore seed replicateM n (uniform gen)
λ> sysSeed <- save =<< createSystemRandom λ> randomList sysSeed 4 :: [Float] [0.5644637,0.9203384,0.30062655,8.7106675e-2] λ> randomList sysSeed 5 :: [Float] [0.5644637,0.9203384,0.30062655,8.7106675e-2,0.5830376]
Here we can observe that the same seed results in exactly the same values, as one would expect from a pure computation. This approach does not come without a drawback: you don't want to generate small amounts of data this way, because saving/restoring the seed is an expensive operation, when compared to a single random number generation.
If you are running your computation in
IO, you might be tempted to use the same
generator between multiple threads. Don't! There is no locking in place to prevent
corruption of the state. Not all is lost though. The only thing that is required for
generation of random values on multiple cores is the preparation of separate instances of
RNGs prior to parallelization. Even though they are stateful, they can safely function
independently, which is even better than locking, since it avoids contention for the same
resources. This will be important in our parallel benchmarks.
Utilizers of RNGs
The last category is the most diverse one, it is all the libraries out there that use uniform random number generators to produce some other random data:
- Wrappers (or built-in) that have functionality for producing a whole variety of
- Wrappers around RNGs that provide different interface, for example monadic:
- Libraries that shuffle elements of data structures by using RNGs:
- Libraries that do randomized property testing:
- Libraries that generate realistic looking random data:
- Whatever else you can think of that uses random numbers
Point is, once we have a uniform random number generator, we can simulate all other random stuff. Maybe someday I'll get a chance to compare the performance of libraries that produce numbers in other distributions, but not today.
Special note on system random generators, such as
CryptoAPI etc. Those RNGs are usable over FFI from Haskell, and if my
goal was to compare implementation across languages, they would serve nicely as a
baseline. That's not our purpose today, so we'll skip the system generators, especially
since the bindings I've tried were pretty slow.
Earlier this year I stumbled upon Ben Gamari's question on StackOverflow about trying to
generate an array with random numbers in parallel, which I was really happy to
since I had the exact solution for it in
massiv. Naturally, I wanted to benchmark my
solution. Once I wrote the benchmarks, my curiosity took me deeper into the realm of
randomness. So in a way, that question is the cause for this blog post. Thanks, Ben! ;)
In order to properly benchmark random number generators I could have went in a few
directions. One would be creating manual loops that generate a whole lot of random numbers
and discard them right away. Alternatively, I could have generated lists of random
numbers, like in one of the examples earlier. Instead, I went for unboxed arrays for a few
important reasons. First of all, I already had an efficient solution for generating arrays
using RNGs sequentially and in parallel with
functions using pure and stateful generators respectfully. Secondly, a flat unboxed array
is the most efficient way to store large number of values, be they random or not. Most
importantly though, as long as we are using the same approach for benchmarking all of the
libraries, we are still being fair, regardless that using some other data structure or
none at all could have produced slightly different results.
Both of these functions are pretty well documented, but to give you the gist of it: the
first one allows generating numbers purely, relying on splitability for parallelization,
while the second one runs in
IO and pre-creates separate generators for each worker
thread in order to do proper parallelization.
Libraries with pure RNGs
A list of libraries, which I was able to find that provide a pure interface for random number generation, together with their versions that were used in the benchmarks below. Libraries are sorted in the order of their appearance (i.e. initial version uploaded on hackage).
|Package||First appeared on||Last released on||Latest release|
Every single one of the above libraries provides an instance for
class, so we are going to try and use it.
Xorshift128Plus did not have such
an instance implemented, but they had the right helper functions to make it possible for
me to provide one.
We will start by disqualifying one package, namely
Random123, and put it on a wall of
shame as being by far the slowest of them all (
random package is only listed here as a
Here is a benchmark for one of
next :: g -> (Int, g) which generates a random
Int and a new generator. Sequential and parallel generations can be distinguished by
You might notice that
Xorshift128Plus do not have parallel benchmarks. That is because, as discussed earlier,
they do not provide any facilities for splitting the generator, which disqualifies them
from concurrent deterministic random number generators.
Also, note that
pcg-random has two different pure implementations one regular and
another one experimental called "fast", both of which are written in pure Haskell. It also
has stateful implementation of regular and "fast", which we are gonna skip for now and
look at in the next section.
Now, I'd like to step back and try to understand what was exactly that we just
benchmarked. In particular, what was the possible value of
Int we were getting for each
of the generators. To answer that question, we'll have to look at another
genRange :: g -> (Int, Int). I made a little helper function that goes through
all generators and prints the ranges for possible values those generators could produce:
λ> printRanges [ 1, 2147483562] - random:System.Random.StdGen [-9223372036854775808, 9223372036854775807] - mersenne-random-pure64:System.Random.Mersenne.Pure64.MTGen [ -2147483648, 2147483647] - xorshift:Random.Xorshift.Int32.Xorshift32 [-9223372036854775808, 9223372036854775807] - xorshift:Random.Xorshift.Int64.Xorshift64 [ 0, 4294967295] - AC-Random:Random.MWC.Primitive.Seed [ 0, 2147483562] - tf-random:System.Random.TF.TFGen [-9223372036854775808, 9223372036854775807] - pcg-random:System.Random.PCG.Pure.SetSeq [-9223372036854775808, 9223372036854775807] - pcg-random:System.Random.PCG.Fast.Pure.FrozenGen [-9223372036854775808, 9223372036854775807] - Xorshift128Plus:System.Random.Xorshift128Plus.Gen [ -2147483648, 2147483647] - pcgen:Data.PCGen.PCGen [-9223372036854775808, 9223372036854775807] - splitmix:Data.SplitMix32.SMGen [-9223372036854775808, 9223372036854775807] - splitmix:Data.SplitMix.SMGen
When we look at the ranges, we might suspect that the earlier benchmark did not treat
these generators equally fair, and that suspicion would be totally justified. On the other
hand though, those ranges give us a better idea about the performance of a particular
generator. For example
pcgen can generate in one go a 32bit random
number, which means that in order to generate a uniformly distributed
Int64, all we'd
have to do is call the generator twice. So, let's actually be fair and benchmark
generation of the full 64 bits.
Good thing for us is that
random package is versatile enough and provides us with a
type class, which is designed to use any of the above generators in order to produce a
uniformly distributed value in the full range of a type that has such instance
implemented. This means that all we need to do is call
random :: RandomGen g => g -> (a,
function and we will get a uniformly distributed value in a predefined range, which for
bounded types like
Word64 it will be the full range.
The only change I made to the above benchmarks was I swapped
next :: RandomGen g => g ->
(Int, g) function for a type restricted
random :: RandomGen g => g -> (Word64, g):
Wait! What? What happened there? All of a sudden every single one of our generators
started to perform x10 - x50 times slower. As I mentioned just a couple of paragraphs ago,
it would have been expected for some generators to get a factor of x2 slow down, since
they can't generate 64bits in one go. Therefore the results that we got can't possibly be
explained by this, so what is really causing such a huge regression? As it turns out,
are oblivious to the concrete type that they are generating and all of the integral
numbers go through the arbitrary precision
Integer in order to produce the value in a
desired range. Simply put, whatever number of bits you want to generate, be it 8 or 64 it
will be just as slow.
I personally wouldn't care much if it was some sporadically used function, but I can't
emphasize enough the impact of this. I see both of these functions
used extensively in the Haskell community. I will even go on a limb and say that we all
use those two indirectly on a daily basis. Just to give you an idea, both
Hedgehog uses those functions, which means everyone uses them, since we all writes
generative test, right? ;)
The worst implication of the ridiculously slow implementation of
randomR is, that
regardless of how fast your generator really is, you will always get terrible
performance! This fact is clearly depicted in the benchmarks above.
For the sake of argument, let's take a look at what performance of producing a
can be for each of the generators, if we bypass the
Random class. Some packages provide
built-in functions for generating
Word64, which I happily used, but for the ones that do
not have such functionality out of the box, I've written the two functions below, choice
of which one depended on the generator's range:
random32to64 :: RandomGen g => g -> (Word64, g) random32to64 g0 = ((fromIntegral hi `unsafeShiftL` 32) .|. fromIntegral lo, g2) where (hi, g1) = next g0 (lo, g2) = next g1 random64to64 :: RandomGen g => g -> (Word64, g) random64to64 g0 = (fromIntegral i64, g1) where (i64, g1) = next g0
random generate slightly less than 31 bits of entropy in one
iteration, instead of 32 or 64 as it is with others, so the simple approach from above
will not work, therefore I will just exclude
tf-random and will keep
implementation as the baseline for comparison:
It is clear now as night and day, that
random function is the culprit and should be
either fixed or avoided when possible.
In order to get a better picture of the libraries that do perform well I need to get rid
of the slow ones, therefore I am now forced to drop
splitmix (only the 32bit version). Note, that there is a disclaimer in the
haddock not to use the
SplitMix32, which I included thus far for completeness only.
From a quick look at all the benchmarks above, the clear winner is
splitmix. Now let's move towards stateful
RNGs and see how they compare with pure generators.
Libraries with stateful RNGs
Below is the list of packages that I was able to find, which provide RNGs that depend on a mutable state. Same as before, packages are in the order of appearance and the versions benchmarked are included as well:
|Package||First appeared on||Last released on||Latest release|
In a very similar fashion as before I ran sequential and parallel generations of random arrays, except now with stateful generators.
The intriguing part of this chart for me was the fact that stateful implementations of
mersenne-random did not parallelize well at all. My gut feeling
suggests that it has something to do with
FFI bindings and some peculiarities of
C implementations. Although
sfmt performed pretty well both sequentially
and in parallel, despite being pretty much all written in
In the end I would like to combine the finalists of this performance race from all of the three categories: pure, pure splittable and stateful. Packages are now sorted by the median of duration of their runtimes:
An extra bonus benchmarks for the libraries that could efficiently generate
function first and as I suspected it performs just as awfully for
Doubles as for
integral types, therefore amongst pure generators I kept only the ones that provided
Double out of the box.
I started with a goal of finding a goto library for myself to use, whenever I would need random numbers. I certainly found what I was looking for and since all of us need a bit of randomness once in a while, I thought I'd share these findings with all of other Haskellers out there. Hope you find them useful.
In the process of researching and experimenting I made a few realizations for myself, thus would like to share those with you as well, despite that they might be a bit subjective:
splitmixpleasantly surprised me with its performance. Great job, Oleg! Parallelization of the algorithm isn't that great, only a factor of x2 on a quad core CPU, but I suspect it might be because it is getting bounded by the speed of my memory, rather than the algorithm or the implementation. This probably means that it is already pretty close to the optimal performance, although, as noted in the haddock, a possibility of future addition of SIMD to GHC (and
splitmix) could give it a slight boost.
sfmtturned out to be a gem I never knew about. Underneath it relies on implementation in
Cand SIMD instructions, so it might not be extremely portable, but that doesn't mean it can't be useful for a whole variety of projects, especially since it turned out to be the fastest stateful generator.
mwc-randomhas been around the block and is quite a popular library. It didn't turn out to be the fastest, but it still performs quite well and running in parallel improves the runtime significantly. Moreover, the interface is solid and provides ability to generate other distributions out of the box.
randomis the package I was quite disappointed in. I was more than sure that
StdGenwill have poor performance, but I did not expect
Randomclass instances to be so slow. I know that there was some effort in improving it, but as it turns out that effort was not only unfruitful, but it was also directed at the wrong problem. From the looks of it the true problem is not in the generator that is being used in the library, but in the way that all generators are being used. Of course, it doesn't mean the
StdGenis not a problem, it's just that problem today has an easy fix.
If it were up to me, here are the improvements I would make:
randompackage to depend on
splitmix, instead of the other way round. Get rid of the internal
StdGenimplementation in favor of
type StdGen = SMGen).
- Improve the implementation of each individual instance of
- Make it possible for implementers of custom generators to override default
implementations of individual primitive types (i.e. customizable
- Add an interface to
randompackage for stateful generators, which could be used by
mwc-randomand others alike.
Thank you. If you'd like to inspect the code and convince yourself of legitimacy of the discussed benchmarks by running them yourself or scrutinizing them some other way, they can be found in my haskell-benchmarks repo.