Random libraries and benchmarks
Benchmarks of random libraries, or how fast can we generate random numbers in Haskell. In this post I will share with you my investigation into performance of libraries that can be used for random number generation. Besides performance, I will provide a brief introduction on how to use those libraries and what are their conceptual differences.
Categories of randomness
When people talk about random values they immediately think of two possible categories:

Cryptographically secure pseudo random number generators, which are provided by such libraries as
cryptonite
orcryptoapi
and are used for things like encryption. These are NOT the kind of libraries I will be talking about in this post. They do deserve their own separate article and maybe some other Haskeller with an interest in that domain can whip one up. 
Random number generators (RNGs) that are used for modeling, simulations, generative testing, gaming, blinking lights, what have you. Such generators are exactly the kind that interest us today.
The latter category employs various algorithms to achieve different degree of randomness. Some produce better values than the others, which might make them differ in unpredictability, longer or shorter periods, or even suitability for certain data types but not the others. They might pass or fail diehard tests and have all sorts of other properties that such algorithms are expected to possess. You can study each individual one at your leisure, this is not gonna be the point of this post. It is important, though, that you do understand the quality of the algorithm being used before jumping into conclusions about the performance of its implementation. Just to give an example a Xorshift algorithm is known to be fast, but unreliable, when compared to others.
One other crucial point I would like to make is that I will omit any subjective remarks about the quality of the code in all libraries discussed as well as the correctness of implementation of claimed algorithms. In other words, even if a package is a total phony, all I care about today is performance, so you be the judge of the rest.
Plethora of libraries
There are quite a few noncryptographic random generators available on Hackage and Stackage, which implement different algorithms with similar APIs. All of them can be placed into four categories:
Pure RNGs
This is the kind of generator that produces a new random value together with another instance of the same generator, which can further be used to generate another value (possibly of different type) and another generator… and so on and so forth, until you get your initial generator and an extremely long cycle starts repeating itself.
Great thing about such generators is that not only they can also be used in any pure
functions, but they are also 100% reproducible, granted that the initial generator was
created from the same seed. This great property comes from the fact that they rely on a
small immutable state that can simply be copied with slight changes, thus making them pure
and "stateless". Of course you can thread a generator in a StateT
like monad or store it
in IORef
, but that is not a requirement.
The most popular package that is equipped with such functionality in Haskell is
random
. I think it is still popular only
because it was the first of its kind in Haskell and is the defacto library when people are
learning about pure random number generation. Keep reading further though to realize, if
you haven't yet, that there are much better packages available. Nevertheless let's look at
an example:
λ> import System.Random
λ> g0 < newStdGen
λ> let (x, g1) = next g0
λ> let (y, _) = next g1
λ> print y
608110287
λ> print x
1021702771
λ> print $ fst $ next g0
1021702771
Above example depicts a critical property of pure generators: applying next
to g0
will
always produce the same value.
Splittable pure RNGs
Just as before, we can generate values in a pure setting, but now we can also split a generator into two new distinct ones. This is a great property to have in a pure RNG, because it allows us to produce deterministic sequences of random data in a concurrent or even distributed environment. Let's see a splittable generator in action:
λ> import System.Random
λ> import System.Random.SplitMix (mkSMGen)
λ> let g0 = mkSMGen 123456789
λ> let (g01, g02) = split g0
λ> :i randomR
class Random a where
randomR :: RandomGen g => (a, a) > g > (a, g)
...
 Defined in ‘System.Random’
λ> fst $ randomR (0, 1000) g01 :: Word
553
λ> fst $ randomR (0, 1000) g02 :: Word
410
λ> fst $ randomR (0, 1000) g01 :: Word
553
split
function allows us to create two new distinct generators from g0
, both of which
we could pass to two separate pure functions further down in the call tree. Moreover,
nothing prevents us from passing both of those g01
and g02
generators across the
wire to some other two machines. This would, as with any other RNG, let everyone know the
next random value each of the generators could produce. The power comes from the fact that
when paired with a pure functions, that can use those generators, all three of the
machines (including the sender) will be able to know deterministically the full
sequence of values each of them can produce. Try the above code on your computer and you will
get the same values.
We are going to exploit the splittable property of generators to produce deterministic sequences of random numbers in parallel on multiple cores.
Please note that I used the same randomR
and split
functions on the SMGen
generator
from splitmix
. That's right, all of the pure generators in Haskell can use the same
interface from random
package, more specifically RandomGen
and Random
classes. I
think that is another explanation for such popularity of that package.
Not all pure generators are splittable though, for example mersennerandompure64
package does not fall into that category, despite being pure:
λ> import System.Random.Mersenne.Pure64
λ> import System.Random
λ> mtGen < newPureMT
λ> fst $ randomR (0, 1000) mtGen :: Word
939
λ> split mtGen
*** Exception: System.Random.Mersenne.Pure: unable to split the mersenne twister
One might argue, that it is always possible to split a generator simply by creating a random value and using it as a seed for the second generator. Such an approach might even be OK if you don't care about how good your random numbers are. In fact, I've personally abused a similar approach a few times during random property testing, but from what I gather, unless the algorithm directly supports splitting, you will not get the same quality of random numbers as it is actually promised by the algorithm.
Stateful RNGs
Generators in the previous section had the luxury of being very small in size, despite being
mathematically complex. This fact allowed them to not rely on mutation to perform
efficiently, which in turn made them pure. RNGs in this section have to carry a large
mutable array in their state around, a portion of which is mutated each time a new random
value is generated. One such popular package is
mwcrandom
. Because there is mutation
happens behind the scenes, we have to live inside of
IO
,
ST
or some transformer on top of one of those two that implements a
PrimMonad
instance. Here is how it works in ghci (which is inherently IO
):
λ> import System.Random.MWC
λ> gen < createSystemRandom
λ> uniformR (0, 1000) gen :: IO Word
602
λ> uniformR (0, 1000) gen :: IO Word
963
As you can see, despite that we pass the same generator as an argument to uniformR
, we
get different values back. This is because our generator is mutated behind the scenes in
the IO
monad.
One cool thing is that regardless of the fact that out generator is inherently effectful, we
can still encapsulate it in a pure computation with ST
monad. All we need to do is pass the
seed for a previously created generator and we will get a reproducible result. Here is an
example of a randomly generated list of values within a pure function that uses a safe
mutable interface underneath:
import Control.Monad (replicateM)
import Control.Monad.ST (runST)
import System.Random.MWC (restore, uniform)
randomList :: Variate a => Seed > Int > [a]
randomList seed n =
runST $ do
gen < restore seed
replicateM n (uniform gen)
λ> sysSeed < save =<< createSystemRandom
λ> randomList sysSeed 4 :: [Float]
[0.5644637,0.9203384,0.30062655,8.7106675e2]
λ> randomList sysSeed 5 :: [Float]
[0.5644637,0.9203384,0.30062655,8.7106675e2,0.5830376]
Here we can observe that the same seed results in exactly the same values, as one would expect from a pure computation. This approach does not come without a drawback: you don't want to generate small amounts of data this way, because saving/restoring the seed is an expensive operation, when compared to a single random number generation.
If you are running your computation in IO
, you might be tempted to use the same
generator between multiple threads. Don't! There is no locking in place to prevent
corruption of the state. Not all is lost though. The only thing that is required for
generation of random values on multiple cores is the preparation of separate instances of
RNGs prior to parallelization. Even though they are stateful, they can safely function
independently, which is even better than locking, since it avoids contention for the same
resources. This will be important in our parallel benchmarks.
Utilizers of RNGs
The last category is the most diverse one, it is all the libraries out there that use uniform random number generators to produce some other random data:
 Wrappers (or builtin) that have functionality for producing a whole variety of
distributions:
randomfu
,mwcrandom
,statistics
, etc.  Wrappers around RNGs that provide different interface, for example monadic:
MonadRandom
,randomsource
,  Libraries that shuffle elements of data structures by using RNGs:
randomshuffle
 Libraries that do randomized property testing:
QuickCheck
,hendgehog
,genvalidity
, etc.  Libraries that generate realistic looking random data:
fakedata
,fake
,lipsumgen
, etc.  Whatever else you can think of that uses random numbers
Point is, once we have a uniform random number generator, we can simulate all other random stuff. Maybe someday I'll get a chance to compare the performance of libraries that produce numbers in other distributions, but not today.
Benchmarks
Special note on system random generators, such as getrandom()
, /dev/random
,
/dev/urandom
, CryptoAPI
etc. Those RNGs are usable over FFI from Haskell, and if my
goal was to compare implementation across languages, they would serve nicely as a
baseline. That's not our purpose today, so we'll skip the system generators, especially
since the bindings I've tried were pretty slow.
Motivation
Earlier this year I stumbled upon Ben Gamari's question on StackOverflow about trying to
generate an array with random numbers in parallel, which I was really happy to
answer,
since I had the exact solution for it in
massiv
. Naturally, I wanted to benchmark my
solution. Once I wrote the benchmarks, my curiosity took me deeper into the realm of
randomness. So in a way, that question is the cause for this blog post. Thanks, Ben! ;)
Method
In order to properly benchmark random number generators I could have went in a few
directions. One would be creating manual loops that generate a whole lot of random numbers
and discard them right away. Alternatively, I could have generated lists of random
numbers, like in one of the examples earlier. Instead, I went for unboxed arrays for a few
important reasons. First of all, I already had an efficient solution for generating arrays
using RNGs sequentially and in parallel with
randomArray
and
randomArrayWS
functions using pure and stateful generators respectfully. Secondly, a flat unboxed array
is the most efficient way to store large number of values, be they random or not. Most
importantly though, as long as we are using the same approach for benchmarking all of the
libraries, we are still being fair, regardless that using some other data structure or
none at all could have produced slightly different results.
Both of these functions are pretty well documented, but to give you the gist of it: the
first one allows generating numbers purely, relying on splitability for parallelization,
while the second one runs in IO
and precreates separate generators for each worker
thread in order to do proper parallelization.
Libraries with pure RNGs
A list of libraries, which I was able to find that provide a pure interface for random number generation, together with their versions that were used in the benchmarks below. Libraries are sorted in the order of their appearance (i.e. initial version uploaded on hackage).
Package  First appeared on  Last released on  Latest release 
random 
20071103  20140916  random == 1.1 
mersennerandompure64 
20080128  20160829  mersennerandompure64 == 0.2.2.0 
xorshift 
20110105  20110411  xorshift == 2 
ACRandom 
20110825  20110825  ACRandom == 0.1 
Random123 
20130504  20150320  Random123 == 0.2.0 
tfrandom 
20130918  20140409  tfrandom == 0.5 
pcgrandom 
20141215  20190518  pcgrandom == 0.1.3.6 
Xorshift128Plus 
20150414  20150414  Xorshift128Plus == 0.1.0.1 
pcgen 
20170429  20170705  pcgen == 2.0.1 
splitmix 
20170630  20190730  splitmix == 0.0.3 
Every single one of the above libraries provides an instance for
RandomGen
class, so we are going to try and use it. ACRandom
and Xorshift128Plus
did not have such
an instance implemented, but they had the right helper functions to make it possible for
me to provide one.
We will start by disqualifying one package, namely Random123
, and put it on a wall of
shame as being by far the slowest of them all (random
package is only listed here as a
baseline):
Here is a benchmark for one of RandomGen
functions next :: g > (Int, g)
which generates a random
Int
and a new generator. Sequential and parallel generations can be distinguished by
different colors:
You might notice that mersennerandompure64
, xorshift
, ACRandom
and
Xorshift128Plus
do not have parallel benchmarks. That is because, as discussed earlier,
they do not provide any facilities for splitting the generator, which disqualifies them
from concurrent deterministic random number generators.
Also, note that pcgrandom
has two different pure implementations one regular and
another one experimental called "fast", both of which are written in pure Haskell. It also
has stateful implementation of regular and "fast", which we are gonna skip for now and
look at in the next section.
Now, I'd like to step back and try to understand what was exactly that we just
benchmarked. In particular, what was the possible value of Int
we were getting for each
of the generators. To answer that question, we'll have to look at another RandomGen
's
function: genRange :: g > (Int, Int)
. I made a little helper function that goes through
all generators and prints the ranges for possible values those generators could produce:
λ> printRanges
[ 1, 2147483562]  random:System.Random.StdGen
[9223372036854775808, 9223372036854775807]  mersennerandompure64:System.Random.Mersenne.Pure64.MTGen
[ 2147483648, 2147483647]  xorshift:Random.Xorshift.Int32.Xorshift32
[9223372036854775808, 9223372036854775807]  xorshift:Random.Xorshift.Int64.Xorshift64
[ 0, 4294967295]  ACRandom:Random.MWC.Primitive.Seed
[ 0, 2147483562]  tfrandom:System.Random.TF.TFGen
[9223372036854775808, 9223372036854775807]  pcgrandom:System.Random.PCG.Pure.SetSeq
[9223372036854775808, 9223372036854775807]  pcgrandom:System.Random.PCG.Fast.Pure.FrozenGen
[9223372036854775808, 9223372036854775807]  Xorshift128Plus:System.Random.Xorshift128Plus.Gen
[ 2147483648, 2147483647]  pcgen:Data.PCGen.PCGen
[9223372036854775808, 9223372036854775807]  splitmix:Data.SplitMix32.SMGen
[9223372036854775808, 9223372036854775807]  splitmix:Data.SplitMix.SMGen
When we look at the ranges, we might suspect that the earlier benchmark did not treat
these generators equally fair, and that suspicion would be totally justified. On the other
hand though, those ranges give us a better idea about the performance of a particular
generator. For example ACrandom
or pcgen
can generate in one go a 32bit random
number, which means that in order to generate a uniformly distributed Int64
, all we'd
have to do is call the generator twice. So, let's actually be fair and benchmark
generation of the full 64 bits.
Good thing for us is that random
package is versatile enough and provides us with a
Random
type class, which is designed to use any of the above generators in order to produce a
uniformly distributed value in the full range of a type that has such instance
implemented. This means that all we need to do is call random :: RandomGen g => g > (a,
g)
function and we will get a uniformly distributed value in a predefined range, which for
bounded types like Word64
it will be the full range.
The only change I made to the above benchmarks was I swapped next :: RandomGen g => g >
(Int, g)
function for a type restricted random :: RandomGen g => g > (Word64, g)
:
Wait! What? What happened there? All of a sudden every single one of our generators
started to perform x10  x50 times slower. As I mentioned just a couple of paragraphs ago,
it would have been expected for some generators to get a factor of x2 slow down, since
they can't generate 64bits in one go. Therefore the results that we got can't possibly be
explained by this, so what is really causing such a huge regression? As it turns out,
randomR
and consequently
random
are oblivious to the concrete type that they are generating and all of the integral
numbers go through the arbitrary precision Integer
in order to produce the value in a
desired range. Simply put, whatever number of bits you want to generate, be it 8 or 64 it
will be just as slow.
I personally wouldn't care much if it was some sporadically used function, but I can't
emphasize enough the impact of this. I see both of these functions random
and randomR
used extensively in the Haskell community. I will even go on a limb and say that we all
use those two indirectly on a daily basis. Just to give you an idea, both QuickCheck
and
Hedgehog
uses those functions, which means everyone uses them, since we all writes
generative test, right? ;)
The worst implication of the ridiculously slow implementation of randomR
is, that
regardless of how fast your generator really is, you will always get terrible
performance! This fact is clearly depicted in the benchmarks above.
For the sake of argument, let's take a look at what performance of producing a Word64
can be for each of the generators, if we bypass the Random
class. Some packages provide
builtin functions for generating Word64
, which I happily used, but for the ones that do
not have such functionality out of the box, I've written the two functions below, choice
of which one depended on the generator's range:
random32to64 :: RandomGen g => g > (Word64, g)
random32to64 g0 = ((fromIntegral hi `unsafeShiftL` 32) .. fromIntegral lo, g2)
where
(hi, g1) = next g0
(lo, g2) = next g1
random64to64 :: RandomGen g => g > (Word64, g)
random64to64 g0 = (fromIntegral i64, g1)
where
(i64, g1) = next g0
Both tfrandom
and random
generate slightly less than 31 bits of entropy in one
iteration, instead of 32 or 64 as it is with others, so the simple approach from above
will not work, therefore I will just exclude tfrandom
and will keep random
's slow
implementation as the baseline for comparison:
It is clear now as night and day, that random
function is the culprit and should be
either fixed or avoided when possible.
In order to get a better picture of the libraries that do perform well I need to get rid
of the slow ones, therefore I am now forced to drop random
, ACRandom
, pcgen
,
splitmix
(only the 32bit version). Note, that there is a disclaimer in the splitmix
's
haddock not to use the SplitMix32
, which I included thus far for completeness only.
From a quick look at all the benchmarks above, the clear winner is
splitmix
. Now let's move towards stateful
RNGs and see how they compare with pure generators.
Libraries with stateful RNGs
Below is the list of packages that I was able to find, which provide RNGs that depend on a mutable state. Same as before, packages are in the order of appearance and the versions benchmarked are included as well:
Package  First appeared on  Last released on  Latest release 
mwcrandom 
20091226  20180711  mwsrandom == 0.14.0.0 
sfmt 
20141209  20150415  sfmt == 0.1.1 
pcgrandom 
20141215  20190518  pcgrandom == 0.1.3.6 
mersennerandom 
20080122  20110618  mersennerandom == 1.0.0.1 
In a very similar fashion as before I ran sequential and parallel generations of random arrays, except now with stateful generators.
The intriguing part of this chart for me was the fact that stateful implementations of
pcgrandom
and mersennerandom
did not parallelize well at all. My gut feeling
suggests that it has something to do with FFI
bindings and some peculiarities of
underlying C
implementations. Although sfmt
performed pretty well both sequentially
and in parallel, despite being pretty much all written in C
.
Finalists
In the end I would like to combine the finalists of this performance race from all of the three categories: pure, pure splittable and stateful. Packages are now sorted by the median of duration of their runtimes:
An extra bonus benchmarks for the libraries that could efficiently generate Double
.
I’ve tried
System.Random.random
function first and as I suspected it performs just as awfully for Double
s as for
integral types, therefore amongst pure generators I kept only the ones that provided
functions for Double
out of the box.
Conclusion
I started with a goal of finding a goto library for myself to use, whenever I would need random numbers. I certainly found what I was looking for and since all of us need a bit of randomness once in a while, I thought I'd share these findings with all of other Haskellers out there. Hope you find them useful.
In the process of researching and experimenting I made a few realizations for myself, thus would like to share those with you as well, despite that they might be a bit subjective:

splitmix
pleasantly surprised me with its performance. Great job, Oleg! Parallelization of the algorithm isn't that great, only a factor of x2 on a quad core CPU, but I suspect it might be because it is getting bounded by the speed of my memory, rather than the algorithm or the implementation. This probably means that it is already pretty close to the optimal performance, although, as noted in the haddock, a possibility of future addition of SIMD to GHC (andsplitmix
) could give it a slight boost. 
sfmt
turned out to be a gem I never knew about. Underneath it relies on implementation inC
and SIMD instructions, so it might not be extremely portable, but that doesn't mean it can't be useful for a whole variety of projects, especially since it turned out to be the fastest stateful generator. 
mwcrandom
has been around the block and is quite a popular library. It didn't turn out to be the fastest, but it still performs quite well and running in parallel improves the runtime significantly. Moreover, the interface is solid and provides ability to generate other distributions out of the box. 
random
is the package I was quite disappointed in. I was more than sure thatStdGen
will have poor performance, but I did not expectRandom
class instances to be so slow. I know that there was some effort in improving it, but as it turns out that effort was not only unfruitful, but it was also directed at the wrong problem. From the looks of it the true problem is not in the generator that is being used in the library, but in the way that all generators are being used. Of course, it doesn't mean theStdGen
is not a problem, it's just that problem today has an easy fix.
If it were up to me, here are the improvements I would make:
 Switch
random
package to depend onsplitmix
, instead of the other way round. Get rid of the internalStdGen
implementation in favor ofSMGen
(i.e.type StdGen = SMGen
).  Improve the implementation of each individual instance of
Random
 Make it possible for implementers of custom generators to override default
implementations of individual primitive types (i.e. customizable
random
function).  Add an interface to
random
package for stateful generators, which could be used bymwcrandom
and others alike.
Thank you. If you'd like to inspect the code and convince yourself of legitimacy of the discussed benchmarks by running them yourself or scrutinizing them some other way, they can be found in my haskellbenchmarks repo.