40
votes

I am a bit confused about the c++11 random library.

What I understand: we need two separate concepts:

  • random engine (which can be pseudo (need seed) or real)
  • distribution: it maps the numbers obtained from the engine to a specific interval, using a specific distribution.

What I don't understand is why not just use this:

std::random_device rd;
std::uniform_int_distribution<int> dist(1, 5);

// get random numbers with:
dist(rd);

As far as I can tell this works well.

Instead, this is what I found on most examples/sites/articles:

std::random_device rd;
std::mt19937 e{rd()}; // or std::default_random_engine e{rd()};
std::uniform_int_distribution<int> dist{1, 5};

// get random numbers with:
dist(e);

I am not talking about special use, e.g. cryptography, just your basic getting started articles.

My suspicion is because std::mt19937 (or std::default_random_engine) accepts a seed, it can be easier to debug by providing the same seed during a debug session.

Also, why not just:

std::mt19937 e{std::random_device{}()};
3
"the performance of many implementations of random_device degrades sharply once the entropy pool is exhausted. For practical use random_device is generally only used to seed a PRNG such as mt19937" sourceMichael
@Michael That should be an answerFabio says Reinstate Monica
Well, an answer might want to go into more detail (e.g. why the performance degrades). Hence why I posted it as a comment.Michael
Regarding your last question; that is exactly how I frequently spin up a prng for many non-cryptographic-mandated purposes.WhozCraig
@Michael it might, but surely doesn't have to.rubenvb

3 Answers

41
votes

Also, why not just:

std::mt19937 e{std::random_device{}()};

It might be fine if you only will do this once, but if you will do it many times, it's better to keep track of your std::random_device and not create / destroy it unnecessarily.

It may be helpful to look at the libc++ source code for implementation of std::random_device, which is quite simple. It's just a thin wrapper over std::fopen("/dev/urandom"). So each time you create a std::random_device you are getting another filesystem handle, and pay all associated costs.

On windows, as I understand, std::random_device represents some call to a microsoft crypto API, so you are going to be initializing and destroying some crypto library interface everytime you do this.

It depends on your application, but for general purposes I wouldn't think of this overhead as always negligible. Sometimes it is, and then this is great.

I guess this ties back into your first question:

Instead, this is what I found on most examples/sites/articles:

 std::random_device rd;
 std::mt19937 e{rd()}; // or std::default_random_engine e{rd()};
 std::uniform_int_distribution<int> dist{1, 5};

At least the way I think about it is:

  • std::mt19937 is a very simple and reliable random generator. It is self-contained and will live entirely in your process, not calling out to the OS or anything else. The implementation is mandated by the standard, and at least in boost, it used the same code everywhere, derived from the original mt19937 paper. This code is very stable and it's cross-platform. You can be pretty confident that initializing it, querying from it, etc. is going to compile to similar code on any platform that you compile it on, and that you will get similar performance.

  • std::random_device by contrast is pretty opaque. You don't really know exactly what it is, what it's going to do, or how efficient it will be. You don't even know if it can actually be acquired -- it might throw an exception when you attempt to create it. You know that it doesn't require a seed. You're not usually supposed to pull tons and tons of data from it, just use it to generate seeds. Sometimes, it acts as a nice interface to cryptographic APIs, but it's not actually required to do that and sadly sometimes it doesn't. It might correspond to /dev/random on unix, it might correspond to /dev/urandom/. It might correspond to some MSVC crypto API (visual studio), or it might just be a fixed constant (mingw). If you cross-compile for some phone, who knows what it will do. (And even when you do get /dev/random, you still have the problem that performance may not be consistent -- it may appear to work great, until the entropy pool runs out, and then it runs slow as a dog.)

The way I think about it is, std::random_device is supposed to be like an improved version of seeding with time(NULL) -- that's a low bar, because time(NULL) is a pretty crappy seed all things considered. I usually use it where I would have used time(NULL) to generate a seed, back in the day. I don't really consider it all that useful outside of that.

11
votes

This article is a good point to start.

I'm going to synthesize just few points:

  • It Has Unknown Cost.

    How costly it is to read a number from this “device”? That is unspecified. It could, for example, be reading from /dev/random on a Linux system, which can block for an extended period waiting for entropy (which is itself problematic for a variety of reasons).

For my personal experience I've notified that std::random_device is usually slower than a simple Pseudo-randomic algorithm. That could be no true in general, but usually it does. That because it may involve physical devices, or other hardware than the simple CPU.

  • It Might Actually Be Deterministic .

    C++11's std::random_device is not required to be nondeterministic! Implementations can and do implement it as a simple RNG with a fixed seed, so it produces the same output for every run of the program.

3
votes

Why not just use random_device?

This question is actually a very good one.

The answer is - of course you can just use std::random_device exactly like you written in your example. It is totally legitimate and correct use of std::random_device - and any distribution may be used on top of it just like with any other random engine. If you don't need Pseudo-Random Number Generator (PRNG) like std::mt19937 or any other - just don't use it. That's it.

Mantra repeated by many people a la - "std::random_device is just for seeding blah-blah-blah" is a random BS (pun intended) that have nothing to do with the meaning and purpose of std::random_device. Sure though std::random_device can be used as a PRNG seed - just like any other source of random information.

Having said that - whether you actually should just use std::random_device instead of a good PRNG depends entirely on your application needs - some details written below.

You should view any PRNG as a mathematical function that takes limited size input sequence of bits and produces a very long output sequence of numbers with some (typically uniform) distribution. If you pass the same input bits to the same PRNG twice - you will get the same output sequence. Just like if you use the same value of x to compute std::sin(x) twice - you will get exactly the same sine value returned. That's why if you need to avoid repeating the same PRNG output number sequence each time - its input bits (seed) must be different each time. Obviously since PRNG operation requires nothing more than some computations - it is usually local and fast - no system calls, no external devices involved, no blocking while waiting for something, no exceptions thrown - instant result and high rate of numbers generated that scales easily when CPU performance increases.

std::random_device on the other hand is the first attempt to introduce actual random number generator in C++ standard library.

Quote from C++ standard (ISO/IEC 14882-2017):

29.6.6 Class random_device

  1. A random_device uniform random bit generator produces nondeterministic random numbers.
  2. If implementation limitations prevent generating nondeterministic random numbers, the implementation may employ a random number engine.

^^^ This quote is quite funny because (1) and (2) above entirely contradict each other. std::random_device either produces nondeterministic random numbers OR its implementation prevents it - both cannot possibly be true at the same time. But word "if" and "may" are present only in (2) - so the only possible non-contradictory understanding of the quote above is that "if" from (2) is never realized and every implementation just produces nondeterministic random numbers - i.e. complies with (1).

Let's just assume that standard compliant std::random_device simply produces uniformly distributed sequence of random and independent bits. If we are super optimistic we can even hope to get cryptographically secure random numbers - even though C++ standard doesn't guarantee or even promise anything like that. Good news is that modern implementations actually provide it - both typical /dev/urandom UNIX implementation and Win32 Crypto API implementation should be secure enough. Without being cryptographically secure std::random_device is not very useful tool anyway.

Especially since according to the C++ standard:

result_type operator()();

6 Returns: A nondeterministic random value, uniformly distributed between min() and max(), inclusive. It is implementation-defined how these values are generated.

^^^ thus if we really need to - we may somewhat limit the portability of the application to only those implementations that produce cryptographically secure output of std::random_device::operator()() - as this is well defined and documented individually for each particular implementation (that's what implementation-defined means BTW). Of course if we don't need some strict requirements like secure random numbers we shouldn't limit the portability.

Obviously nondeterministic sequence of uniformly distributed and independent random bits (AKA true random numbers) cannot be produced without external source of information (external source of randomness) - like some sensor signal noise or precisely measured times of some external events - anything external and inherently irregular. (By external I mean that information comes from outside media - but the sensor itself may be integrated to CPU/SoC). Any such source of external randomness then typically filtered to remove detectable regularities to assure compliance with a requirement of uniformly distributed and independent bit sequence output. All this greatly limits data rate produced and creates a possibility of failure and/or blocking while waiting for new external data.

So let's now weigh pros and cons of PRNG sequence VS true random numbers for different kinds of applications.

  1. If application requires random number generation for information security purposes - passwords, encryption keys, salt values, security tokens, etc. then there is no question - of C++ standard features only std::random_device and not any compliant one (let alone non-compliant ones) but only those that provide cryptographically secure implementations. Some PRNGs may be used for information security purposes too but only special class of secure PRNG and only if they are carefully seeded with secure random seed of enough size (enough entropy) - so you need true random numbers anyway to produce a seed. As of now - no PRNG engine from C++ standard library is cryptographically secure. If you don't trust std::random_device to be secure (for example - you don't want to limit portability to suitable implementations only or you want to avoid the effort of checking implementation suitability for each supported platform after each update) then only non-standard trusted third party solution can be used safely - e.g. Win32 crypto API directly or UNIX /dev/random or /dev/urandom directly or some other non-standard solution - whatever is trustworthy enough for you.

  2. Other sensitive applications where non-predictability of random numbers is very important - like online casinos, online betting, stock market trading, etc. - also probably require cryptographically secure random numbers - so all the considerations from (1) apply here too.

  3. For most other applications - like usual games or scientific simulations or anything where money or security not involved and thus potential predictability of random number sequence cannot harm - typical good quality PRNG is sufficient. While it may be fine to still use std::random_device for many of these applications but only if performance (rate and delay of generation) is not important. In many cases performance is actually of major importance - e.g. for scientific simulations or real-time noise simulation (for computer graphics or sound effects, etc.) - so sometimes true random numbers are unsuitable for performance reasons.

  4. Also there are applications where PRNG is fundamentally necessary - e.g. some games may generate maps/worlds/levels on the fly using PRNG with a fixed seed value to avoid storing them to save disk space (this was a popular trick on early computers with very little RAM and storage space but still used in some modern games too). Another example is noise substitution phase of audio/video compression algorithms - where actual background noise is replaced with PRNG-generated pseudo-noise of the same magnitude and spectral characteristics as the original noise to store just a seed to a compressed bitstream instead of lots of actual uncompressible random information.

One last note:

If you don't need secure random numbers and don't want to rely on quality or even standard compliance of std::random_device implementation then using it alone to generate PRNG seed is also a bad idea. You should put more randomness into the mix - e.g. combine std::random_device output with current time of maximum available precision (microsecond/nanosecond/whatever is available) and if available add some other external sensors' readings too (e.g. raw gyro sensor readings or audio mic readings or raw image sensor readings - anything external and noisy).

For example. Instead of using this:

std::mt19937::result_type seed = std::random_device()();

std::mt19937 gen(seed);

Better use something like this:

std::mt19937::result_type seed = std::random_device()()
        ^ std::chrono::duration_cast<std::chrono::seconds>(
            std::chrono::system_clock::now().time_since_epoch()
            ).count()
        ^ std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::high_resolution_clock::now().time_since_epoch()
            ).count()
        /* ^ more_external_random_stuff */ ;

std::mt19937 gen(seed);

You may also initialize full state seed sequence of std::mt19937::state_size (=624) 32-bit numbers:

std::random_device rd;
std::array< std::uint32_t, std::mt19937::state_size >  seed_array;

for( auto it = seed_array.begin(); it != seed_array.end(); ++it )
{
    // read from std::random_device
    *it = rd();

    // mix with a C++ equivalent of time(NULL) - UNIX time in seconds
    *it ^= std::chrono::duration_cast<std::chrono::seconds>(
                    std::chrono::system_clock::now().time_since_epoch()
                    ).count();

    // mix with a high precision time in microseconds
    *it ^= std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::high_resolution_clock::now().time_since_epoch()
            ).count();

    //*it ^= more_external_random_stuff;
}

std::seed_seq sseq( seed_array.cbegin(), seed_array.cend() );
std::mt19937 gen(sseq);