What is the most secure seed for random number generation?

Question

What are the most secure sources of entropy to seed a random number generator? This question is language and platform independent and applies to any machine on a network. Ideally I'm looking for sources available to a machine in a cloud environment or server provided by a hosting company.

There are two important weaknesses to keep in mind. The use of time for seeding a random number generator is a violation of CWE-337. The use of a small seed space would be a violation of CWE-339.

If you attach a geiger counter to your computer and pull the information from there, that would probably be the most random that is "commonly" available. — James Black
@Neil Butterworth Security in the sense of random number generation is always going to be a value that is difficult for the attacker to guess. — rook
@The Rock - So the server isn't generating the seed? Without more details this is a hard question to answer, as any passing of a seed will involve encrypting it with more than https, and then you have problems with key management. This is especially true for cloud computing, as you have to assume the sysadmins are not to be trusted. You may want to read "Programming Satan's Computer" to understand more about this: lambda-the-ultimate.org/node/3482 — James Black

Thomas Pornin · Accepted Answer · 2010-08-20T15:04:02

Here are a few thoughts. If you are impatient, skip to the conclusion, at the end.

1. What is a secure seed?

Security is defined only relative to an attack model. Here we want a sequence of n bits that has n bits of entropy with regard to the attacker: in plain words, each of the 2ⁿ possible values for that sequence is equally probable from the attacker's point of view.

This is a model which relates to the information available to the attacker. The application which generates and uses the seed (normally in a PRNG) knows the exact seed; whether the seed is "secure" is not an absolute property of the seed or even of the seed generation process. What matters is the amount of information that the attacker has about the generation process. This level of information varies widely depending on the situation; e.g. on a multi-user system (say Unix-like, with hardware-enforced separation of applications), precise timing of memory accesses can reveal information on how a nominally protected process reads memory. Even a remote attacker can obtain such information; this has been demonstrated (in lab conditions) on AES encryption (typical AES implementations use internal tables, with access patterns which depend on the key; the attacker forces cache misses and detects them through precise timing of responses of the server).

The seed lifetime must be taken into account. The seed is secure as long as it remains unknown to the attacker; this property must hold true afterwards. In particular, it shall not be possible to recover the seed from excerpts of the subsequent PRNG output. Ideally, even obtaining the complete PRNG state at some point should offer no clue as to whatever bits the PRNG produced beforehand.
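The last property (a captured state reveals nothing about earlier output) can be illustrated with a toy hash ratchet. This is only a sketch under stated assumptions: the class name `RatchetGenerator` and the domain-separation bytes are made up for the illustration, and it is not a vetted PRNG design.

```python
import hashlib

class RatchetGenerator:
    """Toy generator (hypothetical, for illustration only)."""

    def __init__(self, seed: bytes):
        # Derive the initial state from the seed; the seed itself is not kept.
        self._state = hashlib.sha256(b"init" + seed).digest()

    def next_block(self) -> bytes:
        # The output and the next state come from the same state through
        # differently prefixed hashes, so an output block is not the state.
        out = hashlib.sha256(b"\x00" + self._state).digest()
        # The old state is overwritten: going back would require inverting
        # SHA-256 (a preimage attack).
        self._state = hashlib.sha256(b"\x01" + self._state).digest()
        return out

gen = RatchetGenerator(b"example seed, not secret")
first = gen.next_block()
second = gen.next_block()
```

An attacker who captures the state after `second` was produced would have to invert the hash to learn the states that produced `first` and `second`. This only helps if the old state is really gone from memory, which is exactly what plain software has trouble guaranteeing.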

The point I want to make here is that a seed is "secure" only if it is used in a context where it can remain secure, which more or less implies a cryptographically secure PRNG and some tamper-resistant storage. If such storage is available, then the most secure seed is the one that was generated once, a long time ago, and used in a secure PRNG hosted by tamper-resistant hardware.

Unfortunately, such hardware is expensive (it is called a HSM and costs a few hundred or a few thousand dollars), and that cost usually proves difficult to justify (a bad seed will not prevent a system from operating; this is the usual problem of untestability of security). Hence it is customary to go for "mostly software" solutions. Since software is not good at providing long-term confidential storage, the seed lifetime is arbitrarily shortened: a new seed is periodically obtained. In Fortuna, such reseeding is supposed to happen at least once every megabyte of generated pseudo-random data.

To sum up, in a setup without a HSM, a secure seed is one that can be obtained relatively readily (since we will do it quite often) using data that cannot be gathered by the attacker.

2. Mixing

Random data sources do not produce nice uniform bits (each bit having value 1 with probability exactly 0.5, independently of the other bits). Instead, random sources produce values in source-specific sets. These values can be encoded as sequences of bits, but you do not get your money's worth: to obtain n bits of entropy you need values which, once encoded, use many more than n bits.
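To put a number on this, here is a small sketch that estimates how many raw samples a biased source must deliver to reach a target entropy. It assumes independent samples and uses min-entropy (the conservative measure based on the most likely outcome); the 90/10 bias and the function names are hypothetical, chosen only for the illustration.

```python
import math

def min_entropy_bits(probabilities):
    """Min-entropy of one sample: -log2 of the most likely outcome."""
    return -math.log2(max(probabilities))

def samples_needed(target_bits, probabilities):
    """Independent samples required to accumulate target_bits of min-entropy."""
    return math.ceil(target_bits / min_entropy_bits(probabilities))

fair_bit = [0.5, 0.5]      # ideal source: 1 bit of entropy per raw bit
biased_bit = [0.9, 0.1]    # hypothetical source that yields 0 nine times out of ten

per_sample = min_entropy_bits(biased_bit)       # roughly 0.15 bit per raw bit
needed_fair = samples_needed(128, fair_bit)     # 128 raw bits
needed_biased = samples_needed(128, biased_bit) # several hundred raw bits
```

With such a source, a 128-bit seed requires several hundred raw bits, which is why the raw values must be compressed by a mixing function rather than used directly.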

The cryptographic tool to use here is a PRF which accepts an input of arbitrary length, and produces an n-bit output. A cryptographically secure PRF of that kind is modeled as a random oracle: in short terms, it is not computationally feasible to predict anything about the oracle output on a given input without trying it.

Right now, we have hash functions. Hash functions must fulfill a few security properties, namely resistance to preimages, second preimages, and collisions. We usually analyze hash functions by trying to see how they depart from the random oracle model. There is an important point here: a PRF which follows the random oracle model will be a good hash function, but there can be good hash functions (in the sense of resistance to preimages and collisions) which nonetheless are easy to distinguish from a random oracle. In particular, the SHA-2 functions (SHA-256, SHA-512...) are considered to be secure, but depart from the random oracle model due to the "length extension attack" (given h(m), it is possible to compute h(m || m') for a partially constrained message m' without knowing m). The length extension attack does not seem to provide any shortcut into the creation of preimages or collisions, but it shows that those hash functions are not random oracles. For the SHA-3 competition, NIST stated that candidates should not allow such "length extension".

Hence, the mixing step is not easy. Your best bet is still, right now, to use SHA-256 or SHA-512, and switch to SHA-3 when it is chosen (this should happen around mid-2012).

3. Sources

A computer is a deterministic machine. To get some randomness, you have to mix in the result of some measures of the physical world.

A philosophical note: at some point you have to trust some smart guys, the kind who may wear lab coats or get paid to do fundamental research. When you use a hash function such as SHA-256, you are actually trusting a bunch of cryptographers when they tell you: we looked for flaws, real hard, and for several years, and found none. When you use a decaying bit of radioactive matter with a Geiger counter, you are trusting some physicists who say: we looked real hard for ways to predict when the next atomic nucleus will decay, but we found none. Note that, in that specific case, the "physicists" include people like Becquerel, Rutherford, Bohr or Einstein, and "real hard" means "more than a century of accumulated research", so you are not exactly in untrodden territory here. Yet there is still a bit of faith in security.

Some computers already include hardware which generates random data (i.e. which uses and measures a physical process which, as far as physicists can tell, is random enough). The VIA C3 (a line of x86-compatible CPUs) has such hardware. Strangely enough, the Commodore 64, a home computer from 30 years ago, also had a hardware RNG (or so says Wikipedia, at least).

Barring special hardware, you have to use whatever physical events you may get. Typically, you would use keystrokes, incoming Ethernet packets, mouse movements, hard disk accesses... Every event comes with some data, and occurs at a measurable instant (modern processors have very accurate clocks, thanks to cycle counters). Those instants, and the event data contents, can be accumulated as entropy sources. This is much easier for the operating system itself (which has direct access to the hardware) than for applications, so the normal way of collecting a seed is to ask the operating system (on Linux, this is called /dev/random or /dev/urandom [both have advantages and problems, choose your poison]; on Windows, call CryptGenRandom()).
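From application code, asking the operating system usually takes one call. A minimal Python sketch, assuming a 256-bit seed is wanted (the sizes are arbitrary choices for the example); `os.urandom` and the `secrets` module both draw from the operating system's generator:

```python
import os
import secrets

# 32 bytes (256 bits) taken from the operating system's random generator.
seed = os.urandom(32)

# The secrets module wraps the same OS source for common token formats.
token = secrets.token_hex(16)   # 16 random bytes, hex-encoded
```

Nothing here needs to be seeded by the application: the operating system has already gathered and mixed the hardware events.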

An extreme case is pre-1.2 Java applets, before the addition of java.security.SecureRandom; since Java is very effective at isolating the application code from the hardware, obtaining a random seed was a tough challenge. The usual solution was to have two or three threads running concurrently and thread-switching madly, so that the number of thread switches per second was somewhat random (in effect, this tries to extract randomness through the timing of the OS scheduler actions, which depend on what else occurs on the machine, including hardware-related events). This was quite unsatisfactory.

A problem with time-related measures is that the attacker also knows what the current time is. If the attacker has application-level access to the machine, then he can read the cycle counter as well.
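To see how little a clock-based seed is worth, here is a sketch of the attack in Python. The victim code (`make_token`) is hypothetical and deliberately naive: it seeds the non-cryptographic `random.Random` generator with the current time in seconds. The attacker is assumed to know the time to within one minute and to see one output.

```python
import random
import time

def make_token(seed: int) -> int:
    """Hypothetical victim: seed a non-cryptographic PRNG, emit a 64-bit token."""
    return random.Random(seed).getrandbits(64)

def recover_seed(token: int, guess: int, window: int):
    """Try every seed within `window` seconds of the attacker's clock guess."""
    for candidate in range(guess - window, guess + window + 1):
        if make_token(candidate) == token:
            return candidate
    return None

victim_seed = int(time.time())       # seed = current time, in seconds
token = make_token(victim_seed)      # value the attacker gets to observe

attacker_guess = victim_seed + 30    # attacker's clock is off by 30 seconds
found = recover_seed(token, attacker_guess, window=60)
```

The search covers 121 candidates and finishes instantly. A finer clock resolution only multiplies the number of candidates; it does not change the nature of the problem.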

Some people have proposed using audio cards as sources of "white noise" by setting the amplifier to its max (even servers have audio nowadays). Others argue for powering up webcams (we know that webcam videos are "noisy" and that's good for randomness, even if the webcam is facing a wall); but servers with webcams are not common. You can also ping an external network server (e.g. www.google.com) and see how much time it takes to come back (but this could be observed by an attacker spying on the network).

The beauty of the mixing step, with a hash function, is that entropy can only accumulate; there is no harm in adding data, even if that data is not that random. Just stuff as much as possible through the hash function. Hash functions are quite fast (a good SHA-512 implementation will process more than 150 MB/s on a typical PC, using a single core) and seeding does not happen that often.
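A minimal sketch of such an accumulator, in Python with SHA-512. The class name `SeedPool`, the length-prefix encoding and the use of `time.perf_counter_ns()` as a stand-in for the cycle counter are assumptions made for the illustration, not a reviewed design.

```python
import hashlib
import os
import time

class SeedPool:
    """Hypothetical accumulator: every event is hashed along with its timestamp."""

    def __init__(self):
        self._hash = hashlib.sha512()

    def add_event(self, data: bytes) -> None:
        # High-resolution timestamp, standing in for the CPU cycle counter.
        stamp = time.perf_counter_ns() & 0xFFFFFFFFFFFFFFFF
        # The length prefix keeps the encoding of successive events unambiguous.
        self._hash.update(len(data).to_bytes(4, "little"))
        self._hash.update(data)
        self._hash.update(stamp.to_bytes(8, "little"))

    def seed(self, nbytes: int = 32) -> bytes:
        # Hash a copy so that the pool can keep accumulating afterwards.
        return self._hash.copy().digest()[:nbytes]

pool = SeedPool()
pool.add_event(b"keystroke: a")   # low-quality data does no harm
pool.add_event(os.urandom(32))    # data from the OS generator
pool.add_event(b"packet from 192.0.2.1, length 60")
seed = pool.seed()
```

The seed is only as unpredictable as the best input it received; adding predictable events costs a little CPU time and nothing else.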

4. Conclusion

Use a HSM. They cost a few hundred or a few thousand dollars, but aren't your secrets worth much more than that? A HSM includes RNG hardware, runs the PRNG algorithm, and stores the seed with tamper resistance. Also, most HSMs are already certified with regard to various national regulations (e.g. FIPS 140 in the US, and the EAL levels in Europe).

If you are so cheap that you will not buy a HSM, or if you want to protect data which is actually not very worthwhile, then build up a cryptographically secure PRNG using a seed obtained by hashing lots of physical measures. Anything which comes from some hardware should be hashed, along with the instant (read "cycle counter") at which that data was obtained. You should hash data by the megabyte here. Or, better yet, do not do it: simply use the facilities offered by your operating system, which already includes such code.


Theory of operation