Numpy: random seed and multithreading causes differing results

Question

Tested on Python 3.7, NumPy 1.17.3:

It seems that random number generation in NumPy does not give consistent results when a fixed seed is combined with multithreading. The issue does not come up with SciPy. The following snippet shows the problem:

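import numpy as np
from scipy.stats import nbinom

from concurrent.futures import ThreadPoolExecutor, as_completed
from os import cpu_count  # used for max_workers in the SciPy run below


def load_data_np():
    # Seeds NumPy's global generator, which all threads share
    np.random.seed(0)
    return np.random.negative_binomial(5, 0.3, size=2)


def load_data_scipy():
    # random_state=0 seeds this call only
    return nbinom.rvs(5, 0.3, size=2, random_state=0)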

Both functions should therefore always return the same numbers. But when the data is produced in a threaded loop...

with ThreadPoolExecutor() as executor:
   futures = list(
       (executor.submit(load_data_np)
        for i in range(1000))
   )
   print(np.diff([future.result() for future in as_completed(futures)]))

one can find values like these in the NumPy output:

...
 [  4]
 [ -3]
 [-15]
 [ -3]
 [  5]
 [ -6]
 [  0]
 [  6]
 [  1]
 [-13]
 [ -7]
 [  3]
 [  6]
 [ -2]
 [ -1]
 [-11]
 [  3]
...

This must mean that, in between the computations of the 2 samples (size=2), another thread has reset the random seed, which throws the other threads off in their position in the random stream. For comparison, the same loop with SciPy:

with ThreadPoolExecutor(max_workers=cpu_count()) as executor:
    futures = list(
        (executor.submit(load_data_scipy)
         for i in range(1000))
    )
    print(np.diff([future.result() for future in as_completed(futures)]))

yields the same values in every iteration:

...
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
...

So what is the proper way to do thread-safe random number generation with a fixed seed in NumPy? Googling the issue has led me back to np.random.seed.

Cheers, Michael

fzn fzn · Accepted Answer · 2019-11-15T11:59:07

I modified your load_data_np function so that it no longer uses np.random.seed.

As I found in another SO thread, np.random.seed is known not to be thread-safe, and it's recommended to use your own RandomState instances instead.

def load_data_np():
    rs = np.random.RandomState(0)
    return rs.negative_binomial(5, 0.3, size=2)

The output now looks as expected:

...
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
...
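
To see why the shared seed misbehaves while a private instance does not, the interleaving can be imitated in a single thread. This is only a sketch under assumptions: the helper draw and all variable names are made up for illustration, and the second np.random.seed(0) call merely stands in for what a concurrent load_data_np call would do. It is not a real race.

```python
import numpy as np


def draw(rng):
    # Same distribution as in the question; rng is the np.random module
    # (global state) or a RandomState instance (private state).
    return rng.negative_binomial(5, 0.3, size=2)


# Shared global state: a re-seed by "another thread" rewinds the stream
# for everyone who uses np.random.
np.random.seed(0)
global_first = draw(np.random)
np.random.seed(0)  # stands in for a concurrent load_data_np call
global_second = draw(np.random)
print(np.array_equal(global_first, global_second))  # the stream started over

# Private state: reference sequence from one undisturbed instance.
reference = np.random.RandomState(0)
ref_first, ref_second = draw(reference), draw(reference)

# Same sequence again, with an unrelated instance seeded and used in between.
mine = np.random.RandomState(0)
mine_first = draw(mine)
draw(np.random.RandomState(0))  # does not touch the state held by `mine`
mine_second = draw(mine)
print(np.array_equal(mine_first, ref_first),
      np.array_equal(mine_second, ref_second))
```

With the global generator, whoever seeds last decides where every other caller continues. With a RandomState of its own, each caller's sequence depends only on its own seed and its own draws.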

This should help.
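
As a possible alternative, NumPy 1.17 also ships the newer Generator API. The sketch below assumes np.random.default_rng is available (NumPy 1.17 or later); the function name load_data_generator is hypothetical. Note that a Generator seeded with 0 uses a different bit generator than RandomState(0), so the actual numbers will differ from the output above, but they should be identical across tasks.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def load_data_generator(seed=0):
    # A fresh Generator per call: no state is shared between threads.
    rng = np.random.default_rng(seed)
    return rng.negative_binomial(5, 0.3, size=2)


with ThreadPoolExecutor() as executor:
    futures = [executor.submit(load_data_generator) for _ in range(1000)]
    results = np.array([future.result() for future in futures])

# Count the distinct rows; identical results in every task give exactly one.
print(len(np.unique(results, axis=0)))
```

Creating one generator per task keeps the code free of locks. If tasks should instead produce different but reproducible data, passing a different seed to each task is one way to do that.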
