Difference between C++ random number generation and Python

Question

I am trying to translate some python code to C++. What the code does is to run a monte carlo simulation. I thought the results from Python and C++ could be very close, but seems something funny happened.

Here is what I do in Python:

self.__length = 100
self.__monte_carlo_array=np.random.uniform(0.0, 1.0, self.__length)

Here is what I do in C++:

int length = 100;
std::random_device rd;
std::mt19937_64 mt(rd());
std::uniform_real_distribution<double> distribution(0, 1);

for(int i = 0; i < length; i++)
{
    double d = distribution(mt);
    monte_carlo_array[i] = d;
}

I ran above random number generation 100x5 times both in Python and C++, and then do monte carlo simulation with these random numbers.

In monte carlo simulation, I set the threshold as 0.5, thus I can easily verify if the results are uniform distributed.

Here is a conceptual draft what monte carlo simulation does:

for(i = 0; i < length; i++)
{
    if(monte_carlo_array[i] > threshold)    // threshold = 0.5
        monte_carlo_output[i] = 1;
    else
        monte_carlo_output[i] = 0;
}

Since the length of the monte carlo array is 120, I expect to see 60 1s both in Python and C++. I calculate the average number of 1s and found that, although the average number in C++ and Python is around 60, but the trend are highly correlated. Moreover, the average number in Python is always higher than in C++.

distribution chart May I know if this is because I've done something wrong, or it is simply because the difference between random generation mechanisms in C++ and Python?

[edit] Please note that the RNG in Python is also the Mersenne Twister 19937.

Different random number generators give different sets of random numbers. I would expect that if you ran it a few more times (like hundreds of times), you'd get a less obvious difference. — Mats Petersson
Is this really what you see with the code you show? There must be other inputs, otherwise there wouldn't be any correlation between the codes at all! I suspect the bug is elsewhere... — Andrew Jaffe
There is not enough data to conclude anything. You have just 5 runs, all results lie well within the one standard deviation interval ([52,68] assuming Poissonian uncertainties). Yes, there are obviously differences between the numbers. But you initialize both your RNGs with a random seed, what did you expect? Also, what exactly does the plot show? If you just count the ones, how do you end up withnumbers like 59.75 (Python, run 4)? — Carsten
@Carsten: He's actually generating 120 Bernoulli's with p=0.5, so the sum has a binomial(120,.5) distribution rather than Poisson. That makes the standard deviation sqrt(30) or about 5.48. Your larger points are certainly correct - all of the results fall well within one standard deviation of the mean, and 5 is nowhere near a large enough number of replications to say anything meaningful. — pjs

Mats Petersson Mats Petersson · Accepted Answer · 2013-08-12T11:15:51

I wrote this based on the code posted:

import numpy as np

length = 1000
monte_carlo_array=np.random.uniform(0.0, 1.0, length)
# print monte_carlo_array
threshold = 0.5
above = 0

for i in range (0,length):
    if monte_carlo_array[i] > threshold:
        above+=1
print above

and this in C++:

#include <random> 
#include <iostream>

int main()
{
    const int length = 1000;
    std::random_device rd;
    std::mt19937_64 mt(rd());
    std::uniform_real_distribution<double> distribution(0, 1);
    double threshold = 0.5;
    double monte_carlo_array[length];

    for(int i = 0; i < length; i++)
    {
        double d = distribution(mt);
        monte_carlo_array[i] = d;
    }
    int above = 0;

    for(int i = 0; i < length; i++)
    {
        if (monte_carlo_array[i] > threshold)
        {
            above++;
        }
    }
    std::cout << above << std::endl;
}

Five runs each gives:

Python:
480
507
485
515
506
average:
498.6

C++:
499
484
531
509
509
average
506.4

So if anything I'm finding that C++ is higher than python. But I think it's more a case of "random numbers are not uniformly distributed with a small number of samples."

I changed length to 100000 instead, and still the results vary up and down around 50k:

Python:

50235
49752
50215
49717
49974

Average: 
49978.6

C++:

50085
50018
49993
49779
49966

Average:
49968.2

In summary, I don't think it's any huge difference between the random number implementations in C++ and Python when it comes to "how uniform it is around 0.5". But I've not studied statistics very much (and it was many years ago).

Difference between C++ random number generation and Python

2 Answers