9
votes

I am trying to translate some python code to C++. What the code does is to run a monte carlo simulation. I thought the results from Python and C++ could be very close, but seems something funny happened.

Here is what I do in Python:

self.__length = 100
self.__monte_carlo_array=np.random.uniform(0.0, 1.0, self.__length)

Here is what I do in C++:

int length = 100;
std::random_device rd;
std::mt19937_64 mt(rd());
std::uniform_real_distribution<double> distribution(0, 1);

for(int i = 0; i < length; i++)
{
    double d = distribution(mt);
    monte_carlo_array[i] = d;
}

I ran above random number generation 100x5 times both in Python and C++, and then do monte carlo simulation with these random numbers.

In monte carlo simulation, I set the threshold as 0.5, thus I can easily verify if the results are uniform distributed.

Here is a conceptual draft what monte carlo simulation does:

for(i = 0; i < length; i++)
{
    if(monte_carlo_array[i] > threshold)    // threshold = 0.5
        monte_carlo_output[i] = 1;
    else
        monte_carlo_output[i] = 0;
}

Since the length of the monte carlo array is 120, I expect to see 60 1s both in Python and C++. I calculate the average number of 1s and found that, although the average number in C++ and Python is around 60, but the trend are highly correlated. Moreover, the average number in Python is always higher than in C++.

distribution chart May I know if this is because I've done something wrong, or it is simply because the difference between random generation mechanisms in C++ and Python?

[edit] Please note that the RNG in Python is also the Mersenne Twister 19937.

2
Different random number generators give different sets of random numbers. I would expect that if you ran it a few more times (like hundreds of times), you'd get a less obvious difference.Mats Petersson
Is this really what you see with the code you show? There must be other inputs, otherwise there wouldn't be any correlation between the codes at all! I suspect the bug is elsewhere...Andrew Jaffe
Those results seams manipulated...LS_ᴅᴇᴠ
There is not enough data to conclude anything. You have just 5 runs, all results lie well within the one standard deviation interval ([52,68] assuming Poissonian uncertainties). Yes, there are obviously differences between the numbers. But you initialize both your RNGs with a random seed, what did you expect? Also, what exactly does the plot show? If you just count the ones, how do you end up withnumbers like 59.75 (Python, run 4)?Carsten
@Carsten: He's actually generating 120 Bernoulli's with p=0.5, so the sum has a binomial(120,.5) distribution rather than Poisson. That makes the standard deviation sqrt(30) or about 5.48. Your larger points are certainly correct - all of the results fall well within one standard deviation of the mean, and 5 is nowhere near a large enough number of replications to say anything meaningful.pjs

2 Answers

5
votes

I wrote this based on the code posted:

import numpy as np

length = 1000
monte_carlo_array=np.random.uniform(0.0, 1.0, length)
# print monte_carlo_array
threshold = 0.5
above = 0

for i in range (0,length):
    if monte_carlo_array[i] > threshold:
        above+=1
print above

and this in C++:

#include <random> 
#include <iostream>

int main()
{
    const int length = 1000;
    std::random_device rd;
    std::mt19937_64 mt(rd());
    std::uniform_real_distribution<double> distribution(0, 1);
    double threshold = 0.5;
    double monte_carlo_array[length];

    for(int i = 0; i < length; i++)
    {
        double d = distribution(mt);
        monte_carlo_array[i] = d;
    }
    int above = 0;

    for(int i = 0; i < length; i++)
    {
        if (monte_carlo_array[i] > threshold)
        {
            above++;
        }
    }
    std::cout << above << std::endl;
}

Five runs each gives:

Python:
480
507
485
515
506
average:
498.6

C++:
499
484
531
509
509
average
506.4

So if anything I'm finding that C++ is higher than python. But I think it's more a case of "random numbers are not uniformly distributed with a small number of samples."

I changed length to 100000 instead, and still the results vary up and down around 50k:

Python:

50235
49752
50215
49717
49974

Average: 
49978.6

C++:

50085
50018
49993
49779
49966

Average:
49968.2

In summary, I don't think it's any huge difference between the random number implementations in C++ and Python when it comes to "how uniform it is around 0.5". But I've not studied statistics very much (and it was many years ago).

0
votes

When you are not sure in random numbers just generate huge amount of random numbers by using service like Random ORG. After that supply this numbers as array in both implementations (C++ and Python). By this way you will be sure that both programs are using the same set of random numbers and you will be able to confirm that the rest of the code is OK.