I'm trying to replicate in R a bit of code someone else wrote in Stata, and have hit a wall trying to predict the behavior of their p-RNG.

Their code has this snippet:

set seed 123456

Unfortunately, it's not clear exactly which algorithm Stata uses. This question suggests it's a KISS algorithm, but the asker didn't manage to replicate it in the end (and some of the links there seem to be dead or outdated). The Stata manual entry for set seed doesn't mention anything about algorithms. This other question doesn't seem to have been resolved either.

Is it a fool's errand to try and replicate Stata's random numbers?

I don't know which version of Stata was used to create this.

If you don't know which version was used, your problem is indeed more difficult: you want to replicate a program, but you can't say precisely which program. blog.stata.com/2016/03/10/… gives an overview and underlines that the default method changed in Stata 14. stata.com/manuals14/fn.pdf says more. - Nick Cox

One question you mention, stackoverflow.com/questions/35139808/…, was not tagged "Stata" and did not include Stata code. It just mentioned Stata in passing, so it's not surprising that it received no Stata-specific response. - Nick Cox

1 Answer

In short: Yes, it is a fool's errand.

Stata, being proprietary software, hasn't released all of the details of its core components, such as its random-number generator. However, documentation is available (link for Stata 14). The most pertinent passages are:

runiform() is the basis for all the other random-number functions because all the other random-number functions transform uniform (0, 1) random numbers to the specified distribution.

runiform() implements the Mersenne Twister 64-bit (MT64) and the “keep it simple stupid” 32-bit (KISS32) algorithms for generating uniform (0, 1) random numbers. runiform() uses the MT64 algorithm by default.

runiform() uses the KISS32 algorithm only when the user version is less than 14 or when the random-number generator has been set to kiss32...

Recall also from ?Random in R that, for the Mersenne Twister:

The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.

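Stata sets up the corresponding internal state from the seed itself, and that state should be nearly impossible to guess. Note also that R's Mersenne Twister is the 32-bit variant, whereas Stata's default is the 64-bit MT64, so the two would not produce the same stream even if you could match the seeding.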
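To get a feel for how much state hides behind a single seed, here is a minimal sketch that inspects R's own Mersenne Twister after seeding. It only shows R's side; it assumes nothing about how Stata turns set seed 123456 into its internal state.

```r
# Seed R's Mersenne Twister explicitly (it is also R's default generator).
set.seed(123456, kind = "Mersenne-Twister")

# .Random.seed holds a generator code, the current position,
# and the 624 32-bit state words: 626 integers in total.
state <- .Random.seed
length(state)

# The seed is scrambled into the state, so the state words
# are not simply copies of 123456.
head(state[-(1:2)])
```

Reproducing Stata's stream would require matching every one of those state words (or their 64-bit equivalents), not just the number passed to set seed.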

I suggest you export these random numbers from Stata and read them into a vector/matrix/etc. in R using

library(haven)
mydata <- read_dta("mydata.dta")
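Once the file is read in, you will probably want the draws as a plain numeric vector. The following is a rough sketch only: the helper read_stata_draws, the file name, and the column name u are all hypothetical, and the range check assumes the exported column holds runiform() draws.

```r
# Hypothetical helper: pull one column of exported draws out of a .dta file
# and check that they look like uniform (0, 1) draws before using them.
read_stata_draws <- function(path, column) {
  if (!requireNamespace("haven", quietly = TRUE)) {
    stop("package 'haven' is required")
  }
  # read_dta() returns a tibble; [[column]] extracts a single column
  draws <- as.numeric(haven::read_dta(path)[[column]])
  # Assumption: the column contains runiform() output, so values lie in (0, 1)
  stopifnot(!anyNA(draws), all(draws > 0 & draws < 1))
  draws
}

# Usage (file and column names are assumptions):
# u <- read_stata_draws("mydata.dta", "u")
```

You can then use u wherever your R code would otherwise have called runiform-style draws, keeping the same order as in the Stata run.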