I have found what I would consider erratic behavior (but for which I hope there is a simple explanation) in R's use of seeds in conjunction with rbinom() when prob=0.5 is used. General idea: if I set the seed and run rbinom() once (i.e. conduct a single random process), then regardless of the value prob is set to, the random seed should change by one increment. Then, if I again set the seed to the same value and run another random process (such as rbinom() again, but maybe with a different value of prob), the seed should again change to the same value as it did for the previous single random process.
I have found that R does exactly this as long as I'm using rbinom() with any prob!=0.5. Here is an example:
Compare the seed vector, .Random.seed, for two probabilities other than 0.5:
set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.4)
temp1 <- .Random.seed
set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.3)
temp2 <- .Random.seed
any(temp1!=temp2)
> [1] FALSE
Compare the seed vector, .Random.seed, for prob=0.5 vs. prob!=0.5:
set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.5)
temp1 <- .Random.seed
set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.3)
temp2 <- .Random.seed
any(temp1!=temp2)
> [1] TRUE
temp1==temp2
> [1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE
> [8] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
...
I have found this for all comparisons of prob=0.5 against all other probabilities in the set {0.1, 0.2, ..., 0.9}. Similarly, if I compare any two values of prob from {0.1, 0.2, ..., 0.9} other than 0.5, the .Random.seed vectors are always element-by-element equal. These facts hold for both odd and even size within rbinom().
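Since only the second element of .Random.seed differs in the comparison above, it can help to look at that element directly. The following is a minimal sketch, assuming the default "Mersenne-Twister" generator, where .Random.seed[2] holds the current position in the state vector and so equals the number of uniforms consumed since set.seed() (for fewer than 624 draws). The helper name draws_used is hypothetical and only for illustration.

```r
# Assumption: default "Mersenne-Twister" generator, where .Random.seed[2]
# is the position in the state vector (reset by set.seed()).
RNGkind("Mersenne-Twister")

# Hypothetical helper: how many uniforms does `expr` consume after seeding?
# Valid only while fewer than 624 uniforms are drawn.
draws_used <- function(expr, seed = 234908) {
  set.seed(seed)   # reset the generator
  force(expr)      # lazy evaluation: expr is evaluated only here
  get(".Random.seed", envir = globalenv())[2]
}

draws_used(runif(1))                              # one uniform
draws_used(runif(5))                              # five uniforms
draws_used(rbinom(n = 1, size = 60, prob = 0.3))
draws_used(rbinom(n = 1, size = 60, prob = 0.4))
draws_used(rbinom(n = 1, size = 60, prob = 0.5))
```

Comparing the last three values shows directly whether a single rbinom() call advanced the generator by the same amount in each case.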
To make it even stranger (I apologize that this is a little convoluted; it's relevant to the way my function is written), when I use probabilities saved as elements in a vector, I have the same problem if 0.5 is the first element, but not if it is the second. Here is the example for this case:
First case: 0.5 is the first probability referenced in the vector
set.seed(234908)
MNAR <- c(0.5,0.3)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp1 <- .Random.seed
set.seed(234908)
MNAR <- c(0.1,0.3)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp2 <- .Random.seed
any(temp1!=temp2)
> [1] TRUE
temp1==temp2
> [1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE
> [8] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Second case: 0.5 is the second probability referenced in the vector
set.seed(234908)
MNAR <- c(0.3,0.5)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp1 <- .Random.seed
set.seed(234908)
MNAR <- c(0.1,0.3)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp2 <- .Random.seed
any(temp1!=temp2)
> [1] FALSE
Again, I find that this pattern holds regardless of the values used for prob and size. Can anyone explain this mystery to me? It's causing quite a problem: results that should be the same are coming up different because the seed is for some reason used/calculated differently when prob=0.5 but in no other instance.
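If the goal is simply for the generator to end up in the same state whatever prob is, one possible workaround is to draw the uniforms explicitly and map them through qbinom(). This is a sketch and an assumption on my part, not the original function: rbinom_fixed is a hypothetical name, and the values it returns will generally differ from what rbinom() returns for the same seed.

```r
# Hypothetical replacement for rbinom(): inverse-CDF sampling on explicit
# uniforms. runif(n) is called the same way whatever prob is, so the
# generator advances identically in every case.
rbinom_fixed <- function(n, size, prob) {
  u <- runif(n)                        # the only random draws made here
  as.integer(qbinom(u, size, prob))    # deterministic transformation
}

set.seed(234908)
x <- rbinom_fixed(n = 1, size = 60, prob = 0.5)
temp1 <- .Random.seed

set.seed(234908)
x <- rbinom_fixed(n = 1, size = 60, prob = 0.3)
temp2 <- .Random.seed

identical(temp1, temp2)   # TRUE: same state after either call
```

The trade-off is speed for large size, since qbinom() has to invert the distribution function for every draw.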
set.seed(123);rbinom(1,60,0.5);rbinom(1,60,0.3); set.seed(123);rbinom(1,60,0.2);rbinom(1,60,0.3); set.seed(123);rbinom(1,60,0.4);rbinom(1,60,0.3)? – joran
unif_rand(), and follow the logic through ... – Ben Bolker
prob = 0.2 or prob = 0.4 draw two numbers instead of one. It suggests that prob = 0.5 requires drawing twice as many random numbers as the other probs. That theory also checks out by replacing 60 with 120 in the OP's x <- rbinom(n=1,size=60,prob=0.3) case. – flodel
n*p >= 30 and n*p < 30. The former uses two calls to unif_rand(), the latter a single one. Now notice that your example used prob = 0.5 and size = 60, i.e. n*p == 30! Test with size = 59 and the behavior disappears! – flodel