Erratic seed behavior with rbinom(prob=0.5) in R

Question

I have found what I would consider erratic behavior (but for which I hope there is a simple explanation) in R's use of seeds in conjunction with rbinom() when prob=0.5 is used. General idea: To me, if I set the seed, run rbinom() once (i.e. conduct a single random process), despite what value prob is set to, the random seed should change by one increment. Then, if I again set the seed to the same value, and run another random process (such as rbinom() again, but maybe with a different value of prob), the seed should again change to the same value as it did for the previous single random process.

I have found R does exactly this as long as I'm using rbinom() with any prob!=0.5. Here is an example:

Compare seed vector, .Random.seed, for two probabilities other than 0.5:

set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.4)
temp1 <- .Random.seed

set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.3)
temp2 <- .Random.seed

any(temp1!=temp2)
> [1] FALSE

Compare seed vector, .Random.seed, for prob=0.5 vs. prob!=0.5:

set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.5)
temp1 <- .Random.seed

set.seed(234908)
x <- rbinom(n=1,size=60,prob=0.3)
temp2 <- .Random.seed
any(temp1!=temp2)
> [1] TRUE

temp1==temp2
> [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> [8]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
...

I have found this for all comparisions of prob=0.5 against all other probabilities in the set {0.1, 0.2, ..., 0.9}. Similarly, if I compare any values of prob from {0.1, 0.2, ..., 0.9} other than 0.5, the .Random.seed vector is always element-by-element equal. These facts also hold true for either odd or even size within rbinom().

To make it even more strange (I apologize that this is a little convoluted - it's relevant to the way my function is written), when I use probabilities saved as elements in a vector, I have same problem if 0.5 is first element, but not second. Here is the example for this case:

First case: 0.5 is the first probability referenced in the vector

set.seed(234908)
MNAR <- c(0.5,0.3)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp1 <- .Random.seed

set.seed(234908)
MNAR <- c(0.1,0.3)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp2 <- .Random.seed

any(temp1!=temp2)
> [1] TRUE

any(temp1!=temp2)
> [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> [8]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Second case: 0.5 is the second probability referenced in the vector

set.seed(234908)
MNAR <- c(0.3,0.5)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp1 <- .Random.seed

set.seed(234908)
MNAR <- c(0.1,0.3)
x <- rbinom(n=1,size=60,prob=MNAR[1])
y <- rbinom(n=1,size=50,prob=MNAR[2])
temp2 <- .Random.seed

any(temp1!=temp2)
> [1] FALSE

Again, I find that despite the values used for prob and size, this pattern holds. Can anyone explain this mystery to me? It's causing quite a problem because results that should be the same are coming up different because the seed is for some reason used/calculated differently when prob=0.5 but in no other instance.

Perhaps a more concise example of this behavior would be set.seed(123);rbinom(1,60,0.5);rbinom(1,60,0.3); set.seed(123);rbinom(1,60,0.2);rbinom(1,60,0.3); set.seed(123);rbinom(1,60,0.4);rbinom(1,60,0.3) ? — joran
I would go to svn.r-project.org/R/trunk/src/nmath/rbinom.c , search for unif_rand(), and follow the logic through ... — Ben Bolker
An interesting result using @joran's example above is that you get back on your feet if you make prob = 0.2 or prob = 0.4 draw two numbers instead of one. It suggests that prob = 0.5 requires drawing twice as many random numbers than the other probs. That theory also checks out by replacing 60 with 120 in the OP's x <- rbinom(n=1,size=60,prob=0.3) case. — flodel
HA! Look at the code Ben pointed to: there are sections for n*p >= 30 and n*p < 30. The former uses two calls to unif_rand(), the latter a single one. Now notice that your example used prob = 0.5 and size = 60, i.e. n*p == 30! Test with size = 59 and the behavior disappears! — flodel

flodel flodel · Accepted Answer · 2013-09-20T04:08:49

So let's turn our comments into an answer. Thanks to Ben Bolker for putting us on the right track with a link to the code: https://svn.r-project.org/R/trunk/src/nmath/rbinom.c and the suggestion to track down where unif_rand() is called.

A quick scan and it seems that the code is broken into two sections, delimited by the comments:

/*-------------------------- np = n*p >= 30 : ------------------- */

and

/*---------------------- np = n*p < 30 : ------------------------- */

Inside each of these, the number of calls to unif_rand is not the same (two versus one.)

So for a given size (n), your random seed may end up in a different state depending on the value of prob (p): whether size * prob >= 30 or not.

With that in mind, all the results you got with your examples should now make sense:

# these end up in the same state
rbinom(n=1,size=60,prob=0.4) # => np <  30
rbinom(n=1,size=60,prob=0.3) # => np <  30

# these don't
rbinom(n=1,size=60,prob=0.5) # => np >= 30
rbinom(n=1,size=60,prob=0.3) # => np <  30

# these don't
{rbinom(n=1,size=60,prob=0.5)  # np >= 30
 rbinom(n=1,size=50,prob=0.3)} # np <  30
{rbinom(n=1,size=60,prob=0.1)  # np <  30
 rbinom(n=1,size=50,prob=0.3)} # np <  30

# these do
{rbinom(n=1,size=60,prob=0.3)  # np <  30
 rbinom(n=1,size=50,prob=0.5)} # np <  30
{rbinom(n=1,size=60,prob=0.1)  # np <  30
 rbinom(n=1,size=50,prob=0.3)} # np <  30

Erratic seed behavior with rbinom(prob=0.5) in R

2 Answers