I have a simple question regarding the sample function in R. I'm randomly sampling from 0s and 1s and summing them together, from an input vector of length 5, which designates the number of trials to run and sets the seed to generate reproducible random numbers. Seed works as expected, but I get different matrices of random numbers depending on what I put in the prob statement. In this case I assumed prob=NULL should be the same as prob=c(0.5,0.5). Why isn't it?
vn<-c(12, 44, 9, 17, 28)
> do.call(cbind, lapply(c(1:10),function(X) {set.seed(X); sapply(vn, function(Y) sum(sample(x=c(0,1),size=Y,replace=T)), simplify=TRUE)}))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 6 7 7 6 6 9 3 6 2 5
[2,] 22 21 20 29 22 24 24 19 25 19
[3,] 4 8 3 5 4 4 4 6 4 2
[4,] 8 4 12 9 11 7 9 10 8 8
[5,] 13 9 11 14 12 14 10 13 11 12
> do.call(cbind, lapply(c(1:10),function(X) {set.seed(X); sapply(vn, function(Y) sum(sample(x=c(0,1),size=Y,replace=T, prob=c(0.5,0.5))), simplify=TRUE)}))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 6 5 5 6 6 3 9 6 10 7
[2,] 22 23 24 15 22 20 20 25 19 25
[3,] 5 1 6 4 5 5 5 3 5 7
[4,] 9 13 5 8 6 10 8 7 9 9
[5,] 15 19 17 14 16 14 18 15 17 16
UPDATE:
I extended the samplings to 100, with an input vector
vn<-seq(0,100,5)
and compared the rowMeans of the output matrices without prob (test1) and with prob=c(0.5,0.5) against expected mean. Interestingly, test1 and test2 are off by the exact same amount by reversed signs. Why is that? Thanks!
> rowMeans(test1)-seq(0,100,5)/2
[1] 0.00 -0.07 -0.01 -0.35 -0.07 0.19 -0.07 0.24 0.21 0.46 0.20 0.50 -0.37 -0.35 0.00 0.64 -0.59 0.63 -1.19 0.44 -0.38
> rowMeans(test2)-seq(0,100,5)/2
[1] 0.00 0.07 0.01 0.35 0.07 -0.19 0.07 -0.24 -0.21 -0.46 -0.20 -0.50 0.37 0.35 0.00 -0.64 0.59 -0.63 1.19 -0.44 0.38
sample
uses different c routines for uniform sampling and weighted sampling. Though you are using equal weights, R will call the weighted sampling anyway. – Randy Laiprob=c(0.5, 0.49999)
– Fernando