R: strange result using sample function in apply

Question

I am running a simulation that starts from a one-column matrix of values. I pass the values through selection criteria and then, for each row of the resulting matrix, randomly select one value and save it. For some reason, when I apply sample to rows that contain one real number and one NA, it returns a number that is not among the values available to be sampled. I may be misusing the sample function, but I don't understand where this unknown value is coming from.

Example code:

theta <- c(30, 84, 159, 32, 60, 97)
omega <- 0.01
k <- 1
xn <- matrix(c(30, 84, 159, 32, 60, 97), ncol=1)

dup <- xn * 2 

set.seed(1)
z <- matrix(rbinom(n=rep(1,length(dup)),size = as.vector(dup),prob = 0.5),nrow = nrow(dup))            
z1 <- dup - z           
xn <- cbind(z, z1) # put both in a matrix
W <- exp( -(1/2)*( ( ( xn - theta ) / theta ) ^2 / omega ) )         

set.seed(1) 
Z <- matrix(rbinom(nrow(W) * ncol(W), 1, W), nrow=nrow(W), ncol=ncol(W) ) 
xn <- ifelse ( Z == 0, 0, xn )

xn
     [,1] [,2]
[1,]   32    0
[2,]   78    0
[3,]  144    0
[4,]    0   30
[5,]   60   60
[6,]   92  102

I don't want to include any 0 values so I change them to NA and then apply the sample function to each row to return a single value.

xn[which(xn==0)] <- NA
set.seed(1)
xn2 <- matrix(apply(xn, 1, function(x){sample(x[!is.na(x)], size = k)}), ncol = k)

What I should get is

xn2
     [,1]
[1,]   32 
[2,]   78 
[3,]  144 
[4,]   30
[5,]   60
[6,]  102

but what I get is:

xn2
     [,1]
[1,]    9
[2,]   30
[3,]   83
[4,]   24
[5,]   60
[6,]  102

Specifically, in this example, the values in the first four rows (9, 30, 83, and 24) are coming out of nowhere that I know of.

Does anyone know what mistake I am making when I take this sample?

I think you just want to avoid using sample when there's only 1 item. This seems to give what you're after: matrix(apply(xn, 1, function(x){if (length(x[!is.na(x)]) > 1) { sample(x[!is.na(x)], size = k) } else x[!is.na(x)] }), ncol=k) — GSee
This behavior is clearly stated in the docs; ?sample: "If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x." — Jason Morgan
Thanks. I just tried sample directly: set.seed(1); sample(102, 1) returns 28. The sample documentation says: 'If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x'. — Kevin

GSee · Accepted Answer · 2012-06-29T00:14:03

To summarize the comments,

?sample says

If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.
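
A quick way to see that rule in action is sketched below. The variable names draws and draws2 are illustrative and not part of the question's code.

```r
# When x has length 1 and x >= 1, sample(x, ...) draws from 1:x, not from x itself.
set.seed(1)
draws <- replicate(1000, sample(32, size = 1))
range(draws)           # stays within 1..32
all(draws %in% 1:32)   # TRUE

# With two or more values, sample() draws from the values themselves.
draws2 <- replicate(1000, sample(c(92, 102), size = 1))
all(draws2 %in% c(92, 102))   # TRUE
```

This is why rows with a single non-NA value return numbers that never appeared in the matrix, while rows with two non-NA values behave as expected.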

For your application, when x has length 1 you just want the value of x itself rather than sample(x). You can adapt your code by checking that x has more than one element before passing it to sample:

matrix(apply(xn, 1, function(x){
  if (length(x[!is.na(x)]) > 1) { 
    sample(x[!is.na(x)], size = k) 
  } else x[!is.na(x)] 
}), ncol=k)
     [,1]
[1,]   32
[2,]   78
[3,]  144
[4,]   30
[5,]   60
[6,]  102
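
An alternative that avoids the explicit length check is to sample positions rather than values, which is the approach of the resample() helper shown in the Examples section of ?sample. The following is a minimal sketch, not a drop-in replacement: pick_one is a hypothetical helper name, and the matrix is retyped from the question's output with zeros already replaced by NA.

```r
# Sample by index so a length-1 vector is never expanded to 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Matrix from the question after zeros were replaced by NA.
xn <- matrix(c(32, 78, 144, NA, 60, 92,
               NA, NA, NA, 30, 60, 102), ncol = 2)
k <- 1

# pick_one (hypothetical helper): drop NAs, then draw k values by position.
pick_one <- function(x) resample(x[!is.na(x)], size = k)

set.seed(1)
xn2 <- matrix(apply(xn, 1, pick_one), ncol = k)
xn2[1:5, ]   # 32 78 144 30 60; row 6 is either 92 or 102
```

Note that a row with no non-NA values would still need separate handling, since there is nothing to draw k values from.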
