1
votes

Given a 5x5 matrix:

dataset=matrix(cbind(c(1,1,2,2,0),
                     c(1,1,2,0,0),
                     c(0,0,0,1,0),
                     c(0,0,1,1,1),
                     c(1,2,3,4,0))
dataset
      [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    0    1
[2,]    1    1    0    0    2
[3,]    2    2    0    1    3
[4,]    2    0    1    1    4
[5,]    0    0    0    1    0

I want to sample one cell from each row of the matrix, choosing only among the cells whose value is equal to 1. I then want to create a new matrix in which the randomly sampled cell is set to TRUE and all other cells are set to FALSE. A sample of the expected output is provided below:

       1     2     3     4     5  
1   FALSE  TRUE FALSE FALSE FALSE 
2    TRUE FALSE FALSE FALSE FALSE
3   FALSE FALSE FALSE  TRUE FALSE 
4   FALSE FALSE  TRUE FALSE FALSE 
5   FALSE FALSE FALSE  TRUE FALSE 

Could someone please help me figure out how to achieve this?

3
Your sample data code is incomplete. - Maurits Evers
Your matrix( doesn't make sense. Just use cbind() - John Coleman

3 Answers

2
votes

Here is an option

# Courtesy of Hadley (avoids the "surprise" sample result when we have only one element)
# [http://r.789695.n4.nabble.com/using-quot-sample-quot-for-a-vector-of-length-1-td2299330.html]
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(2019)
t(apply(dataset, 1, function(x) 
    replace(rep(FALSE, length(x)), resample(which(x == 1), 1), TRUE)))
#      [,1]  [,2]  [,3]  [,4]  [,5]
#[1,] FALSE FALSE FALSE FALSE  TRUE
#[2,] FALSE  TRUE FALSE FALSE FALSE
#[3,] FALSE FALSE FALSE  TRUE FALSE
#[4,] FALSE FALSE FALSE  TRUE FALSE
#[5,] FALSE FALSE FALSE  TRUE FALSE

I've added a fixed random seed for reproducibility; remove it to get a different random sample of 1s from each row of dataset on every run.
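To illustrate why the resample() wrapper is used instead of a bare sample(), here is a minimal sketch. It assumes a row whose only 1 sits in column 4, so that which(x == 1) returns the single number 4; the variable names are mine, for illustration only.

```r
# Same wrapper as in the answer above
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Assumed situation: which(x == 1) returned a single index, 4
only_candidate <- 4

set.seed(1)
# sample() treats a single number n >= 1 as the range 1:n,
# so these draws can land on any of the columns 1 to 4
draws_sample <- replicate(100, sample(only_candidate, 1))
# resample() indexes into the vector itself, so every draw is 4
draws_resample <- replicate(100, resample(only_candidate, 1))

range(draws_sample)
unique(draws_resample)
```

With a bare sample(), such a row could end up with TRUE in a column that does not hold a 1, which is what the wrapper guards against.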


Sample data

dataset=matrix(
    c(1,1,2,2,0,1,1,2,0,0,0,0,0,1,0,0,0,1,1,1,1,2,3,4,0),
    nrow = 5, ncol = 5)
dataset
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    1    0    0    1
#[2,]    1    1    0    0    2
#[3,]    2    2    0    1    3
#[4,]    2    0    1    1    4
#[5,]    0    0    0    1    0
1
votes

If I understand the request, then this should be an efficient answer:

(dataset==1) * rbinom(length(dataset), 1, 0.5)

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    1
[2,]    0    1    0    0    0
[3,]    0    0    0    1    0
[4,]    0    0    0    1    0
[5,]    0    0    0    1    0

My understanding was that you only wanted TRUE (or equivalently 1) in the same positions as the 1s in the original matrix, but that only a random sample of those should be TRUE (or 1).
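Under this reading each eligible cell is kept or dropped independently, so the number of selected cells per row is not fixed. A rough sketch for inspecting that, using the sample data from the thread (the seed and the name picked are my own choices):

```r
dataset <- matrix(
    c(1,1,2,2,0,1,1,2,0,0,0,0,0,1,0,0,0,1,1,1,1,2,3,4,0),
    nrow = 5, ncol = 5)

set.seed(42)  # arbitrary seed, only so the run can be repeated
picked <- (dataset == 1) * rbinom(length(dataset), 1, 0.5)

# Cells that are not 1 in the original are never selected
all(picked[dataset != 1] == 0)

# Number of selected cells in each row; a row may get zero, one, or several
rowSums(picked)
```

If exactly one selected cell per row is required, the row counts from rowSums() are the thing to check.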

1
votes

I'd approach this by making a big list of all the cells equal to 1, then just sampling one for each row and updating a copy of the matrix. Like so:

idx <- which(dataset==1, arr.ind=TRUE)
idx <- idx[sample(nrow(idx)),]
idx <- idx[!duplicated(idx[,"row"]),]
mat <- matrix(FALSE, nrow=nrow(dataset), ncol=ncol(dataset))
mat[idx] <- TRUE

mat
#      [,1]  [,2]  [,3]  [,4]  [,5]
#[1,] FALSE  TRUE FALSE FALSE FALSE
#[2,]  TRUE FALSE FALSE FALSE FALSE
#[3,] FALSE FALSE FALSE  TRUE FALSE
#[4,] FALSE FALSE  TRUE FALSE FALSE
#[5,] FALSE FALSE FALSE  TRUE FALSE
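As a sanity check on the shuffle-then-deduplicate idea, the sketch below repeats the steps on the sample data and verifies the two properties the question asks for. The helper check_one_per_row() is hypothetical (not part of any package), and the drop = FALSE arguments are my addition so the index stays a matrix even if only one cell matches.

```r
dataset <- matrix(
    c(1,1,2,2,0,1,1,2,0,0,0,0,0,1,0,0,0,1,1,1,1,2,3,4,0),
    nrow = 5, ncol = 5)

# Hypothetical helper: TRUE if mat marks exactly one cell in every row
# that contains a 1, marks nothing in other rows, and only marks 1s
check_one_per_row <- function(mat, dataset) {
    has_one <- rowSums(dataset == 1) > 0
    all(rowSums(mat)[has_one] == 1) &&
        all(rowSums(mat)[!has_one] == 0) &&
        all(dataset[mat] == 1)
}

idx <- which(dataset == 1, arr.ind = TRUE)             # all (row, col) pairs holding a 1
idx <- idx[sample(nrow(idx)), , drop = FALSE]          # shuffle the pairs
idx <- idx[!duplicated(idx[, "row"]), , drop = FALSE]  # keep the first pair seen per row
mat <- matrix(FALSE, nrow = nrow(dataset), ncol = ncol(dataset))
mat[idx] <- TRUE

check_one_per_row(mat, dataset)
```

Because the pairs are shuffled uniformly before deduplication, the first pair kept for a row is equally likely to be any of that row's 1s.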

This scales quite well too. Here's 5 million rows processed in about 2.5 seconds:

dataset <- dataset[rep(1:5,1e6),]
system.time({
idx <- which(dataset==1, arr.ind=TRUE)
idx <- idx[sample(nrow(idx)),]
idx <- idx[!duplicated(idx[,"row"]),]
mat <- matrix(FALSE, nrow=nrow(dataset), ncol=ncol(dataset))
mat[idx] <- TRUE
})
#   user  system elapsed 
#   2.32    0.22    2.58