3
votes

Assume I have a vector of values like:

M=3;val<-rnorm(M)

and the corresponding index matrix like:

N=20;J=10;ind<-matrix(sample(1:M,N*J,replace=T),nrow=J)

I can easily assign the values with their indices to get the data matrix as:

x<-matrix(val[ind],J,N)

Now I have a matrix of values like:

val<-matrix(rnorm(M*J),nrow=J)

and need to assign the values and indices row by row (i.e., one row in val with one row in ind) to get the data matrix.

I can do this with a for loop as:

x<-ind;
for(j in 1:J){x[j,]<-val[j,ind[j,]]}

But I wonder if there's more efficient way to do this, especially to avoid using the for loop?

I need to do resampling and repeat the assigning process hundreds of thousands of times. So I worry that the for loop will take up a lot of time.

2
I am slightly uncertain as to 'why' this is. Are you perhaps simply looking for matrix(..., byrow = FALSE), which would fill in the matrix by column. This should let you fill in the matrix by column in the matrix call directly.Oliver
Sorry there are typos. But the issue is not on x, but to apply val and ind row by row, as shown in the for loop.Luwei

2 Answers

4
votes

In general, a matrix can be subset or subset-assigned using a two-column matrix as row and column indices. So

i_idx = rep(1:J, each = ncol(ind))
x_idx = cbind(i_idx, 1:ncol(ind))
val_idx = cbind(i_idx, as.vector(t(ind[1:J,])))

x[x_idx] = val[val_idx]
1
votes

Another three methods, one using sapply, one matrix subsetting and one vector subsetting. Matrix and vector subsetting looks to be faster, the one with sapply slower, than the for loop.

Currently

matrix(val[1:J + (ind-1)*J],J,N)

looks like to be the fastest way.

M <- 3; N <- 20; J <- 10
ind <- matrix(sample(1:M,N*J,replace=T),nrow=J)
val <- matrix(rnorm(M*J),nrow=J)

x<-ind;
for(j in 1:J){x[j,]<-val[j,ind[j,]]}

identical(x, t(sapply(1:J, function(j) val[j,ind[j,]])))
#[1] TRUE

identical(x, matrix(val[matrix(c(rep(1:J, N), ind), ncol=2)],J,N))
#[1] TRUE
#Other ways for rep(1:J, N)
identical(x, matrix(val[matrix(c(row(ind), ind), ncol=2)],J,N))
#[1] TRUE
identical(x, matrix(val[matrix(c(slice.index(ind, 1), ind), ncol=2)],J,N))
#[1] TRUE

#Vector subsetting as suggested by Aaron
identical(x, matrix(val[row(ind) + (ind-1)*J],J,N))
#[1] TRUE
#Other ways
identical(x, matrix(val[1:J + (ind-1)*J],J,N))
#[1] TRUE
identical(x, matrix(val[sweep((ind-1)*J, 1, 1:J, "+")],J,N))
#[1] TRUE

Speed comparison:

library(microbenchmark)

f1 <- function() {
  x<-ind;
  for(j in 1:J){x[j,]<-val[j,ind[j,]]}
}
f2 <- function() {t(sapply(1:J, function(j) val[j,ind[j,]]))}
f3 <- function() {matrix(val[matrix(c(row(ind), ind), ncol=2)],J,N)}
f4 <- function() {matrix(val[row(ind) + (ind-1)*J],J,N)} #Comment from Aaron
f5 <- function() {matrix(val[1:J + (ind-1)*J],J,N)}

microbenchmark(f1(), f2(), f3(), f4(), f5(), setup=gc)
#Unit: microseconds
# expr    min      lq     mean  median      uq     max neval
# f1() 16.540 18.3595 20.11216 19.8820 20.7915  36.201   100
# f2() 43.514 46.3650 49.77573 48.0320 49.5120 113.631   100
# f3()  8.325  9.3265 10.38931  9.9425 10.4825  46.561   100
# f4()  6.934  7.8270  9.00286  8.4405  9.1355  25.840   100
# f5()  5.839  6.8730  7.71322  7.3520  8.3145  16.349   100