1
votes

I have a function with two arguments: the first takes a vector and the second takes a scalar. I want to apply this function to each row of a matrix, with a different second argument for each row. I tried the following, but it didn't work. I expected it to calculate the p-value for each row and then divide that p-value by the row number, giving a vector, but I got a matrix instead. This is a toy example, but it illustrates my purpose.

> foo = matrix(rnorm(100),ncol=20)
> f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
> goo = 1:5
> apply(foo,1,f,y=goo)
          [,1]      [,2]      [,3]       [,4]       [,5]
[1,] 0.9406881 0.6134117 0.5484542 0.11299535 0.20420786
[2,] 0.4703440 0.3067059 0.2742271 0.05649767 0.10210393
[3,] 0.3135627 0.2044706 0.1828181 0.03766512 0.06806929
[4,] 0.2351720 0.1533529 0.1371135 0.02824884 0.05105196
[5,] 0.1881376 0.1226823 0.1096908 0.02259907 0.04084157

The following for loop strategy produces the expected result, except that it would be very slow on the real data.

> res = numeric(5)
> for (i in 1:5){
    res[i]=f(foo[i,],i)
    }
> res
[1] 0.94068810 0.30670585 0.18281807 0.02824884 0.04084157

Any suggestions would be appreciated!

2
There might be a mapply-based solution. - Ben Bolker
e.g. (inefficient!) mapply(f,split(foo,row(foo)),goo) - Ben Bolker
I think "mapply(f,split(foo,row(foo)),goo)" should work. Why do you say it is inefficient? Thanks - Li_Q

2 Answers

2
votes

If your real purpose is like your example, you can vectorize the division:

f <- function(x) t.test(x[1:10], x[11:20])$p.value
apply(foo, 1, f) / goo

Based on the comment, the above is not appropriate.

In the case of the example, you might observe that the diagonal of the returned matrix is the desired result:

f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
goo = 1:5
diag(apply(foo,1,f,y=goo))

Besides being inefficient in both time and space (it computes a full 5 x 5 matrix only to keep its diagonal), this has another problem: it is correct for the example only because the operation on y (the division) is vectorized. And in that case, the first solution is better. So I suspect that in your actual problem, your operation is not vectorized.
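
To see why the diagonal works, it may help to look at what apply actually builds. The sketch below reuses foo, goo and f from the question (the seed and the names m and p are only for illustration) and checks that entry [i, j] of the matrix is the p-value of row j divided by goo[i], so only the diagonal pairs each row with its own divisor.

```r
set.seed(101)                      # arbitrary seed, for reproducibility only
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

# apply() hands the whole of goo to every call, so each call returns
# length(goo) values and the overall result is a 5 x 5 matrix.
m <- apply(foo, 1, f, y = goo)

# Per-row p-values without any division.
p <- apply(foo, 1, function(x) t.test(x[1:10], x[11:20])$p.value)

# Entry [i, j] is p[j] / goo[i]; the diagonal is therefore p / goo.
print(all.equal(m, outer(1 / goo, p)))
print(all.equal(diag(m), p / goo))
```

If the operation on y were not vectorized, each call would not return one value per element of goo, and the diagonal would no longer line up with the desired result.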

Sometimes a for loop really is the best answer. The apply family of functions are not magical; they are still loops.

Here is an sapply solution. It won't beat a for loop on time (and probably won't lose either), but it doesn't have a high space overhead. The idea is to apply over the row indices, using each index to extract the matching row of foo and element of goo to pass to f:

sapply(seq(nrow(foo)), function(i) f(foo[i,], goo[i]))
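
As a quick sanity check, the index-based idea can be compared against the for loop from the question. This is only a sketch: the seed is arbitrary, and vapply is swapped in as a stricter variant of sapply that declares each result to be a single number; it is not part of the original answer.

```r
set.seed(101)                      # arbitrary seed, for reproducibility only
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

# Reference result: the explicit loop from the question.
res_loop <- numeric(nrow(foo))
for (i in seq_len(nrow(foo))) res_loop[i] <- f(foo[i, ], goo[i])

# Same computation, iterating over row indices with sapply.
res_sapply <- sapply(seq_len(nrow(foo)), function(i) f(foo[i, ], goo[i]))

# vapply variant: fails loudly if f ever returns something other than
# a single numeric value.
res_vapply <- vapply(seq_len(nrow(foo)),
                     function(i) f(foo[i, ], goo[i]),
                     numeric(1))

print(identical(res_loop, res_sapply))
print(identical(res_loop, res_vapply))
```

seq_len(nrow(foo)) is used here instead of seq(nrow(foo)) so that a matrix with zero rows yields an empty index rather than c(1, 0).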
1
votes
f <- function (x,y) t.test(x[1:10],x[11:20])$p.value/y
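# Note: when called as f2(foo, goo) on the whole matrix, a[1:10] and a[11:20]
# index the first 20 elements of foo, so this runs a single t.test rather
# than one per row.
f2 <- function(a, b){
    tt <- t.test(x = a[1:10], y = a[11:20])$p.value
    tt/b
}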
f3 <- function() {
  res <- numeric(5)
  for (i in 1:5){
      res[i] <- f(foo[i,],i)
  }
  res
}
f4 <- function(x) t.test(x[1:10], x[11:20])$p.value

set.seed(101)
foo <- matrix(rnorm(100),ncol=20)
goo <- 1:5
library(rbenchmark)
benchmark(
     apply(foo, 1, f4) / goo,
     mapply(f, split(foo, row(foo)), goo),
     f2(foo, goo),
     f3(),
     sapply(seq(nrow(foo)), function(i) f(foo[i,], goo[i])),
     replications=1000,
     columns=c("test","replications","elapsed","relative"))

##                     test replications elapsed  relative
## 1   apply(foo, 1, f4)/goo         1000   1.581     5.528
## 3            f2(foo, goo)         1000   0.286     1.000
## 4                    f3()         1000   1.458     5.098
## 2             mapply(...)         1000   1.599     5.591
## 5             sapply(...)         1000   1.486     5.196

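The direct division is best (but not actually applicable); note that f2(foo, goo), as called here, runs only a single t.test on the first 20 elements of foo, so its timing is not comparable to the row-wise solutions. For this example there's not much difference among the other solutions, although the for loop is slightly faster than sapply, which is slightly faster than mapply. You should try this on a more realistic example to see how it's going to scale for your problem.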
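
One way to try this on something larger is sketched below. The matrix size (1000 rows), the divisor vector div, and the helper names are made up for illustration, and base R's system.time is used instead of rbenchmark so that nothing needs installing. Timings will differ by machine, so none are shown.

```r
set.seed(101)                              # arbitrary seed
n   <- 1000                                # hypothetical number of rows
big <- matrix(rnorm(n * 20), ncol = 20)
div <- seq_len(n)                          # hypothetical per-row divisors
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

loop_fun <- function() {
  res <- numeric(n)
  for (i in seq_len(n)) res[i] <- f(big[i, ], div[i])
  res
}
sapply_fun <- function() sapply(seq_len(n), function(i) f(big[i, ], div[i]))
# split() names the list elements "1", "2", ...; drop the names to compare.
mapply_fun <- function() unname(mapply(f, split(big, row(big)), div))

# Time each approach once; repeat the runs for steadier numbers.
t_loop   <- system.time(r_loop   <- loop_fun())
t_sapply <- system.time(r_sapply <- sapply_fun())
t_mapply <- system.time(r_mapply <- mapply_fun())

# Elapsed seconds per approach.
print(c(loop   = t_loop[["elapsed"]],
        sapply = t_sapply[["elapsed"]],
        mapply = t_mapply[["elapsed"]]))

# All three approaches should return the same values.
print(isTRUE(all.equal(r_loop, r_sapply)) && isTRUE(all.equal(r_loop, r_mapply)))
```

Checking that the approaches agree before comparing their timings guards against benchmarking a function that is fast only because it does less work.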