Using apply in R with an additional vector argument

Question

I have a matrix of size 10000 x 100 and a vector of length 100. I'd like to apply a custom function, percentile, which takes in a vector argument and a scalar argument, to each column of the matrix such that on iteration j, the arguments used with percentile are column j of the matrix and entry j of the vector. Is there a way to use one of the apply functions to do this?

Here's my code. It runs, but doesn't return the correct result.

percentile <- function(x, v){
  length(x[x <= v]) / length(x)
}

X <- matrix(runif(10000 * 100), nrow = 10000, ncol = 100)
y <- runif(100)
result <- apply(X, 2, percentile, v = y)

The workaround that I've been using has been to just append y to X, and re-write the percentile function, as shown below.

X <- rbind(X, y)
percentile2 <- function(x){
  v <- x[length(x)]
  x <- x[-length(x)]
  length(x[x <= v]) / length(x)
}
result <- apply(X, 2, percentile2)

This code does return the correct result, but I would prefer something a bit more elegant.

Hi, welcome to SO. Since you are quite new here, you might want to read the about and FAQ sections of the website to help you get the most out of it. If an answer does solve your problem you may want to consider upvoting and/or marking it as accepted to show the question has been answered, by ticking the little green check mark next to the suitable answer. You are not obliged to do this, but it helps keep the site clean of unanswered questions and rewards those who take the time to solve your problem. — Simon O'Hanlon

Simon O'Hanlon · Accepted Answer · 2013-08-23T09:55:53

If you take advantage of R's vectorisation and know the right functions, you can avoid explicit loops entirely and do the whole thing in one relatively simple line:

 colSums(  t( t( X ) <= y ) ) / nrow( X )

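R recycles y down the columns of a matrix, because matrices are stored column by column. Comparing X <= y directly would therefore pair the elements of y with successive rows of X, which is not what we want. Transposing with t turns the columns of X into rows, so each column of t(X) has the same length as y and element j of y lines up with column j of the original X. We then apply the logical comparison <= and transpose back again.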
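To make the recycling concrete, here is a minimal sketch on a tiny matrix. The names X_small and y_small are hypothetical and chosen only for illustration; they are not part of the original question.

```r
# Hypothetical toy data: 3 rows, 2 columns, one threshold per column
X_small <- matrix(c(1, 2, 3,
                    4, 5, 6), nrow = 3, ncol = 2)
y_small <- c(2, 5)

# Without transposing, y_small is recycled down the columns, so the
# thresholds used are 2, 5, 2 for column 1 and 5, 2, 5 for column 2
wrong <- X_small <= y_small

# After transposing, row j of t(X_small) is column j of X_small,
# and it is compared against y_small[j] throughout
right <- t(t(X_small) <= y_small)

# Proportion of each column that is <= its own threshold
colSums(right) / nrow(X_small)
```

Comparing colSums(wrong) with colSums(right) shows that the two approaches count different things once the thresholds differ between columns.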

Since TRUE and FALSE evaluate to 1 and 0 respectively, we can use colSums to count the rows in each column that met the condition, and then divide each count by the total number of rows (remember the recycling rule: the single value nrow(X) is recycled across all the counts). It gives exactly the same result as the workaround in the question:

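# Assumption: X2 is the original X with y appended as its last row,
# as in the question's workaround (X itself must be left unmodified)
X2 <- rbind(X, y)
res1 <- apply(X2, 2, percentile2)
res2 <- colSums(  t( t( X ) <= y ) ) / nrow( X )
identical( res1 , res2 )
[1] TRUE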

Because this doesn't use any R-level loops, it's a lot quicker (roughly 10 times on this small matrix).
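If you want to check the speed difference on your own machine, something along these lines should work. This is only a rough sketch using base R's system.time; the seed and the variable names time_apply, time_vec, res_apply and res_vec are my own choices, and the timings will vary between machines, so no expected numbers are shown.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
X <- matrix(runif(10000 * 100), nrow = 10000, ncol = 100)
y <- runif(100)

# The workaround from the question: the threshold is the last row
percentile2 <- function(x) {
  v <- x[length(x)]
  x <- x[-length(x)]
  length(x[x <= v]) / length(x)
}

# Time the apply-based workaround
time_apply <- system.time(res_apply <- apply(rbind(X, y), 2, percentile2))

# Time the vectorised version
time_vec <- system.time(res_vec <- colSums(t(t(X) <= y)) / nrow(X))

print(time_apply["elapsed"])
print(time_vec["elapsed"])

# Both approaches should agree
all.equal(res_apply, res_vec)
```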

Even better would be to use rowMeans like this (thanks to @flodel):

     rowMeans(  t(X) <= y  )
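If you would still prefer something from the apply family, as the question asks, one option is to loop over the column indices rather than over the columns themselves. The original call apply(X, 2, percentile, v = y) passes the whole vector y to every call, which is why it gives the wrong result. The sketch below is just one possible way to pair column j with y[j]; the smaller matrix size, the seed and the names res_sapply, res_vapply and res_vec are assumptions made for illustration.

```r
# percentile as defined in the question
percentile <- function(x, v) {
  length(x[x <= v]) / length(x)
}

set.seed(42)  # arbitrary seed, only for reproducibility
X <- matrix(runif(1000 * 5), nrow = 1000, ncol = 5)  # smaller than the question's matrix
y <- runif(5)

# Iterate over column indices so column j is paired with y[j]
res_sapply <- sapply(seq_len(ncol(X)), function(j) percentile(X[, j], y[j]))

# vapply does the same but also checks that each result is a single number
res_vapply <- vapply(seq_len(ncol(X)),
                     function(j) percentile(X[, j], y[j]),
                     numeric(1))

# Vectorised version from the answer, for comparison
res_vec <- rowMeans(t(X) <= y)

all.equal(res_sapply, res_vec)
```

This still loops at the R level, so it will probably be slower than rowMeans on large matrices, but it keeps percentile unchanged and avoids appending y to X.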
