0
votes

I have written a function to extract data from a large matrix ("c.mat") for each row in a data.frame ("df.1"). The data.frame has an indexing column ("df.1$hour") that gives the relevant column in the matrix. The matrix and the data.frame have the same number of rows, so the function goes:

assignUV.FUN <- function(df, mat){
  num=df$hour
  value = mat[as.numeric(rownames(df)),num]
  return(value)
}

Quite simple. However, when I use apply to run this for each row:

df.1 <- data.frame(hour= round(runif(10,1,100)), x = seq(1,10, length=10))
c.mat <- matrix(runif(1000,1,5), nrow=10)

try <- apply(df.1, 1, assignUV.FUN, mat = c.mat, df=df.1)

I get the error:

Error in FUN(newX[, i], ...) : unused argument (newX[, i])

I'm sure there is a conflict here, whereby I am passing the data.frame twice, once through the df argument of assignUV.FUN and once through apply, but I can't figure out why this won't work.

Any thoughts? It works fine if I just run on a single row:

assignUV.FUN(df = df.1[1,], mat=c.mat)

2 Answers

3
votes

If I have understood you correctly, you want to pick one value from c.mat for every row in df.1, using the row number as the row index and the value in the hour column as the column index. I don't think apply is the best choice here, since you need to subset by both row and column index: apply passes the values of each row, not its index, and you need the index for subsetting. One option from the apply family is mapply:

mapply(function(x, y) c.mat[x, y], seq_len(nrow(df.1)), df.1$hour)
#[1] 2.472 3.980 3.654 4.868 4.204 3.320 4.191 3.296 1.016 4.353

Or a vectorised approach would be

c.mat[cbind(1:nrow(df.1), df.1$hour)]
#[1] 2.472 3.980 3.654 4.868 4.204 3.320 4.191 3.296 1.016 4.353
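To see why the cbind form returns one value per row rather than a sub-matrix, here is a minimal sketch on a small matrix. The names m, cols, idx and picked are made up for illustration and are not part of the question. When a matrix is indexed by a two-column matrix, each row of the index is read as one (row, column) pair.

```r
# Small hypothetical matrix so the selected cells are easy to check by eye
m <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns, filled column-wise
cols <- c(2, 4, 1)            # column wanted for rows 1, 2 and 3

# Each row of the index matrix is one (row, column) pair:
# (1, 2), (2, 4), (3, 1)
idx <- cbind(seq_len(nrow(m)), cols)
picked <- m[idx]
picked
#[1]  4 11  3
```

By contrast, m[1:3, cols] would return the full 3 x 3 sub-matrix of those rows and columns, which is not what is wanted here.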

As for why it works when you call the function on one row directly but not through apply: when you subset one row with df.1[1, ], the result is still a one-row data.frame with column names.

df.1[1, ]
#  hour x
#1   31 1

class(df.1[1, ])
#[1] "data.frame"

So when you do df$hour in the assignUV.FUN function you will get a value

df.1[1, ]$hour
#[1] 31

However, that is not the case with apply

apply(df.1[1, ], 1, class)
#        1 
#"numeric" 

and if you try to extract the value

apply(df.1[1, ], 1, function(x) x$hour)

Error in x$hour : $ operator is invalid for atomic vectors

You can work around this by indexing by position instead of by name:

apply(df.1[1, ], 1, function(x) x[1])
#31 

but this only gives you the column to subset from c.mat, not the row.
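The behaviour above can be checked on a toy data.frame. This is a small sketch, with df.demo, row_classes and hours as made-up names: apply coerces its input with as.matrix, so each row reaches the function as a named numeric vector, and an element can still be picked by name with [[ ]] even though $ fails.

```r
# Hypothetical toy data, not the data from the question
df.demo <- data.frame(hour = c(3, 1), x = c(1, 2))

# apply() works on as.matrix(df.demo), so every row arrives as a numeric vector
row_classes <- apply(df.demo, 1, class)

# Name-based indexing with [[ ]] works on a named atomic vector; $ does not
hours <- apply(df.demo, 1, function(x) x[["hour"]])
unname(hours)
#[1] 3 1
```

As with positional indexing, this only recovers the hour value and not the row number.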

data

set.seed(100)
df.1 <- data.frame(hour= round(runif(10,1,100)), x = seq(1,10, length=10))
c.mat <- matrix(runif(1000,1,5), nrow=10)
1
votes

Ronak's vector approach is the way to go, but I hope this can be instructive. apply does not pass a data.frame to FUN but a vector, so you can try:

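assignUV.FUN <- function(DF, mat){
  # DF is one row of df.1 as a numeric vector: DF[1] is hour, DF[2] is x
  num=DF[1]
  # x is used as the row index, which works here only because x runs 1, 2, ..., 10
  value = mat[DF[2],num]
  return(value)
}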

try <- apply(df.1, 1, assignUV.FUN, mat = c.mat)

It is unnecessary to pass all arguments of FUN to apply: apply supplies each row as the first argument itself, so I only passed c.mat, which is not being 'looped through'. Also, I try to avoid naming data.frames df, since R already has a function called df (density of the F distribution).
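A minimal sketch of how apply forwards arguments, which also reproduces the "unused argument" error from the question. The names pick.FUN, m, d, ok and err are hypothetical and used only for this illustration.

```r
# Hypothetical helper: the first formal receives the row, 'mat' is forwarded by apply()
pick.FUN <- function(row, mat) mat[row[["x"]], row[["hour"]]]

m <- matrix(1:6, nrow = 2)                      # toy 2 x 3 matrix
d <- data.frame(hour = c(3, 1), x = c(1, 2))    # x doubles as the row number here

# apply() supplies each row as the first argument; mat is passed along unchanged
ok <- apply(d, 1, pick.FUN, mat = m)

# Naming every formal leaves no slot for the row that apply() supplies,
# so the call fails; the message is captured instead of stopping the script
err <- tryCatch(apply(d, 1, pick.FUN, mat = m, row = d),
                error = function(e) conditionMessage(e))
```

Here ok holds the values m[1, 3] and m[2, 1], while err holds the error message, because both formals of pick.FUN were already matched by name and the row itself had nowhere to go.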