I have a data frame with multiple columns. For each column, I want to call a function that takes that column as its first argument and the remaining columns of the data frame as its second. For example, suppose I have this data and this testfun, which accepts two arguments:

> df <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
> df
  x y z
1 1 3 5
2 2 4 6
> 
> testfun <- function(a, b){colMeans(a + 2 * b)} # only for illustration

Suppose I want to apply testfun to every column in turn. Here is the loop I use to get the result:

> for (i in 1:nrow(df)) {
+   Y = matrix(df[, i], ncol = 1)
+   Xmat = df[, -i]
+   result[i, -i] = testfun(Y, Xmat)
+ }  
> 
> result
     [,1] [,2] [,3]
[1,]  0.0  8.5 12.5
[2,]  6.5  0.0 14.5
[3,]  0.0  0.0  0.0

Is there a way to do this without writing a for loop, maybe with the apply function family? Thank you so much.

Comment (akrun): I think your indexing should be 1:ncol(df), not 1:nrow(df).
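
The correction suggested in the comment can be sketched as follows. The question does not show how result was created, so the preallocated zero matrix here is an assumption; seq_len(ncol(df)) is used in place of 1:ncol(df) only because it also behaves sensibly for a data frame with no columns.

```r
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testfun <- function(a, b) colMeans(a + 2 * b)  # only for illustration

# Assumption: result starts as a square zero matrix,
# with one row and one column per column of df
result <- matrix(0, nrow = ncol(df), ncol = ncol(df))

# Iterate over columns (not rows), so the third column is processed too
for (i in seq_len(ncol(df))) {
  Y <- matrix(df[, i], ncol = 1)   # column i as a one-column matrix
  Xmat <- df[, -i]                 # every other column
  result[i, -i] <- testfun(Y, Xmat)
}

result
#      [,1] [,2] [,3]
# [1,]  0.0  8.5 12.5
# [2,]  6.5  0.0 14.5
# [3,]  8.5 12.5  0.0
```

With the column-wise index, the third row is filled in as well, instead of staying at zero.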

1 Answer

We could loop over the sequence of column indices of the dataset with sapply/lapply. For each index i, extract that column as Y and the remaining columns (by using - on the index) as the second argument, then apply testfun. The result is assigned into positions -i of an already initialized numeric vector (of the same length as the number of columns of the dataset), and that vector is returned. Finally, transpose the output of sapply:

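# zero vector with one slot per column; each call fills its own local copy
v1 <- numeric(ncol(df))
t(sapply(seq_along(df), function(i) {
  # column i is the first argument, all other columns are the second
  v1[-i] <- testfun(as.matrix(df[i]), df[-i])
  v1
}))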

-output

#      [,1] [,2] [,3]
#[1,]  0.0  8.5 12.5
#[2,]  6.5  0.0 14.5
#[3,]  8.5 12.5  0.0
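
As a variation on the sapply approach, here is a minimal sketch using vapply, which checks that every iteration returns a numeric vector of the expected length. The names out and res are placeholders introduced for this example; the zero vector is created inside the function rather than captured from the enclosing environment.

```r
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testfun <- function(a, b) colMeans(a + 2 * b)  # only for illustration

res <- t(vapply(seq_along(df), function(i) {
  out <- numeric(ncol(df))                       # zeros; position i stays 0
  out[-i] <- testfun(as.matrix(df[i]), df[-i])   # fill the other positions
  out
}, FUN.VALUE = numeric(ncol(df))))               # each call must return ncol(df) numbers

res
```

If testfun ever returned a vector of the wrong length or type, vapply would stop with an error instead of silently returning a list, which is the main reason to prefer it over sapply here.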

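Or this can be done with the tidyverse (dplyr). Here each output column holds the results for the corresponding input column, without the zero placeholders. Note that dplyr 1.1.0 and later warn when summarise() returns more than one row per group; reframe() is the replacement for that case.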

library(dplyr)
df %>%
   summarise(across(everything(), ~ testfun(., select(df, -cur_column()))))
#    x    y    z
#1  8.5  6.5  8.5
#2 12.5 14.5 12.5
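
For comparison, the same compact layout (one column of results per input column, no zero placeholders) can be sketched in base R. This is only an illustration under the same df and testfun as above; compact and nm are names made up for the example.

```r
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testfun <- function(a, b) colMeans(a + 2 * b)  # only for illustration

# Iterate over column names so the result keeps x, y, z as column labels
compact <- vapply(names(df), function(nm) {
  others <- df[setdiff(names(df), nm)]   # all columns except the current one
  unname(testfun(df[[nm]], others))      # drop names so row labels do not mislead
}, FUN.VALUE = numeric(ncol(df) - 1))

compact
#         x    y    z
# [1,]  8.5  6.5  8.5
# [2,] 12.5 14.5 12.5
```

The result is a numeric matrix rather than a data frame; wrapping it in as.data.frame() gives something closer to the dplyr output.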