2
votes

Say I have a data frame. In my application, the dimensions and column names of this data frame are not known in advance, but for example:

  v1 <- sample(1:100, 5, replace = FALSE)
  v2 <- sample(1:100, 5, replace = FALSE)
  v3 <- sample(1:100, 5, replace = FALSE)

  sample_matrix <- data.frame(v1, v2, v3)

I want to apply a function to each row of sample_matrix. The function is, in fact, also unknown, except that it returns a vector. As a result of the apply operation, I need a data frame with the same number of rows as the input.

If the function returns a vector longer than 1, the results of apply are combined as columns, not as rows:

  dummy_func1 <- function(x) c(1, 2)
  apply(sample_matrix, 1, dummy_func1)

    X1 X2 X3 X4 X5
  1  1  1  1  1  1
  2  2  2  2  2  2
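
To make the shape explicit: as far as I understand the documentation of apply(), when each call returns a vector of length n > 1, the result is an array of dimension c(n, dim(X)[MARGIN]), so each input row ends up as a column. A minimal check, reusing the names from above (the seed and the named-column construction are only for illustration):

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
sample_matrix <- data.frame(v1 = sample(1:100, 5),
                            v2 = sample(1:100, 5),
                            v3 = sample(1:100, 5))
dummy_func1 <- function(x) c(1, 2)

res <- apply(sample_matrix, 1, dummy_func1)
dim(res)     # 2 5: one column per input row, one row per returned element
dim(t(res))  # 5 2: transposing gives one row per input row
```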

If I know in advance that the function returns more than one value, this can be handled with a transpose:

  data.frame(t(apply(sample_matrix, 1, dummy_func1)))

    X1 X2
  1  1  2
  2  1  2
  3  1  2
  4  1  2
  5  1  2

However, if the function returns exactly one value, the transpose does the opposite of what is needed:

  dummy_func2 <- function(x) c(1)
  data.frame(t(apply(sample_matrix, 1, dummy_func2)))

    X1 X2 X3 X4 X5
  1  1  1  1  1  1
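
The reason, as far as I can tell, is that apply() simplifies length-1 results to a plain vector with no dimensions, and t() treats a plain vector as a column, so it returns a single row. A small sketch with stand-in data (the values in sample_matrix here are made up for illustration):

```r
sample_matrix <- data.frame(v1 = 1:5, v2 = 6:10, v3 = 11:15)  # stand-in data
dummy_func2 <- function(x) c(1)

res <- apply(sample_matrix, 1, dummy_func2)
is.null(dim(res))  # TRUE: a plain vector of length 5, not a matrix
dim(t(res))        # 1 5: t() turns the vector into a single row
```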

What I currently do in my project is transpose conditionally, which is kinda ugly:

  res <- data.frame(apply(sample_matrix, 1, dummy_func2))
  if(ncol(res) > 1) res <- t(res)

The answers I have found mostly suggest using plyr, but I think I cannot use plyr (or can I?), because in my project neither the data frame nor the function is known in advance.

My question is: what is a better way than vanilla apply to get one result row per input row, with the returned values always laid out across columns, regardless of the length of the returned vector?

1 Answer

1
votes

You can use lapply(), so that you always get a list of results, one element per column of the transposed data:

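# transpose once, so that the original rows become columns
sample_matrix <- t(sample_matrix)

dummy_func1 <- function(x) c(1, 2)
# lapply() over a data frame iterates over its columns
a <- lapply(as.data.frame(sample_matrix), dummy_func1)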

t(data.frame(a))
       [,1] [,2]
    v1    1    2
    v2    1    2
    v3    1    2

dummy_func2 <- function(x) c(1)
b <- lapply(as.data.frame(sample_matrix), dummy_func2)

t(data.frame(b))
       [,1]
    v1    1
    v2    1
    v3    1
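
A variant of the same idea that avoids transposing the input is to loop over row indices and stack the results with rbind. This is only a sketch: apply_rows is a hypothetical helper name (not part of base R or plyr), and it assumes that all columns are of the same type and that the function returns a vector of the same length for every row.

```r
# apply_rows is a hypothetical helper, not a base R or plyr function.
apply_rows <- function(df, f) {
  # Call f once per row; unlist() passes the row as a plain named vector.
  results <- lapply(seq_len(nrow(df)), function(i) f(unlist(df[i, ])))
  # rbind() stacks each returned vector as one row, even when its length is 1.
  as.data.frame(do.call(rbind, results))
}

sample_matrix <- data.frame(v1 = 1:5, v2 = 6:10, v3 = 11:15)  # stand-in data
dim(apply_rows(sample_matrix, function(x) c(1, 2)))  # 5 2
dim(apply_rows(sample_matrix, function(x) c(1)))     # 5 1
```

Because the results are bound row by row, no conditional transpose is needed for the length-1 case.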