pass grouped dataframe to own function in dplyr

Question

I am trying to switch from plyr to dplyr. However, I still can't figure out how to call my own functions in a chained dplyr call.

I have a data frame with a factorised ID variable and an order variable. I want to split the frame by the ID, order it by the order variable and add a sequence in a new column.

My plyr code looks like this:

f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
data <- ddply(data, .(ID_variable), f)

In dplyr I thought it should look something like this:

f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
data <- data %>% group_by(ID_variable) %>% f

Can anyone tell me how to modify my dplyr call to successfully pass my own function and get the same functionality my plyr function provides?

EDIT: If I use the dplyr formula as described here, it DOES pass an object to f. However, while plyr seems to pass a separate table for each group (split by the ID variable), dplyr does not pass one table per group but the ENTIRE table (as some kind of dplyr object in which the groups are annotated). So when I cbind the Experience variable, it appends a counter that runs over the entire table instead of restarting within each group.

I have found a way to get the same functionality in dplyr using this approach:

data <- data %>%
    group_by(ID_variable) %>%
    arrange(ID_variable,order_variable) %>% 
    mutate(Experience = 0:(n()-1))

However, I would still be keen to learn how to pass the grouped data, split into one table per group, to my own functions in dplyr.

What version of R and dplyr are you using? This did not produce an error for me. — nrussell
I think we need a reproducible data set. @nrussell I don't think the issue is an error being thrown, just not the intended/expected result. I found this question because of my own similar issue. I did something very similar: d %>% group_by(var1, var2) %>% summarize(blah = f(.)). I get a grouped data frame returned, but every entry for `blah` is identical. I think it's as described above; the whole df is passed for some reason, not the grouped "chunks" like plyr would do. — Hendy

Kipras Kančys · Accepted Answer · 2017-12-11T19:17:34

For those who get here from Google: let's say you wrote your own print function.

library(dplyr)
printFunction <- function(dat) print(dat)
df <- data.frame(a = 1:6, b = 1:2)

Calling it the way the question does,

df %>% 
    group_by(b) %>% 
    printFunction(.)

prints the entire data frame. To make dplyr print one table per group, you should use do():

df %>% 
    group_by(b) %>% 
    do(printFunction(.))
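
The same idea applied to the function from the question might look like the sketch below. The toy data are made up for illustration and only the column names come from the question; f is rewritten slightly so that it returns a data frame. The group_modify() variant assumes dplyr 0.8.1 or later, where do() still works but group_modify() is the newer way to run a function once per group.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question.
data <- data.frame(
  ID_variable    = factor(c("a", "a", "a", "b", "b")),
  order_variable = c(3, 1, 2, 20, 10)
)

# Per-group function: sort the rows and add a 0-based counter.
f <- function(x) {
  x <- x[order(x$order_variable), , drop = FALSE]
  x$Experience <- seq_len(nrow(x)) - 1L
  x
}

# do() calls f once per group; `.` holds that group's rows.
res_do <- data %>%
  group_by(ID_variable) %>%
  do(f(.))

# Assumes dplyr >= 0.8.1: group_modify() also calls f once per group.
# `.x` holds the group's rows without the grouping column, which
# group_modify() adds back to the result.
res_gm <- data %>%
  group_by(ID_variable) %>%
  group_modify(~ f(.x))
```

In both results the Experience counter restarts at 0 within each ID_variable group, which is what the ddply call in the question produces.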
