What is the preferred way to pass all columns of the current group to a function as a tibble or data.frame when calling an arbitrary function in a dplyr pipe?
In the example below, `mean_B` is a simple case where I know what is needed before I make the function call. `mean_B_fun` gives the wrong answer (I want the within-group mean), because `.` refers to the whole piped data frame rather than to the current group. `mean_B_fun_ugly` gives what I want, but it seems both inefficient and ugly.
The reason I want to operate on arbitrary columns is that, in practice, `my_fun` in the example below comes from the user, and I don't know a priori which columns the user's function will operate on.
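```r
library(dplyr)

# User-supplied function: expects a data frame and uses column B
my_fun <- function(x) mean(x$B)

my_data <-
  expand.grid(A = 1:3, B = 1:2) %>%
  mutate(B = A * B) %>%
  group_by(A) %>%
  mutate(
    mean_B = mean(B),        # within-group mean (the desired result)
    mean_B_fun = my_fun(.),  # `.` is the whole data frame, so this is the overall mean
    # subsets the whole data frame to the current group by hand
    mean_B_fun_ugly = my_fun(as.data.frame(.)[.$A == unique(A), , drop = FALSE])
  )
```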
`mutate_all` will apply a function, by group, to all columns other than the grouping columns. For `my_fun`, the argument `x` should be a vector and the operation in the function would be `mean(x)`, since `mutate` will pass a vector of values from a given column. – eipi10

… `tidyverse`: 1) those that take a data frame as the first argument (used in pipes, e.g. `tidyr::separate` or `dplyr::top_n`) and 2) those that take vectors (e.g. all functions in `stringr` or many `base` functions, such as `mean`, `max`, `sum`); these are typically used in `mutate` statements. There are some that can take either a data frame or a vector (like `purrr::map`), but the behaviour will be different. Your user function should be type 2: it should take a vector, not a data frame. Assuming the user does not subset inside the function, `group_by` will be honored. – dmi3kno

… `mean(x$B)` could alternatively be `mean(x$B) + mean(x$A)`, and I wouldn't know which columns they need. – Bill Denney

… `A`, and in the next function call they may need column `B`, and in the next function call they may need both `A` and `B`. More generally, I don't know all the column names in the user's dataset, what they will mean to the user, or which will be important. – Bill Denney