Observations in my data are contained in groups, and I'm trying to get multiple summary statistics (e.g., mean, median, length, standard deviation) for each group using the summarize_all function.
The problem is that some functions (e.g., mean, median) need the na.rm = TRUE argument to handle missing values, while others (e.g., n()) do not accept it. When I specify na.rm = TRUE in summarize_all, it passes that argument to every function listed (below, mean and sd).
library(dplyr)
airquality %>%
  select(Month, Ozone, Solar.R, Temp) %>%
  group_by(Month) %>%
  summarize_all(list(mean, sd), na.rm = TRUE)
BUT, when I include n() in the list, na.rm is passed to it as well, which gives me the error: "Error: Evaluation error: unused arguments (Ozone, na.rm = TRUE)"
airquality %>%
  select(Month, Ozone, Solar.R, Temp) %>%
  group_by(Month) %>%
  summarize_all(list(mean, sd, n), na.rm = TRUE)
I'd also love to know how to get rid of the terrible column names that summarize_all creates when using more than one function. For example, in the first chunk of code I get column names like mpg_<S4: standardGeneric> and cyl_<S4: standardGeneric>
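One possible way to get readable names, assuming dplyr 0.8 or later, is to name the elements of the list passed to summarize_all. The sketch below reuses the first chunk of code; the names mean and sd on the left-hand side and the object name named_stats are illustrative choices, not anything required by dplyr.

```r
library(dplyr)

# Naming the list elements makes dplyr build column names of the form
# <variable>_<name>, e.g. Ozone_mean, instead of an auto-generated label.
named_stats <- airquality %>%
  select(Month, Ozone, Solar.R, Temp) %>%
  group_by(Month) %>%
  summarize_all(list(mean = mean, sd = sd), na.rm = TRUE)

names(named_stats)
```

Any valid names can be used on the left-hand side, so list(avg = mean, spread = sd) would produce Ozone_avg and Ozone_spread instead.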
na.rm? If you take out the na.rm you still get that Ozone error. I think the issue is that mean and sd take input arguments, whereas n() doesn't take any inputs. - Jacqueline Nolis
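Following that reasoning, a minimal sketch of one workaround is to wrap each function in a formula lambda, so that na.rm is given only to the functions that accept it and n() is called with no arguments. This assumes dplyr 0.8 or later, where summarize_all accepts a list of ~ lambdas; the object name group_stats is made up for the example.

```r
library(dplyr)

# Each lambda decides for itself whether it uses the column (`.`) and
# whether it needs na.rm; n() is called with no arguments at all.
group_stats <- airquality %>%
  select(Month, Ozone, Solar.R, Temp) %>%
  group_by(Month) %>%
  summarize_all(list(
    mean = ~ mean(., na.rm = TRUE),
    sd   = ~ sd(., na.rm = TRUE),
    n    = ~ n()
  ))

group_stats
```

Note that n() counts every row in the group, including rows where the column is NA; if the count of non-missing values is what's wanted, something like ~ sum(!is.na(.)) could be used in its place. In dplyr 1.0 and later, summarize_all is superseded by across(), which accepts the same kind of named list of lambdas.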