1
votes

I am using dplyr's summarise function. My data contain NAs, so I need to include na.rm=TRUE in each call. For example:

group <- rep(c('a', 'b'), 3)
value <- c(1:4, NA, NA)
df = data.frame(group, value)

library(dplyr)
group_by(df, group) %>%
  summarise(mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE),
            min = min(value, na.rm = TRUE))

Is there a way to write the argument na.rm=TRUE only once, rather than repeating it in every call?

2
You can use na.omit: df %>% group_by(group) %>% na.omit() %>% summarise() - pogibas
na.omit() will delete every row that contains at least one NA. I don't want that. - Rtist

2 Answers

4
votes

You can use summarise_at, which applies several functions to the selected columns and passes any extra arguments (here na.rm = TRUE) to all of them:

df %>% group_by(group) %>% 
  summarise_at("value", 
               funs(mean = mean, sd = sd, min = min), 
               na.rm = TRUE)
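
In dplyr 1.0.0 and later, summarise_at is superseded by across(), and funs() has been deprecated since dplyr 0.8.0. The following is a minimal sketch of the same idea with across(), not a definitive replacement. The helper with_na_rm is a hypothetical name introduced here for illustration (it is not part of dplyr); it wraps each function so that na.rm = TRUE is written only once.

```r
library(dplyr)

group <- rep(c("a", "b"), 3)
value <- c(1:4, NA, NA)
df <- data.frame(group, value)

# Hypothetical helper (not a dplyr function): returns a version of `f`
# that always drops NAs.
with_na_rm <- function(f) {
  force(f)
  function(x) f(x, na.rm = TRUE)
}

# na.rm = TRUE appears once, inside the helper above.
fns <- lapply(list(mean = mean, sd = sd, min = min), with_na_rm)

res <- df %>%
  group_by(group) %>%
  # .names = "{.fn}" names the output columns mean, sd, min
  summarise(across(value, fns, .names = "{.fn}"))

print(res)
```

The same fns list can be reused for several columns by widening the selection in across(); in that case the default naming (column name plus function name) is probably the better choice, so drop the .names argument.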
1
votes

If you only need to summarise one column, you can use filter(!is.na(value)) to drop the rows where that column is NA. Unlike na.omit(), an NA in any other column does not cause a row to be dropped.

group <- rep(c('a', 'b'), 3)
value <- c(1:4, NA, NA)
df = data.frame(group, value)

library(dplyr)

group_by(df, group) %>% 
  filter(!is.na(value)) %>%
  summarise(mean = mean(value),
            sd = sd(value),
            min = min(value))

# # A tibble: 2 x 4
#    group  mean       sd   min
#   <fctr> <dbl>    <dbl> <dbl>
# 1      a     2 1.414214     1
# 2      b     3 1.414214     2
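
To illustrate how this differs from na.omit(), here is a small sketch. The data frame df2 and its extra column other are made up for this example and are not part of the original question.

```r
library(dplyr)

# Hypothetical data: `other` is an extra column added only for illustration.
df2 <- data.frame(
  group = rep(c("a", "b"), 3),
  value = c(1:4, NA, NA),
  other = c(NA, 10, 20, 30, 40, 50)
)

# na.omit() drops every row that has an NA in any column,
# so the first row (value = 1, other = NA) is lost as well.
res_omit <- df2 %>%
  na.omit() %>%
  group_by(group) %>%
  summarise(mean = mean(value))

# filter() drops only the rows where `value` itself is NA.
res_filter <- df2 %>%
  filter(!is.na(value)) %>%
  group_by(group) %>%
  summarise(mean = mean(value))

print(res_omit)
print(res_filter)
```

With na.omit(), group a is summarised from a single remaining row, while filter(!is.na(value)) keeps both non-missing values of group a. Note that either approach removes a group from the result entirely if all of its values are NA.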