R: summarise multiple columns with different summation functions using dplyr results in error?

Question

I am transforming a customer journey dataset from user aggregation level to a day level aggregation. The problem is that I cannot simply sum or mean all columns, as not all variables can be aggregated in the same way. For example, duration is a variable that I want to summarise via mean, while purchase_own is a variable that I want to summarise via sum.

I used dplyr to get this working, but it gives me an error. I tried the following code:

CJd <- CJre %>%
  group_by(date) %>%
  summarise_at(vars(purchase_own, purchase_any, CIT,
                    FIT, T1:T22, devicemobile, devicefixed, purchase_comp,
                    POS_comp, POS_own, POS_any,
                    markov, first_touch, last_touch, linear_touch), sum) %>%
  summarise_at(vars(duration, difference), mean) %>%
  summarise_at(CountTP, max)

This results in an error:

Error in .f(.x[[i]], ...) : object 'duration' not found

I suspect this means that summarise_at(vars(duration, difference), mean) is not allowed as a second summarise call. How can I write the summarise step so that different variables are aggregated with different functions?

The actual result is that only the first summarise_at is executed, so the remaining variables are missing from my dataset. Those variables need to be summarised with mean and max, respectively. The expected outcome is that these variables, grouped by date and summarised with mean or max, are added to the dataset.

akrun · Accepted Answer · 2019-04-23T12:27:15

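The issue is that the first summarise_at did not include 'duration', so that column is no longer present in the summarised data when the second summarise_at runs. Instead, we can use mutate_at to replace the listed columns with their per-date sums while keeping every row, then add those columns to the grouping (which collapses the data to its distinct rows) and summarise the remaining columns: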

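CJre %>%
  group_by(date) %>%
  # mutate_at keeps all rows and columns; the listed columns become per-date sums
  mutate_at(vars(purchase_own, purchase_any, CIT,
                 FIT, T1:T22, devicemobile, devicefixed, purchase_comp,
                 POS_comp, POS_own, POS_any,
                 markov, first_touch, last_touch, linear_touch), sum) %>%
  # add the summed columns to the grouping so they survive the summarise below;
  # group_by_at takes vars(), so T1:T22 is read as a column range
  # (dplyr >= 1.0 names the argument .add; older versions use add)
  group_by_at(vars(purchase_own, purchase_any, CIT,
                   FIT, T1:T22, devicemobile, devicefixed, purchase_comp,
                   POS_comp, POS_own, POS_any,
                   markov, first_touch, last_touch, linear_touch), .add = TRUE) %>%
  # average the remaining columns within each date
  summarise_at(vars(duration, difference), mean)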
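
As an alternative sketch, assuming dplyr 1.0.0 or later (released after this answer was written), a single summarise call with several across() blocks can apply a different function to each set of columns. The toy data frame CJre_toy and its values below are made up for illustration and contain only a few of the original columns; the real CJre would list the full set inside each across().

```r
library(dplyr)

# Hypothetical stand-in for CJre with a handful of the original columns
CJre_toy <- tibble(
  date         = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02")),
  purchase_own = c(1, 0, 1),
  purchase_any = c(1, 1, 0),
  duration     = c(10, 20, 30),
  difference   = c(2, 4, 6),
  CountTP      = c(3, 5, 4)
)

CJd_toy <- CJre_toy %>%
  group_by(date) %>%
  summarise(
    across(c(purchase_own, purchase_any), sum),  # totals per day
    across(c(duration, difference), mean),       # averages per day
    across(CountTP, max),                        # maximum per day
    .groups = "drop"                             # return an ungrouped tibble
  )

print(CJd_toy)
```

Because every aggregation happens inside the same summarise, no column is dropped before it is needed, which is the cause of the 'duration' not found error in the chained version.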
