I am transforming a customer journey dataset from user-level aggregation to day-level aggregation. The problem is that I cannot simply take the sum or the mean of every column, because not all variables should be aggregated in the same way. For example, duration is a variable that I want to summarise with the mean, while purchase_own is a variable that I want to summarise with the sum.
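A minimal sketch of the basic idea, assuming dplyr 1.0.0 or later and a hypothetical toy data frame (only the column names date, duration and purchase_own come from the question; the values are made up). A single summarise() call can apply a different function to each output column:

```r
library(dplyr)  # .groups argument needs dplyr >= 1.0.0

# Hypothetical toy data: three user-level rows spread over two days
toy <- tibble(
  date         = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02")),
  duration     = c(10, 20, 30),
  purchase_own = c(1, 0, 1)
)

# Each output column gets its own aggregation function
by_day <- toy %>%
  group_by(date) %>%
  summarise(
    duration     = mean(duration),     # average per day
    purchase_own = sum(purchase_own),  # total per day
    .groups = "drop"                   # return an ungrouped tibble
  )
print(by_day)
```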
I tried to do this with dplyr, but my code gives an error. This is what I tried:
CJd <- CJre %>%
  group_by(date) %>%
  summarise_at(vars(purchase_own, purchase_any, CIT, FIT, T1:T22,
                    devicemobile, devicefixed, purchase_comp, POS_comp,
                    POS_own, POS_any, markov, first_touch, last_touch,
                    linear_touch), sum) %>%             # sums
  summarise_at(vars(duration, difference), mean) %>%    # means
  summarise_at(vars(CountTP), max)                      # maximum
This results in an error:
Error in .f(.x[[i]], ...) : object 'duration' not found
I suspect this means that summarise_at(vars(duration, difference), mean) is not allowed as a second summarise step. So my question is: how can I write the summarise call so that different variables are aggregated with different functions?
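A small sketch to check this suspicion, again on hypothetical toy data standing in for CJre. It shows that the output of the first summarise_at() keeps only the grouping column and the columns it summarised, so a following summarise_at() has no duration column left to work on. The exact error text differs between dplyr versions, so the sketch only captures whatever message is raised:

```r
library(dplyr)

# Hypothetical toy data standing in for CJre
toy <- tibble(
  date         = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02")),
  duration     = c(10, 20, 30),
  purchase_own = c(1, 0, 1)
)

# After the first summarise_at(), duration has been dropped
step1 <- toy %>%
  group_by(date) %>%
  summarise_at(vars(purchase_own), sum)
print(names(step1))  # → "date" "purchase_own"

# A second summarise_at() therefore cannot find duration any more
res <- tryCatch(
  step1 %>% summarise_at(vars(duration), mean),
  error = function(e) conditionMessage(e)  # keep the error message as text
)
print(res)
```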
The actual result is that only the first summarise_at is executed, so some variables are missing from my dataset. The missing variables (duration and difference, and CountTP) need to be summarised with mean and max, respectively. The expected outcome is that these variables, grouped by date and summarised with mean or max, are added to the dataset.
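One possible approach, sketched under the assumption that dplyr 1.0.0 or later is available: put all the aggregations in a single summarise() call and use across() to apply one function to each group of columns. The data below is a hypothetical stand-in for CJre with a reduced set of columns; T1:T3 plays the role of T1:T22, and the full column list from the question could be placed in the first across() instead.

```r
library(dplyr)  # across() needs dplyr >= 1.0.0

# Hypothetical stand-in for CJre with a reduced set of columns
CJre <- tibble(
  date         = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02")),
  purchase_own = c(1, 0, 1),
  purchase_any = c(1, 1, 0),
  T1           = c(0, 1, 1),
  T2           = c(1, 1, 0),
  T3           = c(0, 0, 1),
  duration     = c(10, 20, 30),
  difference   = c(2, 4, 6),
  CountTP      = c(3, 5, 2)
)

# One summarise() call; each across() applies its own function
CJd <- CJre %>%
  group_by(date) %>%
  summarise(
    across(c(purchase_own, purchase_any, T1:T3), sum),  # sums
    across(c(duration, difference), mean),             # means
    CountTP = max(CountTP),                            # maximum
    .groups = "drop"
  )
print(CJd)
```

Because every aggregation runs on the same grouped data, no column is dropped before it is needed. If the data contains missing values, a lambda such as ~ sum(.x, na.rm = TRUE) can be passed to across() instead of the bare function.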