The code below does exactly what I want it to do. Takes a data frame with multiple grouping constraints (ie. id and date) and returns 1 row for each with different functions applied to different columns. In this case, collapsing to character and taking the maximum of the column by grouping.
The issue is scaling this approach to much bigger data frames with 100k rows and groups. How can I reach the same conclusion in a more efficient manner? If this is possible in a tidyverse structure this is preferred (while not preferred, data.table solutions will also be accepted). I want to keep the flexibility to specify the different cols to be included in the function calls, as well as add additional summary calls (e.g. summarise_at(sum of values)).
data <- tibble(id = c(1,1,1,2,3,4,5,5,6,6,6), date = dmy("01/01/2020"), var1 = 1:11, var2 = 12:22, var3 = 1:11)
data %>%
group_by(id, date) %>%
{data.frame(
summarise_at(., vars(var1, var2), list(~ paste(unique(.), collapse = " AND "))), # return character string
summarise_at(., vars(var3), list(~ max(., na.rm = T))),#, # return max in group
summarise(., count = n(), .groups = "keep") # return count of cases in group
)} %>%
select(-matches("[.]1$|[.]2$|[.]3$")) %>% # remove unwanted columns
as_tibble()
# A tibble: 6 x 6
id date var1 var2 var3 count
<dbl> <date> <chr> <chr> <int> <int>
1 1 2020-01-01 1 AND 2 AND 3 12 AND 13 AND 14 3 3
2 2 2020-01-01 4 15 4 1
3 3 2020-01-01 5 16 5 1
4 4 2020-01-01 6 17 6 1
5 5 2020-01-01 7 AND 8 18 AND 19 8 2
6 6 2020-01-01 9 AND 10 AND 11 20 AND 21 AND 22 11 3
Links to this question as well. How can I use summarise_at to apply different functions to different columns?