I'm trying to get my head around dplyr. I have a sorted data frame that I want to group based on a variable. However, the groups need to be constructed so that each of them has a minimum sum of 30 on the grouping variable.
Consider this small example data frame:
df1 <- matrix(data = c( 5, 0.9, 95,
                       12, 0.8, 31,
                       16, 0.8, 28,
                       17, 0.7, 10,
                       23, 0.8, 11,
                       55, 0.6,  9,
                       56, 0.5, 12,
                       57, 0.2,  1,
                       59, 0.4,  1),
              ncol = 3,
              byrow = TRUE,
              dimnames = list(c(1:9),
                              c('freq', 'mean', 'count')))
Now, I want to group the rows so that count has a sum of at least 30 in each group. freq and mean should then be collapsed into a weighted.mean where the weights are the count values. Note that the last "bin" reaches a sum of 32 by row 7, but since rows 8:9 only sum to 2, I add them to the last "bin".
Like so:
freq mean count
5.00 0.90 95
12.00 0.80 31
16.26 0.77 38
45.18 0.61 34
The simple summarizing with dplyr is not a problem, but this I can't figure out. I do think the solution is hidden somewhere here:
Dynamic Grouping in R | Grouping based on condition on applied function
But how to apply it to my situation escapes me.
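To make the rule concrete, here is a minimal sketch of the binning written as a plain base R loop rather than as dplyr code. The helper name min_sum_groups and the column grp are made up for illustration, and the example data is rebuilt as a data frame instead of a matrix. It only states the intended logic (close a bin as soon as its running sum of count reaches 30, then fold any leftover rows into the last complete bin); it is not meant as the idiomatic dplyr answer.

```r
# Example data from the question, rebuilt as a data frame.
df1 <- data.frame(
  freq  = c(5, 12, 16, 17, 23, 55, 56, 57, 59),
  mean  = c(0.9, 0.8, 0.8, 0.7, 0.8, 0.6, 0.5, 0.2, 0.4),
  count = c(95, 31, 28, 10, 11, 9, 12, 1, 1)
)

# Hypothetical helper: give each row a group id so that every group's
# sum of x is at least min_sum.
min_sum_groups <- function(x, min_sum = 30) {
  grp <- integer(length(x))
  id <- 1L
  running <- 0
  for (i in seq_along(x)) {
    running <- running + x[i]
    grp[i] <- id
    if (running >= min_sum) {
      # This bin is complete; start a new one on the next row.
      id <- id + 1L
      running <- 0
    }
  }
  # Rows left in an unfinished trailing bin join the last complete bin.
  if (id > 1L) {
    grp[grp == id] <- id - 1L
  }
  grp
}

df1$grp <- min_sum_groups(df1$count)

# Collapse each group: count-weighted means for freq and mean, sum of count.
result <- do.call(rbind, lapply(split(df1, df1$grp), function(d) {
  data.frame(
    freq  = weighted.mean(d$freq, d$count),
    mean  = weighted.mean(d$mean, d$count),
    count = sum(d$count)
  )
}))

print(round(result, 2))
```

With the example data, the group ids come out as 1, 2, 3, 3, 4, 4, 4, 4, 4, and the rounded result matches the four rows of the desired output above. Once grp exists as a column, the collapsing step could presumably be handed to dplyr's group_by and summarise instead of split and lapply.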