As described in numerous questions on here, I should be able to take a data.frame, group it, sort by date, and then apply cumsum, to get the cumulative sum over time per grouping.
Instead, with dplyr 0.8.0, I'm getting cumulative sums that ignore the grouping.
Example code:
library(dplyr)

data.frame(
  cat  = sample(c("a", "b", "c"), size = 1000, replace = TRUE),
  date = sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"),
                1000, replace = TRUE)
) %>%
  mutate(x = 1) %>%
  arrange(date) %>%
  group_by(cat) %>%
  mutate(x = cumsum(x)) %>%
  tail()
I'd expect the last few rows to have x values of roughly 300-something in each group, since each of the three categories gets about a third of the 1000 rows.
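Because x is 1 on every row, the final cumulative sum in each group should equal that group's row count, so the expectation can be checked exactly instead of eyeballed. This is a minimal sketch, assuming the same data as the example code above. The seed value 42 and the names `df`, `result`, `max_x` and `group_n` are arbitrary choices for illustration.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, added only so the run is reproducible

df <- data.frame(
  cat  = sample(c("a", "b", "c"), size = 1000, replace = TRUE),
  date = sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"),
                1000, replace = TRUE)
)

result <- df %>%
  mutate(x = 1) %>%
  arrange(date) %>%
  group_by(cat) %>%
  mutate(x = cumsum(x)) %>%
  ungroup()

# Largest running total per category (cumsum of 1s only increases)
max_x <- tapply(result$x, result$cat, max)

# Number of rows per category, i.e. what max_x should be if grouping works
group_n <- table(df$cat)

print(max_x)
print(group_n)
```

If grouping is honoured, `max_x` matches `group_n` category by category; with the behaviour shown in the output below, the largest values would instead run up toward 1000.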
Instead I get:
# A tibble: 6 x 3
# Groups:   cat [2]
  cat   date           x
  <chr> <date>     <dbl>
1 a     1999-12-31   995
2 a     1999-12-31   996
3 c     2000-01-01   997
4 a     2000-01-01   998
5 c     2000-01-01   999
6 a     2000-01-01  1000
What am I doing wrong?
Comments:

x values are roughly around 300. - coffeinjunky

dplyr 0.7.2. - coffeinjunky

dplyr 0.8.0? A part of me will feel better if it's a regression... - Bob

set.seed. (I suggest you could demonstrate the grouping problem without generating 1000 randoms, such as data_frame(a=rep(1:2,2),b=1:4) %>% group_by(a) %>% mutate(x=cumsum(b)), expecting 1,2,4,6.) - r2evans
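r2evans's minimal example can be turned into a self-checking reproduction. This sketch swaps in base `data.frame()` for `data_frame()` so that only dplyr is needed, and also prints the installed dplyr version, since the comments compare 0.7.2 and 0.8.0. The names `toy` and `toy_x` are illustrative only.

```r
library(dplyr)

# Report the installed dplyr version alongside the result
print(packageVersion("dplyr"))

# Rows alternate between group a == 1 and a == 2
toy <- data.frame(a = rep(1:2, 2), b = 1:4)

toy_x <- toy %>%
  group_by(a) %>%
  mutate(x = cumsum(b)) %>%
  pull(x)

# Group a == 1 holds b = 1, 3 and group a == 2 holds b = 2, 4, so the
# per-group running totals in original row order are 1, 2, 4, 6.
# A cumsum that ignores the grouping would give 1, 3, 6, 10 instead.
print(toy_x)
```

Getting 1, 3, 6, 10 here would reproduce the problem in four rows without any random data.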