I'm having trouble using group_by() on multiple columns. An example dataset is the following:
dput(test)
structure(list(timestamp = structure(c(1506676980, 1506676980,
1506676980, 1506677040, 1506677280, 1506677340, 1506677460), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), plusminus = c(-1, 1, 1, 1, 1, 1, -1
), AP = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 2L), .Label = c("A",
"B"), class = "factor")), .Names = c("timestamp", "plusminus",
"AP"), row.names = c(NA, -7L), class = "data.frame")
It looks as follows:
            timestamp plusminus AP
1 2017-09-29 09:23:00        -1  A
2 2017-09-29 09:23:00         1  B
3 2017-09-29 09:23:00         1  B
4 2017-09-29 09:24:00         1  B
5 2017-09-29 09:28:00         1  B
6 2017-09-29 09:29:00         1  A
7 2017-09-29 09:31:00        -1  B
I would like to do the following:
- compute a running total of 'plusminus' for each level of the 'AP' variable
- aggregate, for each minute and each 'AP' level, the maximum value of that running total.
In other words, I want this output:
            timestamp total AP
1 2017-09-29 09:23:00    -1  A
2 2017-09-29 09:23:00     2  B
3 2017-09-29 09:24:00     3  B
4 2017-09-29 09:28:00     4  B
5 2017-09-29 09:29:00     0  A
6 2017-09-29 09:31:00     3  B
It's easy to do part 1 via:
test %>% group_by(AP) %>% mutate(total = cumsum(plusminus))
which gives:
# A tibble: 7 x 4
# Groups:   AP [2]
            timestamp plusminus     AP total
               <dttm>     <dbl> <fctr> <dbl>
1 2017-09-29 09:23:00        -1      A    -1
2 2017-09-29 09:23:00         1      B     1
3 2017-09-29 09:23:00         1      B     2
4 2017-09-29 09:24:00         1      B     3
5 2017-09-29 09:28:00         1      B     4
6 2017-09-29 09:29:00         1      A     0
7 2017-09-29 09:31:00        -1      B     3
but I'm not sure how to do part 2. That is, I would like to know how to perform the aggregation so that the second row of the latter data frame (AP = B, total = 1) is suppressed, giving the desired output.
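One direction that might work, as a minimal sketch rather than a confirmed solution: after computing the running total per 'AP', regroup by both 'timestamp' and 'AP' and keep the maximum. This assumes dplyr is loaded and that the timestamps are already truncated to the minute, as they are in the example data. The name `result` is only for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the dput() output above.
test <- data.frame(
  timestamp = as.POSIXct(
    c(1506676980, 1506676980, 1506676980, 1506677040,
      1506677280, 1506677340, 1506677460),
    origin = "1970-01-01", tz = "UTC"
  ),
  plusminus = c(-1, 1, 1, 1, 1, 1, -1),
  AP = factor(c("A", "B", "B", "B", "B", "A", "B"))
)

result <- test %>%
  group_by(AP) %>%                        # part 1: running total per AP level
  mutate(total = cumsum(plusminus)) %>%
  group_by(timestamp, AP) %>%             # part 2: one row per minute and AP
  summarise(total = max(total)) %>%       # keep the largest running total
  ungroup() %>%
  select(timestamp, total, AP)            # column order of the desired output

result
```

If the real timestamps carried seconds, they would presumably need to be rounded down to the minute first (for example with lubridate's floor_date()) before the second group_by().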