I have the following table:
day_1 = as.Date('2020-03-02')
day_2 = as.Date('2021-03-02')
day_3 = as.Date('2021-02-02')
data_example <- data.frame(pay_date = c(rep(day_1,4),rep(day_2,3),rep(day_3,3)),
arrears = c(0,0:2,rep(0:2,2)),
to_pay = c(2:5,2:4,4:6),
paid = c(2:5,2,3,0,4,0,0))
The data is:
> data_example
pay_date arrears to_pay paid
1 2020-03-02 0 2 2
2 2020-03-02 0 3 3
3 2020-03-02 1 4 4
4 2020-03-02 2 5 5
5 2021-03-02 0 2 2
6 2021-03-02 1 3 3
7 2021-03-02 2 4 0
8 2021-02-02 0 4 4
9 2021-02-02 1 5 0
10 2021-02-02 2 6 0
I want to aggregate the data by pay_date and arrears, and accumulate the amounts to_pay (in col1) and paid (in col2). Then I want to create col 3, which is the cumulate sum of col 1
. For example, for '2020-03-02', for arrears 0, the cumulative sum is 5. For arrears 1, the cumulative sum is 5 (from arrears 0) + 4 (from arrears 1) = 9. For arrears 2, the cumulative sum is 5 + 4 + 5 (from arrears 2:
pay_date arrears col1 col2 col3
* <date> <dbl> <int> <dbl> <int>
1 2020-03-02 0 5 5 5
2 2020-03-02 1 4 4 9
3 2020-03-02 2 5 5 14
The next calculation is col 4, which is the cumulative sum of col 2.
Col 5 is the total sum of to_pay
by pay_date
.
The payment performance is col4/col5. This is the expected result:
pay_date arrears col1 col2 col3 col4 col5 `Payment performance`
* <date> <dbl> <int> <dbl> <int> <dbl> <int> <dbl>
1 2020-03-02 0 5 5 5 5 14 0.36
2 2020-03-02 1 4 4 9 9 14 0.64
3 2020-03-02 2 5 5 14 14 14 1.00
4 2021-02-02 0 4 4 4 4 15 0.27
5 2021-02-02 1 5 0 9 4 15 0.27
6 2021-02-02 2 6 0 15 4 15 0.27
7 2021-03-02 0 2 2 2 2 9 0.22
8 2021-03-02 1 3 3 5 5 9 0.56
9 2021-03-02 2 4 0 9 5 9 0.56
I tried this code but I failed to group the sums:
data_example %>%
group_by(pay_date, arrears) %>%
summarise(col1 = sum(to_pay),
col2 = sum(paid),
.groups = "drop") %>%
mutate(col3 = cumsum(col1),
col4 = cumsum(col2),
col5 = sum(col1),
`Payment performance` = round(col4/col5, digits = 2))
Please, can you help me? In this case my problem is cumsum, because it sums all the values in the column, but I want to sum it by pay_date
. Any help will be greatly appreciated.
groups = "drop"
, but then it sounds like you want to do some sort of aggregation. It's not clear from your question how you want col3 to be summed. Perhaps you could add some clarification? – Ian Campbell