I could do this by looping through my dataset multiple times but thought there must be a more efficient way to do this through data.table. This is what the dataset looks like:
CaseID Won OwnerID Time_period Finished
1 yes A 1 no
1 yes A 3 no
1 yes A 5 yes
2 no A 4 no
2 no A 6 yes
3 yes A 2 yes
4 no A 3 yes
5 15 B 2 no
For each row, by owner, I want to generate an average of the amount of cases finished before that time period that are won.
CaseID Won OwnerID Time_period Finished AvgWonByOwner
1 yes A 1 no NA
1 yes A 3 no 1
1 yes A 5 yes .5
2 no A 4 no .5
2 no A 6 yes 2/3
3 yes A 2 yes NA
4 no A 3 yes 1
5 15 B 2 no NA
Looking at this in detail, it seems ridiculously complicated. I thought you might be able to do this with some sort of rolling merge, but I don't know how to set a condition where the average is only calculated from Won before the date of the row and where it has to have the same ownerID.
Edit 1: Explanation for numbers in final column
AvgWonByOwner Explanation
NA t = 1, No cases finished yet, this could be 0 too
1 t = 3, case 3 finished and is won, so average wins is 1
.5 t = 5, case 3 finished, won; case 4 finished lost; average = .5
.5 t = 4, case 3 finished, won; case 4 finished lost; average = .5
2/3 t = 6, case 3 finished, won, case 4 finished lost, case 1 finished won, average: 2/3
NA t = 1, No cases finished yet, this could be 0 too
1 t = 3, case 3 finished and is won, so average wins is 1
NA t = 1, No cases finished yet, this could be 0 too