Following is what my dataframe/data.table looks like. The rank
column is my desired calculated field.
library(data.table)
df <- fread('
Name Score Date Rank
John 42 1/1/2018 3
Rob 85 12/31/2017 2
Rob 89 12/26/2017 1
Rob 57 12/24/2017 1
Rob 53 08/31/2017 1
Rob 72 05/31/2017 2
Kate 87 12/25/2017 1
Kate 73 05/15/2017 1
')
df[,Date:= as.Date(Date, format="%m/%d/%Y")]
I am trying to calculate the rank of each student at every given point in time in the data within a 30 day windows. For that, I need to fetch the most recent scores of all students at a given point in time and then pass the rank function.
In the 1st row, as of 1/1/2018
, John
has two more competitors in a past 30 day window: Rob with the most recent score of 85
in 12/31/2017
AND Kate with the most recent score of 87
in 12/25/2017
and both of these dates fall within the 1/1/2018 - 30
Day Window. John gets a rank of 3
with the lowest score of 42
. If only one students falls within date(at a given row) - 30 day window
, then the rank is 1.
In the 3rd row the date is 12/26/2017
. So Rob's score as of 12/26/2017
is 89
. There is only one case of another student that falls in the time window of 12/26/2017 - 30
days and that is the most recent score(87
) of kate on 12/25/2017
. So within the time window of (12/26/2017) - 30
, Rob's score of 89
is higher than Kate's score of 87
and therefore Rob gets rank 1
.
I was thinking about using the framework from here Efficient way to perform running total in the last 365 day window but struggling to think of a way to fetch all recent score of all students at a given point in time before using rank.