Calculating a Rolling Average with Missing Time Values

Question

I have a dataset described by the following:

> dput(droplevels(head(sample,10)))
structure(list(Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Air-Force", class = "factor"), Year = c(2003L, 
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2011L, 2012L, 2013L
), Grouped_Position_3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Skill", class = "factor"), Avg_Rating = c(0.7667, 
0, 0.7444, 0.7222, 0, 0.7556, 0.76224, 0.596322222222222, 0.706584615384615, 
0.767509090909091), n = c(1L, 1L, 3L, 6L, 1L, 1L, 5L, 9L, 13L, 
11L)), .Names = c("Team", "Year", "Grouped_Position_3", "Avg_Rating", 
"n"), row.names = c(NA, 10L), class = "data.frame")

In the full dataset there are multiple schools, grouped positions and years. What I'm trying to do is figure out how to generate a rolling average using the current year and four years in the past for each unique group of school, year and position. For example for 2013, Air Force and Skill position I would like the following calculation to take place (Note that 2010 is missing in the data):

(.767+.70+.59+0+.762)/5

The 0 comes from the missing year. I have looked at the zoo library in combination with dplyr but I haven't been able to control for missing values like this. Am I looking at having to write a loop or is there some package in r that has this capability?

G. Grothendieck G. Grothendieck · Accepted Answer · 2018-03-01T19:35:47

Create a function Avg which given a vector of row numbers ix takes the required average of the most recent 5 years and then rollapplyr it for each group of Team and Grouped_Position_3:

library(zoo)

Avg <- function(ix) with(sample[ix, ], sum(Avg_Rating[Year >= max(Year) - 4]) / 5)
transform(sample, Avg = ave(1:nrow(sample), Team, Grouped_Position_3, FUN = 
   function(ix) rollapplyr(ix, 5, Avg, partial = TRUE)))

giving:

        Team Year Grouped_Position_3 Avg_Rating  n       Avg
1  Air-Force 2003              Skill  0.7667000  1 0.1533400
2  Air-Force 2004              Skill  0.0000000  1 0.1533400
3  Air-Force 2005              Skill  0.7444000  3 0.3022200
4  Air-Force 2006              Skill  0.7222000  6 0.4466600
5  Air-Force 2007              Skill  0.0000000  1 0.4466600
6  Air-Force 2008              Skill  0.7556000  1 0.4444400
7  Air-Force 2009              Skill  0.7622400  5 0.5968880
8  Air-Force 2011              Skill  0.5963222  9 0.4228324
9  Air-Force 2012              Skill  0.7065846 13 0.5641494
10 Air-Force 2013              Skill  0.7675091 11 0.5665312

Note

The input used is:

sample <- structure(list(Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Air-Force", class = "factor"), Year = c(2003L, 
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2011L, 2012L, 2013L
), Grouped_Position_3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Skill", class = "factor"), Avg_Rating = c(0.7667, 
0, 0.7444, 0.7222, 0, 0.7556, 0.76224, 0.596322222222222, 0.706584615384615, 
0.767509090909091), n = c(1L, 1L, 3L, 6L, 1L, 1L, 5L, 9L, 13L, 
11L)), .Names = c("Team", "Year", "Grouped_Position_3", "Avg_Rating", 
"n"), row.names = c(NA, 10L), class = "data.frame")

Calculating a Rolling Average with Missing Time Values

1 Answers

Note