0
votes
    dat <- structure(list(yearRef = c(1970, 1971, 1972, 1973, 1974, 1975, 
    1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
    1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
    1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
    2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018), 
    value = c(0.761253538863966, 0.778365700864592, 0.748473422160476, 
    0.790408287413012, 0.726707786670043, 0.80587461240495, 0.81582881742434, 
    0.914998995290579, 0.903241004636529, 0.883446087736501, 
    0.878399385374308, 0.790239960507709, 0.853841173129717, 
    0.972923769177295, 0.899133969911117, 0.865840008976815, 
    0.85942147306247, 0.9471790327507, 0.905362802563981, 0.91644169495142, 
    0.985789564141214, 0.978212191208007, 0.885157529562834, 
    1.01638026873823, 1.02702020472382, 0.944421276774342, 0.979439113456467, 
    0.951183598644539, 1.12054063623421, 1.00767230122493, 1.02132151007705, 
    0.95649988168142, 0.928385199359045, 1.05071183719421, 1.11654102944792, 
    0.910601547182633, 0.936460862711605, 1.2398210426787, 0.979036947391532, 
    1.09931214756341, 1.12206830109171, 0.997384903912461, 1.07413151131128, 
    0.967026290186151, 1.04921352764649, 1.08746580600605, 1.02444885186573, 
    1.14604631626466, 1.06449109417896)), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -49L))

For each year, I want to calculate the mean of top 5 values from the previous 7 values. For e.g. the first mean value will be for 1977 and will consist of mean of best 5 years from 1970 till 1976. Similarly, for 1978, mean value will be the top 5 values from 1971-1977. Similarly, for 2018, the mean value will be top 5 values from 2011 - 2017

I have the following code from SO which sort of does the job.

  library(data.table)
  library(zoo)

  setDT(dat)

  dat[, mean.val:= if (.N > 6) 
        rollapplyr(value, 7,function(x) mean(tail(sort(x), 5)), fill = NA)  
        else mean(value)] 

though the first value in the new column mean.val is correct, it should be assigned to the row which has 1977 but has been assigned to 1976.

4
Is the issue only that the results should be shifted down 1 row? If so, you can fix that with the shift function, i.e. dat[, mean.val := shift(mean.val)]IceCreamToucan
Yeah I think the problem is that 1970 until 1976 consists of 7 values (rows). Therefore the value gets assigned at 1976. Shifting the data one row is the easiest solution.Koot6133

4 Answers

3
votes

You want to process the PRIOR 7 points rather than the 7 points that end at the current point. To do that use a width of list(-(1:7)). That says to use offsets -1 through -7 when processing the data. See ?rollapply for more information on specifying the width argument.

This (1) more directly specifies the intention making it easier to comprehend than approaches which require ignoring the required offsets and then fixing it up later and (2) uses only the packages you are already using (3) expresses the solution compactly and (4) preserves your solution changing only one argument.

  dat[, mean.val:= if (.N > 6) 
        rollapply(value, list(-(1:7)), function(x) mean(tail(sort(x), 5)), fill = NA)  
        else mean(value)] 
2
votes

If the only issue is that the values should be shifted down 1 row, you can use shift to fix this.

dat[, mean.val := shift(mean.val)]

FYI if you're on version >= 1.12.4 data.table you don't need zoo and can use data.table::frollapply.

dat[, mean.val2 := 
      shift(frollapply(value, 7, function(x) mean(tail(sort(x), 5))))]

dat[, all.equal(mean.val, mean.val2)] #TRUE
0
votes

This simple for loop solve the problem:

dat$mean.val = NA

for(i in 8:nrow(dat))
{
  dat$mean.val[i] = mean(sort(dat$value[(i-7):(i-1)],decreasing = TRUE)[1:5])
}
0
votes

I think you can use the excellent tsibble package for an amazing rolling function and then you can use the lead function to displace the results

library(tidyverse)

dat <- structure(list(yearRef = c(1970, 1971, 1972, 1973, 1974, 1975, 
                                  1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
                                  1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
                                  1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
                                  2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018), 
                      value = c(0.761253538863966, 0.778365700864592, 0.748473422160476, 
                                0.790408287413012, 0.726707786670043, 0.80587461240495, 0.81582881742434, 
                                0.914998995290579, 0.903241004636529, 0.883446087736501, 
                                0.878399385374308, 0.790239960507709, 0.853841173129717, 
                                0.972923769177295, 0.899133969911117, 0.865840008976815, 
                                0.85942147306247, 0.9471790327507, 0.905362802563981, 0.91644169495142, 
                                0.985789564141214, 0.978212191208007, 0.885157529562834, 
                                1.01638026873823, 1.02702020472382, 0.944421276774342, 0.979439113456467, 
                                0.951183598644539, 1.12054063623421, 1.00767230122493, 1.02132151007705, 
                                0.95649988168142, 0.928385199359045, 1.05071183719421, 1.11654102944792, 
                                0.910601547182633, 0.936460862711605, 1.2398210426787, 0.979036947391532, 
                                1.09931214756341, 1.12206830109171, 0.997384903912461, 1.07413151131128, 
                                0.967026290186151, 1.04921352764649, 1.08746580600605, 1.02444885186573, 
                                1.14604631626466, 1.06449109417896)), class = c("tbl_df", 
                                                                                "tbl", "data.frame"), row.names = c(NA, -49L))

complex_function <- . %>% 
  sort %>% 
  tail(.,5) %>% 
  mean

dat %>%
  mutate(roll_avg  = tsibble::slide_dbl(.x = value,.f = complex_function,.size = 7),
         roll_avg2 = lag(roll_avg))
#> # A tibble: 49 x 4
#>    yearRef value roll_avg roll_avg2
#>      <dbl> <dbl>    <dbl>     <dbl>
#>  1    1970 0.761   NA        NA    
#>  2    1971 0.778   NA        NA    
#>  3    1972 0.748   NA        NA    
#>  4    1973 0.790   NA        NA    
#>  5    1974 0.727   NA        NA    
#>  6    1975 0.806   NA        NA    
#>  7    1976 0.816    0.790    NA    
#>  8    1977 0.915    0.821     0.790
#>  9    1978 0.903    0.846     0.821
#> 10    1979 0.883    0.865     0.846
#> # … with 39 more rows

Created on 2020-01-14 by the reprex package (v0.3.0)