1
votes

Say I want to run regressions per group whereby I want to use the last 5 year data as input for that regression. Then, for each next year, I would like to "shift" the input for that regression by one year (i.e., 4 observations).

From those regressions I want to extract both the R2 and the fitted values/residuals, which I then need in subsequent regressions that follow similar notions.

I have some code working using loops, but it is not really elegant nor efficient for large datasets. I assume there must be a nice plyr way for resolving this issue.

# libraries #
library(dplyr)
library(broom)

# reproducible data #    
df <- tibble(ID = as.factor(rep(c(1, 2), each = 40)),
             YEAR = rep(rep(c(2001:2010), each = 4), 2),
             QTR = rep(c(1:4), 20),
             DV = rnorm(80),
             IV = DV * rnorm(80))

# output vector #
output = tibble(ID = NA,
                YEAR = NA,
                R2 = NA)

# loop #
k = 1

for (i in levels(df$ID)){

  n_row = df %>% 
    arrange(ID) %>% 
    filter(ID == i) %>% 
    nrow()

  for (j in seq(1, (n_row - 19), by = 4)){

    output[k, 1] = i
    output[k, 2] = df %>% 
      filter(ID == i) %>%  
      slice((j + 19)) %>% 
      select(YEAR) %>% 
      unlist()

    output[k, 3] = df %>% 
      filter(ID == i) %>%  
      slice(j:(j + 19)) %>% 
      do(model = lm(DV ~ IV, data = .)) %>% 
      glance(model) %>% 
      ungroup() %>% 
      select(r.squared) %>% 
      ungroup()

    k = k + 1
  }
}

1

1 Answers

1
votes

Define a function which returns the year and R squared given a subset of rows of df (without ID) and then use rollapply with it.

library(dplyr)
library(zoo)

R2 <- function(x) {
  x <- as.data.frame(x)
  c(YEAR = tail(x$YEAR, 1), R2 = summary(lm(DV ~ IV, x))$r.squared)
}

df %>%
  group_by(ID) %>%
  do(data.frame(rollapply(.[-1], 20, by = 4, R2, by.column = FALSE))) %>%
  ungroup

giving:

# A tibble: 12 x 3
   ID     YEAR      R2
   <fct> <dbl>   <dbl>
 1 1      2005 0.0133 
 2 1      2006 0.130  
 3 1      2007 0.0476 
 4 1      2008 0.0116 
 5 1      2009 0.00337
 6 1      2010 0.00570
 7 2      2005 0.0481 
 8 2      2006 0.00527
 9 2      2007 0.0158 
10 2      2008 0.0303 
11 2      2009 0.235  
12 2      2010 0.116