I have a data frame with about 200 columns and it looks like this:

d1 <- structure(list(Date=c(2012, 2012, 2013, 2013, 2014, 2014),
                x1=c(NA, NA, 17L, 29L, 27L, 10L), x2=c(30L, 19L, 22L, 20L, 11L,
                24L), x3=c(NA, 23L, 22L, 27L, 21L, 26L), x4=c(30L, 28L, 23L,
                24L, 10L, 17L), x5=c(12L, 18L, 17L, 16L, 30L, 26L)),
                 row.names=c(NA, 6L), class="data.frame")

Output:

  Date x1 x2 x3 x4 x5
1 2012 NA 30 NA 30 12
2 2012 NA 19 23 28 18
3 2013 17 22 22 23 17
4 2013 29 20 27 24 16
5 2014 27 11 21 10 30
6 2014 10 24 26 17 26

I now want to run linear regressions for each year separately and create a new data frame that contains only the intercepts for each variable x1 to x4 in each year. The independent variable is always x5.

Like this:

  Time x1          x2          x3          x4
1 2012 Interceptx1 Interceptx2 Interceptx3 Interceptx4
2 2013 Interceptx1 Interceptx2 Interceptx3 Interceptx4
3 2014 Interceptx1 Interceptx2 Interceptx3 Interceptx4

I tried lms <- lapply(2:5, function(x) lm(d1[,x] ~ d1$x5)) and df <- data.frame(sapply(lms, coef)), but this runs each regression over the whole time period instead of year by year. My real data frame contains about 200 columns, so I'm looking for an efficient way to create this new data frame.

Thank you very much!

Can you define what you mean by intercept? Do you mean from the output of lm()? Also what combinations of variables do you want to train the model on? - Rohit
If you're running linear regressions by year, and you want your dependent variable to be the xn columns, what will the independent variable be? e.g. lm(x1 ~ ?, data = d1) - r.bot
I edited my post. @Rohit With intercept I mean the intercept which I get from df <- data.frame(sapply(lms, coef)). - Pogi93
@r.bot My independent variable is x5. So the regressions by year in my example would be: lm(x1 ~ x5, data = d1) , lm(x2 ~ x5, data = d1) , lm(x3 ~ x5, data = d1) , lm(x4 ~ x5, data = d1) - Pogi93
There is a package called broom for doing this. - Seth

1 Answer

Here's a solution based on some other work I've done. I'm sure it's possible to clean it up into a purely purrr solution and would welcome any suggestions along those lines.

I've had to make some changes to your data as the NA values were causing it to break.
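As an aside on the NA problem: lm() drops incomplete rows by default, but it stops with an error when a year has no non-NA response at all, which is the case for x1 in 2012 in the original data. A minimal sketch of a guard is below; safe_intercept is a hypothetical helper name, not part of this answer's code, and returning NA for years with fewer than two complete cases is an assumption about what you would want.

```r
# Hypothetical helper (an assumption, not part of the answer's code):
# return the intercept of y ~ x, or NA when fewer than two complete
# (x, y) pairs are available, since a line cannot be fitted then.
safe_intercept <- function(y, x) {
  ok <- complete.cases(x, y)          # rows where both values are present
  if (sum(ok) < 2) return(NA_real_)   # not enough data for intercept + slope
  unname(coef(lm(y[ok] ~ x[ok]))[1])  # first coefficient is the intercept
}

# The 2012 rows of the original data: x1 is entirely NA, x2 is complete
x5_2012 <- c(12, 18)
safe_intercept(c(NA, NA), x5_2012)  # NA instead of an error
safe_intercept(c(30, 19), x5_2012)  # intercept of the line through both points
```

With a guard like this the original data could be used unchanged, at the cost of NA entries in the result.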

library(purrr)
library(dplyr)
library(tidyr)
library(broom)

d1 <- structure(list(cyear=c(2012, 2012, 2013, 2013, 2014, 2014),
                     x1=c(5L, 5L, 17L, 29L, 27L, 10L), 
                     x2=c(30L, 19L, 22L, 20L, 11L,24L), 
                     x3=c(5L, 23L, 22L, 27L, 21L, 26L), 
                     x4=c(30L, 28L, 23L,24L, 10L, 17L), 
                     x5=c(12L, 18L, 17L, 16L, 30L, 26L)),
                row.names=c(NA, 6L), class="data.frame")

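# nest all columns except cyear into a list-column called "data"
# (tidyr >= 1.0.0 expects the columns to be named explicitly)
models <- nest(d1, data = -cyear)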
str(models)

reg_vars <- c("x1", "x2", "x3", "x4")

# The following loops through each of the dependent variables (x1 to x4)
# and regresses it on x5 within each year
for(i in seq_along(reg_vars)){
  var_mdl <- rlang::sym(paste0(reg_vars[i], "_mdl")) # create the name of a model
  var_res <- rlang::sym(paste0(reg_vars[i], "_res")) # create the name of the results
  formula <- as.formula(paste0(reg_vars[i], " ~ x5")) # create the regression formula
  print(formula)

  models <- models %>%
    mutate(
# create the model as an element in the nested data
      !!var_mdl := map(data, ~ lm(formula, data = ., na.action = "na.omit")), 
# tidy the model results into an element
      !!var_res := map(!!var_mdl, tidy)
    )
}
models

reg_vars2 <- paste0(reg_vars, "_res")
reg_vars2

# clean up ####
# this will extract the regression results into a new data frame
out_df <- NULL # bind_rows() ignores NULL, so no special case is needed for i == 1
for(i in seq_along(reg_vars2)){
  results <- rlang::sym(reg_vars2[i])
  temp_df <- models %>% 
    select(cyear, !!results) %>% 
    unnest(!!results) %>% 
    mutate(variable = reg_vars[i]) # record which regression each row belongs to
  out_df <- bind_rows(out_df, temp_df)
}

head(out_df)
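
The question asks for a wide table with one row per year and one intercept column per variable. As a rough alternative to the nested approach above, the sketch below builds that table directly in base R. It assumes the NA-free example data from this answer; the names intercepts and wide are introduced here for illustration only.

```r
# Self-contained sketch using the NA-free example data from this answer
d1 <- data.frame(
  cyear = c(2012, 2012, 2013, 2013, 2014, 2014),
  x1 = c(5L, 5L, 17L, 29L, 27L, 10L),
  x2 = c(30L, 19L, 22L, 20L, 11L, 24L),
  x3 = c(5L, 23L, 22L, 27L, 21L, 26L),
  x4 = c(30L, 28L, 23L, 24L, 10L, 17L),
  x5 = c(12L, 18L, 17L, 16L, 30L, 26L)
)
reg_vars <- c("x1", "x2", "x3", "x4")

# For each year, fit <variable> ~ x5 for every dependent variable and
# keep only the intercept; rbind the per-year vectors into a matrix
intercepts <- do.call(rbind, lapply(split(d1, d1$cyear), function(yr) {
  sapply(reg_vars, function(v) {
    fit <- lm(reformulate("x5", response = v), data = yr)
    coef(fit)[["(Intercept)"]]
  })
}))

# Rows are named by year, columns by variable; turn it into a data frame
wide <- data.frame(Time = as.numeric(rownames(intercepts)),
                   intercepts, row.names = NULL)
wide
```

For 200 columns, reg_vars could be set to something like setdiff(names(d1), c("cyear", "x5")) rather than typed out. Note that with only two rows per year, as in the example, every fit is exact, so the intercepts are simply those of the line through the two points.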