0
votes

Help I would like to request: I am trying to automate a sequence of tasks in a loop. Basically it is running a series of regressions, storing the results and exporting (writing) it into a csv file.

Issue: I have tried many times to run a “For” loop. I couldn’t figure out how to create and name new variables for each regression based on the current_date column

Description of data: Link to dataset It is a dataset of US states in a tabular format (somewhat similar to Tidy Data). Each Row or observation refers to a specific state at a specific date. Each column refers to an attribute of the state Task: Main problem: Automate the process of running a series of regressions, storing the results by date and exporting (writing) it into a csv file.

Sequence of tasks:

  1. Run the regression
    1. Use broom to tidy the regression results (coefficients, std errors and p-vale)
    2. Store the results by a specific date
    3. Repeat regression for another date
    4. Use broom to tidy the regression results (coefficients and the model)
    5. Store the results of the co-efficients (coeff, std error, p value) by a specific date
    6. Store the results of the regression models (R-square etc.) by a specific date

….. Finally, export all the results in a CSV file

library(dplyr)
library(broom)
for(i in 1:length(current_date)){
    lmtest <- lm( outcome_var [[i]] ~ cumulative_var1_at_current_date + key_cat_var1, data = testdata)
    coeff_results<-tidy(lmtest)
    model_results<- glance(lmtest) 
    write.csv(coeff_results, “C:\\Users\\....combined_results.csv”)
    }
1

1 Answers

1
votes

Using a loop in this context does not make sense to me. Also, the date doesn't seem to affect the output of the regression. Do you want something like that:

library(dplyr)
library(broom)
current_date <- Sys.Date()
lmtest <- lm( outcome_var ~ cumulative_var1_at_current_date + key_cat_var1, data = testdata)
coeff_results <- tidy(lmtest)
model_results <- glance(lmtest) 
write.csv(coeff_results, paste0("coeff-results-", current_date, ".csv"))
write.csv(model_results, paste0("model-results", current_date, ".csv"))

If not please give an example of what is the current_date, and how your data are affected by it.

After Clarification on the purpose of the current date:

So basically, you need to loop through the unique dates in the current_date column. Then, you have to keep the data that correspond to each day. Here is a solution.

testdata$current_date <- as.Date(testdata$current_date, format = "%d/%m/%y")
all_dates <- unique(testdata$current_date)

for(i in 1:length(all_dates)) {
  lmtest <- filter(testdata, current_date %in% all_dates[i]) %>% 
    lm( outcome_var ~ cumulative_var1_at_current_date + key_cat_var1, data = .)
  coeff_results <- tidy(lmtest)
  model_results <- glance(lmtest)
  write.csv(coeff_results, paste0("coeff-results-", all_dates[i], ".csv"))
  write.csv(model_results, paste0("model-results", all_dates[i], ".csv"))
}

Probably there is a more elegant way through subset in lm, but I personally prefer dplyr::filter because it gives you more control.