I have a data frame that I want to use for a regression analysis. The data looks like this:
COMPANY <- c(rep("COMPANY_1",24),rep("COMPANY_2",24),rep("COMPANY_3",24))
YEAR <- rep(rep(2014:2019, each = 4),3)
SEASON <- rep(c("SPRING","SUMMER","AUTUMN","WINTER"),18)
X <- sample(100:1000,72)
Y <- sample(10:100,72)
df_ALL <- data.frame(COMPANY, YEAR, SEASON, X, Y)
However, I do not want to base my analysis solely on the full data set, but also on a variety of subsets (for example: only Company 1, only Winter, only Company 1 in Winter, etc.). I managed to do this by creating a nested list and then running regressions (plm) on each data frame in the nested list using lapply. The regression results are then stored in a second nested list, from which I can easily access them.
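The pattern described above can be sketched roughly as follows. This is an illustration only: `lm()` stands in for `plm`, the list has two levels rather than three, and the names `toy`, `toy_list` and `fit_list` are hypothetical.

```r
# Toy data, only to illustrate the pattern (not the real data set)
set.seed(1)
toy <- data.frame(
  COMPANY = rep(c("COMPANY_1", "COMPANY_2"), each = 8),
  SEASON  = rep(c("SUMMER", "WINTER"), 8),
  X = sample(100:1000, 16),
  Y = sample(10:100, 16)
)

# Two-level nested list: first by season, then by company
toy_list <- lapply(split(toy, toy$SEASON), function(d) split(d, d$COMPANY))

# One nested lapply per level; the result mirrors the structure of toy_list
# Assumption: lm() is used here as a stand-in for the plm call
fit_list <- lapply(toy_list, function(by_company) {
  lapply(by_company, function(d) lm(Y ~ X, data = d))
})

# A single fit is reached by name, just like the data frame it was built from
coef(fit_list[["WINTER"]][["COMPANY_1"]])
```

Because `lapply()` keeps the names of its input, the results list can be indexed with the same keys as the data list.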
However, the way I create the nested list seems unprofessional and error-prone to me. Here is the code I use to create it:
nested_list <- vector(mode = "list", length = 2)
nested_list <- setNames(nested_list, c("2014-2016", "2017-2019"))
for (i in 1:2) {
  nested_list[[i]] <- vector(mode = "list", length = 5)
  nested_list[[i]] <- setNames(nested_list[[i]], c("ALL_SEASONS", "SPRING", "SUMMER", "AUTUMN", "WINTER"))
  for (j in 1:5) {
    nested_list[[i]][[j]] <- vector(mode = "list", length = 4)
    nested_list[[i]][[j]] <- setNames(nested_list[[i]][[j]], c("ALL_COMPANIES", "COMPANY_1", "COMPANY_2", "COMPANY_3"))
  }
}
nested_list[["2014-2016"]][["ALL_SEASONS"]][["ALL_COMPANIES"]] <- subset(df_ALL, YEAR >= 2014 & YEAR <= 2016)
nested_list[["2017-2019"]][["ALL_SEASONS"]][["ALL_COMPANIES"]] <- subset(df_ALL, YEAR >= 2017 & YEAR <= 2019)
for (i in 1:2) {
  # Per-company subsets of the full period data
  nested_list[[i]][["ALL_SEASONS"]][["COMPANY_1"]] <- subset(nested_list[[i]][["ALL_SEASONS"]][["ALL_COMPANIES"]], COMPANY == "COMPANY_1")
  nested_list[[i]][["ALL_SEASONS"]][["COMPANY_2"]] <- subset(nested_list[[i]][["ALL_SEASONS"]][["ALL_COMPANIES"]], COMPANY == "COMPANY_2")
  nested_list[[i]][["ALL_SEASONS"]][["COMPANY_3"]] <- subset(nested_list[[i]][["ALL_SEASONS"]][["ALL_COMPANIES"]], COMPANY == "COMPANY_3")
  # k runs over all four entries (ALL_COMPANIES plus the three companies)
  for (k in 1:4) {
    nested_list[[i]][["SPRING"]][[k]] <- subset(nested_list[[i]][["ALL_SEASONS"]][[k]], SEASON == "SPRING")
    nested_list[[i]][["SUMMER"]][[k]] <- subset(nested_list[[i]][["ALL_SEASONS"]][[k]], SEASON == "SUMMER")
    nested_list[[i]][["AUTUMN"]][[k]] <- subset(nested_list[[i]][["ALL_SEASONS"]][[k]], SEASON == "AUTUMN")
    nested_list[[i]][["WINTER"]][[k]] <- subset(nested_list[[i]][["ALL_SEASONS"]][[k]], SEASON == "WINTER")
  }
}
Is there a more elegant way to create a nested list? Or is the whole approach of creating a nested list and then using lapply not advisable? What would be the alternative then?
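One possible direction, offered as a sketch under assumptions rather than a definitive answer: describe each dimension as a named list of the values it keeps, and build the same three-level structure with nested `lapply()` calls. The names `periods`, `seasons`, `companies` and `nested_list2` are illustrative, the seed is added only for reproducibility, and the sample data is recreated so the block runs on its own.

```r
# Recreate the sample data so this block is self-contained
set.seed(1)  # assumption: seed added only to make the sketch reproducible
COMPANY <- c(rep("COMPANY_1", 24), rep("COMPANY_2", 24), rep("COMPANY_3", 24))
YEAR    <- rep(rep(2014:2019, each = 4), 3)
SEASON  <- rep(c("SPRING", "SUMMER", "AUTUMN", "WINTER"), 18)
df_ALL  <- data.frame(COMPANY, YEAR, SEASON,
                      X = sample(100:1000, 72), Y = sample(10:100, 72))

# Each dimension maps a label to the set of values that label keeps
periods <- list("2014-2016" = 2014:2016, "2017-2019" = 2017:2019)

all_seasons <- c("SPRING", "SUMMER", "AUTUMN", "WINTER")
seasons <- c(list(ALL_SEASONS = all_seasons),
             setNames(as.list(all_seasons), all_seasons))

all_companies <- unique(as.character(df_ALL$COMPANY))
companies <- c(list(ALL_COMPANIES = all_companies),
               setNames(as.list(all_companies), all_companies))

# lapply() keeps the names of its input, so no setNames() calls or
# pre-allocated empty lists are needed
nested_list2 <- lapply(periods, function(p) {
  lapply(seasons, function(s) {
    lapply(companies, function(co) {
      df_ALL[df_ALL$YEAR %in% p &
             df_ALL$SEASON %in% s &
             df_ALL$COMPANY %in% co, ]
    })
  })
})

# Access works the same way as with the hand-built list
nrow(nested_list2[["2017-2019"]][["WINTER"]][["COMPANY_1"]])
```

With this layout, adding a period, season or company means adding one entry to the corresponding list; the filtering logic itself exists in only one place, which removes the risk of mistyping a key or a loop range.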