
I am trying to run regressions with multiple dependent variables and multiple independent variables. I have house prices at the county level for the whole US; this is my IV. I also have several other county-level variables (GDP, construction employment), which constitute my dependent variables. Is there an efficient way to run all of these regressions at the same time? I am trying to get:

lm(IV1 ~ DV11 + DV21)
lm(IV2 ~ DV12 + DV22)

I would like to do this for each independent and each dependent variable.

EDIT: The OP added this information in response to my answer, now deleted, which misunderstood the question.

I don't think I explained this question very well, I apologize. Every dependent variable has 2 independent variables associated with it, and they are unique to it. So if I have 500 dependent variables, I have 500 unique first independent variables and 500 unique second independent variables.

OK, I will try once more; if I fail to explain myself again I may just give up (haha). I don't know what you mean by mtcars from R though [this is in reference to Metrics's answer], so let me try it this way. I'm going to have 3 vectors of data with roughly 500 rows each. I'm trying to build a regression out of each row of data. Let's say vector 1 is my dependent variable (the one I'm trying to predict), and vectors 2 and 3 make up my independent variables. So the first regression would consist of the row 1 value of each vector, the second would consist of the row 2 value of each one, and so on. Thank you all again.

gung - Reinstate Monica: By "dependent variable", do you mean the number you want to predict, and by "independent variable" the number you have and want to use to do the predicting? Note that in R's formula syntax, the dependent variable goes on the left-hand side of the tilde and the IVs go on the right-hand side (lm(DV ~ IV)).
chl: PLS regression is one option.
user2355903: I'm sorry, I did say that backwards; I switched my IV and DV. I also flagged my question to have it moved to Stack Overflow, because I am mainly looking at how to implement this in R; I understand the concept behind it. Thank you, gung.
whuber: What is the reason to look for a way that is more efficient than the separate regressions? Yes, there is a loss of efficiency, but the solutions are so rapid anyway that it seems little is to be gained.
user2355903: Because I'm trying to do this for 500+ counties every quarter; if I have to run each of those separately, the project becomes non-viable simply because of the time it would take. I was trying to see if I could import one or two large matrices of data and automate the regressions, but I'm not sure if that's possible.
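A minimal sketch of the "one or two large matrices" idea, assuming a hypothetical layout in which each column of `Y`, `X1` and `X2` is one county and each row is one quarter. The data are simulated for illustration only, and the names `Y`, `X1`, `X2`, `fits` and `coefs` are made up. All county regressions are fitted by a single `lapply()` call, so more counties only means more columns:

```r
# Hypothetical layout (an assumption): column j = county j, row t = quarter t
set.seed(1)
n_quarters <- 40  # time points per county (made up)
n_counties <- 5   # stand-in for the 500+ counties

# Simulated data, purely for illustration
X1 <- matrix(rnorm(n_quarters * n_counties), n_quarters, n_counties)  # e.g. GDP
X2 <- matrix(rnorm(n_quarters * n_counties), n_quarters, n_counties)  # e.g. construction employment
noise <- matrix(rnorm(n_quarters * n_counties, sd = 0.1), n_quarters, n_counties)
Y <- 1 + 2 * X1 - 0.5 * X2 + noise  # e.g. house prices

# One regression per county: Y[, j] ~ X1[, j] + X2[, j]
fits <- lapply(seq_len(n_counties), function(j) {
  d <- data.frame(y = Y[, j], x1 = X1[, j], x2 = X2[, j])
  lm(y ~ x1 + x2, data = d)
})

# One row of coefficients per county: (Intercept), x1, x2
coefs <- t(sapply(fits, coef))
print(coefs)
```

Each element of `fits` is an ordinary `lm` object, so `summary(fits[[j]])` still works for any single county.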


I am assuming your data are in a data frame called mydata.

mydata<-mtcars #mtcars is the data in R

dep<-c("mpg~","cyl~","disp~") # list of unique dependent variables with ~ 
indep1<-c("hp","drat","wt")  # list of first unique independent variables 
indep2<-c("qsec","vs","am") # list of second unique independent variables 
myvar<-cbind(dep,indep1,indep2) # matrix of variables
myvar
#      dep     indep1 indep2
# [1,] "mpg~"  "hp"   "qsec"
# [2,] "cyl~"  "drat" "vs"  
# [3,] "disp~" "wt"   "am" 



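k <- list() # storage for the fitted models (must exist before k[[i]] is assigned)
for (i in 1:nrow(myvar)){
  print(paste("This is", i, "regression", "with dependent var", gsub("~","",myvar[i,1])))
  # e.g. paste("mpg~", "hp+qsec") gives "mpg~ hp+qsec", turned into a formula
  k[[i]]<-lm(as.formula(paste(myvar[i,1],paste(myvar[i,2:3],collapse="+"))),mydata)
  print(k[[i]])
}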
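For comparison, here is a sketch of the same loop that keeps the response and predictor names in a small data frame (the name `spec` is only for illustration) and uses base R's `reformulate()` to build each formula, so the `"~"` does not have to be pasted into the strings by hand:

```r
mydata <- mtcars  # built-in example data set

# One row per regression: response and its two predictors (no "~" needed)
spec <- data.frame(
  dep    = c("mpg", "cyl", "disp"),
  indep1 = c("hp", "drat", "wt"),
  indep2 = c("qsec", "vs", "am"),
  stringsAsFactors = FALSE
)

# reformulate(c("hp", "qsec"), response = "mpg") builds mpg ~ hp + qsec
fits <- lapply(seq_len(nrow(spec)), function(i) {
  f <- reformulate(c(spec$indep1[i], spec$indep2[i]), response = spec$dep[i])
  lm(f, data = mydata)
})
names(fits) <- spec$dep  # name each fit after its response

print(coef(fits$mpg))
```

Naming the list by response makes individual fits easy to pull out, e.g. `fits$cyl`.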



 [1] "This is 1 regression with dependent var mpg"

Call:
lm(formula = as.formula(paste(myvar[i, 1], paste(myvar[i, 2:3], 
    collapse = "+"))), data = mydata)

Coefficients:
(Intercept)           hp         qsec  
   48.32371     -0.08459     -0.88658  

[1] "This is 2 regression with dependent var cyl"

Call:
lm(formula = as.formula(paste(myvar[i, 1], paste(myvar[i, 2:3], 
    collapse = "+"))), data = mydata)

Coefficients:
(Intercept)         drat           vs  
     12.265       -1.421       -2.209  

[1] "This is 3 regression with dependent var disp"

Call:
lm(formula = as.formula(paste(myvar[i, 1], paste(myvar[i, 2:3], 
    collapse = "+"))), data = mydata)

Coefficients:
(Intercept)           wt           am  
    -148.59       116.47        11.31  

Note: You can use the same process for a large number of variables.
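With hundreds of models, printing each fit is less useful than collecting the estimates into one table. A minimal base-R sketch on the same `mtcars` example; the column names `b1`, `b2` and `r.squared` are arbitrary choices for illustration:

```r
mydata <- mtcars
dep    <- c("mpg", "cyl", "disp")
indep1 <- c("hp", "drat", "wt")
indep2 <- c("qsec", "vs", "am")

# Fit each model and keep one summary row per regression
rows <- lapply(seq_along(dep), function(i) {
  f   <- as.formula(paste(dep[i], "~", indep1[i], "+", indep2[i]))
  fit <- lm(f, data = mydata)
  b   <- coef(fit)  # order: (Intercept), indep1, indep2
  data.frame(
    dep       = dep[i],
    indep1    = indep1[i],
    indep2    = indep2[i],
    intercept = unname(b[1]),
    b1        = unname(b[2]),
    b2        = unname(b[3]),
    r.squared = summary(fit)$r.squared,
    stringsAsFactors = FALSE
  )
})

# Stack the one-row data frames into a single results table
results <- do.call(rbind, rows)
print(results)
```

The resulting data frame can be written out with `write.csv()` or filtered like any other table.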

Alternative approach:

Motivated by Hadley's answer here, I use the function Map to solve the above problem:

dep<-list("mpg~","cyl~","disp~") # list of unique dependent variables with ~ 
indep1<-list("hp","drat","wt")  # list of first unique independent variables 
indep2<-list("qsec","vs","am") # list of second unique independent variables
Map(function(x,y,z) lm(as.formula(paste(x,paste(list(y,z),collapse="+"))),data=mtcars),dep,indep1,indep2)
[[1]]

Call:
lm(formula = as.formula(paste(x, paste(list(y, z), collapse = "+"))), 
    data = mtcars)

Coefficients:
(Intercept)           hp         qsec  
   48.32371     -0.08459     -0.88658  


[[2]]

Call:
lm(formula = as.formula(paste(x, paste(list(y, z), collapse = "+"))), 
    data = mtcars)

Coefficients:
(Intercept)         drat           vs  
     12.265       -1.421       -2.209  


[[3]]

Call:
lm(formula = as.formula(paste(x, paste(list(y, z), collapse = "+"))), 
    data = mtcars)

Coefficients:
(Intercept)           wt           am  
    -148.59       116.47        11.31
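In both versions the printed `Call:` shows `as.formula(paste(...))` rather than the actual model. A common base-R workaround, sketched here, is to build the call with `do.call()` so the evaluated formula is stored in it. Using character vectors instead of lists also lets `Map()` name the results after the responses:

```r
dep    <- c("mpg", "cyl", "disp")
indep1 <- c("hp", "drat", "wt")
indep2 <- c("qsec", "vs", "am")

fits <- Map(function(y, x1, x2) {
  f <- as.formula(paste(y, "~", x1, "+", x2))
  # do.call() puts the formula object itself into the call, so printing
  # a fit shows the real model, e.g. lm(formula = mpg ~ hp + qsec, ...)
  do.call("lm", list(formula = f, data = quote(mtcars)))
}, dep, indep1, indep2)

print(fits$mpg)
```

Because `dep` is an unnamed character vector, `Map()` uses its values as the list names, so `fits$cyl` and `fits$disp` work directly.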