2
votes

I'm trying to evaluate a series of one-variable regression models using an R script. My data is in a .csv file where the first 8 columns are dependent variables that we would like to predict, and the next 52 columns are independent variables that might be used to fit any one of the 8 dependent variables.

I've read the data into the script successfully, and I've stored the column names of the dependent and independent variables in two character vectors. So my script looks like this:

#... do some stuff to get data above

var_dep<-c("dep1","dep2",...)
var_indep<-c("indep1","indep2",...)

for (dep in var_dep) {
    for (indep in var_indep) {
        lm1 <- lm(dep ~ indep, data = mydat)
    }
}

I get this error message when I run

Rscript R_ScriptV2.R XLK_friendly.csv 

in the terminal:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
Calls: lm ... model.matrix -> model.matrix.default -> contrasts<-
In addition: Warning message:
In model.response(mf, "numeric") : NAs introduced by coercion
Execution halted

So how can I specify the dependent and independent variables in my regression using variables that hold the column names?

1 Answer

4
votes

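This might be a hacky solution, but you can build the formula from the strings with as.formula and paste0. In your loop, the formula dep ~ indep refers to the loop variables themselves (length-one character strings) rather than the columns they name, which is why R complains about coercion and factor levels. Pasting the names into a formula gets this to work: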

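for (dep in var_dep) {
    for (indep in var_indep) {
        # Build e.g. dep1 ~ indep1 from the two column-name strings
        f <- as.formula(paste0(dep, " ~ ", indep))
        # mydat is the data frame from the question
        lm1 <- lm(f, data = mydat)
    }
}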
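
Note that the loop above overwrites lm1 on every pass, so only the last model survives. A minimal sketch of one way to keep all of them, assuming you want to compare fits by R-squared: it uses reformulate() from base R's stats package as an alternative to pasting strings, and stores each model in a named list. The toy data frame, its column names, and the fits and r2 names are made up for illustration; substitute your own mydat, var_dep, and var_indep.

```r
# Toy data standing in for the CSV; these column names are hypothetical.
set.seed(1)
mydat <- data.frame(
  dep1 = rnorm(20), dep2 = rnorm(20),
  indep1 = rnorm(20), indep2 = rnorm(20), indep3 = rnorm(20)
)
var_dep   <- c("dep1", "dep2")
var_indep <- c("indep1", "indep2", "indep3")

# Fit every dependent ~ independent pair and keep each model in a named list.
fits <- list()
for (dep in var_dep) {
  for (indep in var_indep) {
    # reformulate() builds the formula dep ~ indep from the two strings
    f <- reformulate(indep, response = dep)
    fits[[paste(dep, indep, sep = "~")]] <- lm(f, data = mydat)
  }
}

# One R-squared value per fitted model, named by its variable pair.
r2 <- sapply(fits, function(m) summary(m)$r.squared)
print(round(r2, 3))
```

Individual models can then be pulled out by name, for example summary(fits[["dep1~indep2"]]).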