Invalid type for the dependent variable in lm() in R programming

Question

This is an R problem, not a statistics problem.

I am trying to perform multiple linear regression in R for a set of 20 independent variables and 1 dependent variable. The 20 independent variables are in one csv file and the 1 dependent variable is in another csv file. Each row in each file corresponds to one day's measurement.

I have managed to import the 20 independent variables with read.csv(...) into an object (a variable?) called "predictors". I then imported the dependent measurements, again with read.csv(...), into an object called "dependent".

However, when I use lm(dependent~X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15+X16+X17+X18+X19+X20)

(Note: X1, ..., X20 are the column headers of the predictors csv file.)

I get the error:

Error in model.frame.default(formula = dependent ~ X1 + X2 + X3 + X4 + X5 + : invalid type (list) for variable 'dependent'

I cannot understand what is going wrong.
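A minimal sketch that reproduces the error, assuming both files were read with read.csv(); the small data frames below are made-up stand-ins for the real "predictors" and "dependent" objects. read.csv() always returns a data frame, and a data frame is a list of columns, so when the whole object is used as the response, model.frame() sees a variable of type list and stops.

```r
# Made-up stand-ins for the two imported CSV files (only 2 predictors here)
set.seed(42)
predictors <- data.frame(X1 = rnorm(10), X2 = rnorm(10))
dependent  <- data.frame(V1 = rnorm(10))  # read.csv() always returns a data frame

# A data frame is a list of columns, so this is TRUE
print(is.list(dependent))

# Using the whole data frame as the response triggers the "invalid type (list)" error
msg <- tryCatch(
  lm(dependent ~ X1 + X2, data = predictors),
  error = function(e) conditionMessage(e)
)
print(msg)
```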

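The predictors file looks something like this (but with columns up to X20): [spreadsheet screenshot not available]

and the dependent csv file looks like this: [spreadsheet screenshot not available]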

Try adding the dependent variable as a column in the data frame along with the independent variables. — Joswin K J
From the look of your error, I would say that dependent is a list. It would be much better if you had a data frame with 21 columns: your 20 Xs and the dependent variable. Then running a regression would be quite easy. You could look at cbind to append the 2 data frames. — etienne
Show us the data structures you use in R (after import). Data from Excel is nice, but doesn't tell the whole story. For information on how to present data, see stackoverflow.com/questions/5963269/… (hint: str()). — Roman Luštrik
@etienne I think that appending the two data frames sounds promising. However, would it affect the original csv files? I would prefer not to change them, because of the risk of making mistakes. Also, the dependent file does not have a header; should I just fix this manually? — Trajan
@Jurassic: please post the content of dput(head(dependent,20)) — etienne
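On the two follow-up questions in the comments: read.csv() only copies a file's contents into memory, so combining data frames with cbind() never modifies the CSV files on disk, and a file without a header row can be read with header = FALSE, naming the column through col.names instead of editing the file by hand. A sketch under those assumptions; the temporary files and their contents are hypothetical stand-ins for the real predictors and dependent CSVs.

```r
# Hypothetical stand-ins for the two CSV files, written to temporary paths
pred_file <- tempfile(fileext = ".csv")
dep_file  <- tempfile(fileext = ".csv")
writeLines(c("X1,X2", "1.2,0.5", "0.7,1.1", "2.3,0.9"), pred_file)
writeLines(c("0.49", "0.04", "0.53"), dep_file)  # no header row

dep_before <- readLines(dep_file)  # snapshot of the file on disk

predictors <- read.csv(pred_file)  # first row is used as column names
# header = FALSE stops the first value from being used as a column name;
# col.names (passed on to read.table) gives the column a meaningful name
dependent <- read.csv(dep_file, header = FALSE, col.names = "dependent")

# Rows are paired by position, so both files must have the same number of rows
stopifnot(nrow(predictors) == nrow(dependent))
combined <- cbind(dependent, predictors)  # a new data frame, in memory only

str(combined)
# The CSV file itself is untouched
print(identical(readLines(dep_file), dep_before))
```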

etienne · Accepted Answer · 2016-02-05T14:52:02

Let's generate some random data for df:

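df <- replicate(5, rnorm(20))   # 20 x 5 matrix of standard-normal draws
col_names <- paste0('X', 1:5)   # a distinct name avoids shadowing base::names()
colnames(df) <- col_names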

dependent was already given in the comments, so we can use cbind to combine it with df into one data frame:

newDf<-cbind(dependent,df)
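Because dependent is a data frame, cbind() dispatches to its data frame method, so newDf is a data frame (not a matrix) whose rows are paired purely by position. A small sketch checking both points, with a made-up stand-in for the dependent data frame from the comments (assumed here to hold a single column named dependent):

```r
set.seed(1)
# Made-up stand-in for the dependent data frame posted in the comments
dependent <- data.frame(dependent = runif(20))
df <- replicate(5, rnorm(20))   # 20 x 5 numeric matrix
colnames(df) <- paste0("X", 1:5)

# cbind() pairs rows by position only, so both inputs need the same row count
stopifnot(nrow(dependent) == nrow(df))

# The data frame method is used because one argument is a data frame
newDf <- cbind(dependent, df)
print(class(newDf))
print(names(newDf))
```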

head(newDf)
#    dependent           X1         X2         X3           X4          X5
# 1 0.49295341 -1.728304515  0.9902622  0.6164557  0.904435464 -0.65801021
# 2 0.04331689  0.641830028  2.3829267  0.6165678  0.002691661  0.85520221
# 3 0.53106346 -1.529310531  0.6644159 -1.6921015 -1.176692158  1.15293623
# 4 0.06983530  0.001683688  0.2073812  0.3687421 -1.318220727  0.27627456
# 5 0.74574779  0.250247821 -2.2106331  0.9678592 -0.592997366  0.14410466
# 6 0.56349179  0.563867390  2.6917140  1.2765787  0.797380501 -0.07562508

We can then run the regression:

lm(dependent ~ ., data = newDf)  # "." stands for all the other columns of newDf

# Call:
# lm(formula = dependent ~ ., data = newDf)

# Coefficients:
# (Intercept)           X1           X2           X3           X4           X5  
#     0.50522     -0.09975     -0.03040      0.06431     -0.00398     -0.09596
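If you would rather not build a combined data frame, one alternative (a variation, not what the answer above does) is to pull the single column out of dependent with [[1]], which gives a plain numeric vector, and pass the predictors as data; "." then expands to every column of that data frame. A sketch with made-up data standing in for the imported files:

```r
set.seed(2)
# Made-up stand-ins: 5 predictors and a one-column response data frame
predictors <- as.data.frame(replicate(5, rnorm(20)))
names(predictors) <- paste0("X", 1:5)
dependent <- data.frame(V1 = runif(20))

y <- dependent[[1]]                  # numeric vector, not a list
fit <- lm(y ~ ., data = predictors)  # "." = all columns of predictors
print(coef(fit))
```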
