
This is an R problem, not a statistics problem.

I am trying to perform multiple linear regression in R with 20 independent variables and 1 dependent variable. The 20 independent variables are in one CSV file and the dependent variable is in another CSV file. Each row in each file corresponds to one day's measurement.

I have managed to import the 20 independent variables using read.csv(...) into an object (a variable?) called "predictors". I then imported the dependent measurements, again using read.csv(...), into an object called "dependent".

However, when I run lm(dependent~X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15+X16+X17+X18+X19+X20)

(Note: X1, ..., X20 are the column headers of the predictors CSV file.)

I get the error:

Error in model.frame.default(formula = dependent ~ X1 + X2 + X3 + X4 + X5 + : invalid type (list) for variable 'dependent'

I cannot understand what is going wrong. What is causing this?
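A minimal sketch of what is probably happening, using made-up data (the object names mirror the question; the column name V1 is an assumption, since it is what read.csv(..., header = FALSE) assigns). read.csv() always returns a data frame, and a data frame is a list of columns, so passing the whole object as the response reproduces the error. Extracting the single column gives lm() the numeric vector it expects.

```r
# Hypothetical stand-ins for the two imported files (random data, 5 predictors)
predictors <- as.data.frame(replicate(5, rnorm(30)))
colnames(predictors) <- paste0("X", 1:5)
dependent <- data.frame(V1 = rnorm(30))  # assumed column name V1 (header = FALSE)

# read.csv() returns a data frame, and a data frame is a list
str(dependent)
is.list(dependent)

# Using the whole data frame as the response reproduces the error
msg <- tryCatch(lm(dependent ~ X1 + X2, data = predictors),
                error = function(e) conditionMessage(e))
print(msg)

# Pulling out the one column gives a plain numeric vector, which lm() accepts
y <- dependent[[1]]
fit <- lm(y ~ X1 + X2, data = predictors)
coef(fit)
```

This works, but it relies on both objects having their rows in the same order; combining them into a single data frame, as suggested below, keeps each response value next to its predictors.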

The predictors file looks something like this (but with columns up to X20):

[Screenshot: predictors CSV with header row X1, X2, ..., X20]

and the dependent CSV file looks like this:

[Screenshot: dependent CSV, a single column of values with no header]

Try to add the dependent variable as a column in the dataframe along with the independent variables. - Joswin K J
From the look of your error, I would say that dependent is a list. It would be much better to have one data frame with 21 columns: your 20 Xs and the dependent variable. Then running the regression would be quite easy. You could look at cbind to combine the 2 data frames. - etienne
Show us the data structures you use in R (after import). Data from Excel is nice, but doesn't tell the whole story. For information on how to present data, see stackoverflow.com/questions/5963269/… (hint: str()). - Roman Luštrik
@etienne I think that appending the two data frames sounds promising. However, would it affect the original CSV files? I would prefer not to modify them because of the risk of making mistakes. Also, the dependent file does not have a header; should I just add one manually? - Trajan
@Jurassic: please post the content of dput(head(dependent,20)) - etienne
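Regarding the two concerns raised in the comments, here is a sketch using small hypothetical files written to temporary paths (the file contents are made up). Combining data frames with cbind() only creates a new object in memory and never writes to the CSV files. The missing header can be handled at import time: with the default header = TRUE, read.csv() would silently use the first measurement as the column name and drop one row. Passing header = FALSE together with col.names avoids that without editing the file.

```r
# Hypothetical files: a predictors CSV with a header, a dependent CSV without one
pred_file <- tempfile(fileext = ".csv")
dep_file  <- tempfile(fileext = ".csv")
write.csv(data.frame(X1 = rnorm(10), X2 = rnorm(10)), pred_file, row.names = FALSE)
writeLines(as.character(rnorm(10)), dep_file)  # one value per line, no header
before <- readLines(dep_file)

predictors <- read.csv(pred_file)
# header = FALSE keeps the first value as data; col.names names the column in R only
dependent <- read.csv(dep_file, header = FALSE, col.names = "dependent")

# Both objects must describe the same days, row for row
stopifnot(nrow(dependent) == nrow(predictors))
newDf <- cbind(dependent, predictors)
str(newDf)

# Nothing above writes to disk, so the original file is unchanged
identical(before, readLines(dep_file))
```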

1 Answer


Let's generate some random data for df:

df <- as.data.frame(replicate(5, rnorm(20)))  # 20 rows, 5 random predictors
colnames(df) <- paste0('X', 1:5)              # avoid overwriting base::names()

dependent is the data frame from the question (posted in the comments), so we can use cbind to combine everything into one data frame:

newDf<-cbind(dependent,df)

head(newDf)
#    dependent           X1         X2         X3           X4          X5
# 1 0.49295341 -1.728304515  0.9902622  0.6164557  0.904435464 -0.65801021
# 2 0.04331689  0.641830028  2.3829267  0.6165678  0.002691661  0.85520221
# 3 0.53106346 -1.529310531  0.6644159 -1.6921015 -1.176692158  1.15293623
# 4 0.06983530  0.001683688  0.2073812  0.3687421 -1.318220727  0.27627456
# 5 0.74574779  0.250247821 -2.2106331  0.9678592 -0.592997366  0.14410466
# 6 0.56349179  0.563867390  2.6917140  1.2765787  0.797380501 -0.07562508

We can then run the regression:

lm(dependent ~ ., data = newDf)  # "." selects all the other columns of newDf

# Call:
# lm(formula = dependent ~ ., data = newDf)

# Coefficients:
# (Intercept)           X1           X2           X3           X4           X5  
#     0.50522     -0.09975     -0.03040      0.06431     -0.00398     -0.09596
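With the real data there are 20 predictors instead of 5. As a sketch on hypothetical random data, the formula the question typed by hand can also be built with reformulate() from base R's stats package; it should give the same fit as the "." shorthand. summary() then adds standard errors, t-tests and R-squared.

```r
# Hypothetical data with 20 predictors named like the question's columns
n <- 50
dat <- as.data.frame(replicate(20, rnorm(n)))
colnames(dat) <- paste0("X", 1:20)
dat$dependent <- rnorm(n)

# Build dependent ~ X1 + X2 + ... + X20 without typing every term
f <- reformulate(paste0("X", 1:20), response = "dependent")
fit_explicit <- lm(f, data = dat)

# "." expands to every column of dat except the response
fit_dot <- lm(dependent ~ ., data = dat)
all.equal(coef(fit_explicit), coef(fit_dot))

summary(fit_dot)
```

The explicit formula is handy if the data frame also contains columns (a date column, for example) that should not enter the model, since "." would include them.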