I'm performing some experiments with logistic regression in R with the Auto
dataset included in R.
I've split the data into a training part (80%) and a test part (20%), normalizing each part individually.
I can create the model without any problem with the line:
mlr <- glm(mpg ~ displacement + horsepower + weight, data = train)
I can even predict train$mpg with the training set:
trainpred<-predict(mlr,train,type="response")
And with this I calculate the in-sample error:
etab <- table(trainpred, train[,1])
insampleerror<-sum(diag(etab))/sum(etab)
The problem comes when I want to predict with the test set. I use the following line:
testpred<-predict(model_rl,test,type="response")
Which gives me this warning:
'newdata' had 79 rows but variables found have 313 rows
But it doesn't work, because testpred has the same length as trainpred (it should be shorter). When I try to calculate the test error using testpred with the following line:
etabtest <- table(testpred, test[,1])
I get the following error:
Error in table(testpred, test[, 1]) :
all arguments must have the same length
What am I doing wrong?
mlr<-glm(mpg ~ displacement + horsepower + weight, data =train). You don't need the train$ if you have specified the data argument. More importantly, you might check that this creates a logistic regression. I think it is actually OLS. You have to set the family argument (which also determines the link). There are many examples on SO. – lmo
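To illustrate the two points in the comment, here is a minimal sketch rather than a definitive fix. It runs on simulated data, not the Auto data, so that it is self-contained. The binary response high_mpg and its generating coefficients are invented for the example, as are the names dat, fit, testprob and testclass. The predictor names are only borrowed from the question.

```r
# Sketch on SIMULATED data (an assumption; this is not the Auto dataset).
set.seed(42)
n <- 400
dat <- data.frame(
  displacement = rnorm(n),
  horsepower   = rnorm(n),
  weight       = rnorm(n)
)
# Logistic regression needs a binary response; high_mpg is a made-up 0/1 label.
dat$high_mpg <- rbinom(n, 1, plogis(-dat$weight - 0.5 * dat$horsepower))

# 80/20 split into training and test sets
idx   <- sample(n, size = 0.8 * n)
train <- dat[idx, ]
test  <- dat[-idx, ]

# Bare column names plus data = train, so predict() can find the same
# columns in newdata. family = binomial requests a logistic regression
# (logit link by default); glm() without it fits a Gaussian model, i.e. OLS.
fit <- glm(high_mpg ~ displacement + horsepower + weight,
           data = train, family = binomial)

# One predicted probability per row of the test set
testprob <- predict(fit, newdata = test, type = "response")
stopifnot(length(testprob) == nrow(test))

# Turn probabilities into classes before tabulating against the truth
testclass <- as.integer(testprob > 0.5)
etabtest  <- table(predicted = testclass, actual = test$high_mpg)
accuracy  <- sum(diag(etabtest)) / sum(etabtest)
```

Note that sum(diag(etabtest)) / sum(etabtest) is the proportion of correct classifications, so the error rate would be 1 - accuracy. The 0.5 cutoff is also just a conventional choice for the sketch.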