I am trying to run the predict function for an LDA model. I have two predictors, x1 and x2, and a categorical response y that takes the values -1 and 1. Each variable contains 500 data points (N = 500), and I am splitting the dataset as follows:

xx = data.frame(cbind(x1,x2))
x = cbind(x1,x2)
x_train = x[1:350,]
x_test = x[351:N,]
y_train = y[1:350]
y_test = y[351:N]

Some output:

          x1        x2  y
1 -1.1843924  1.920765 -1
2  3.3167508  2.321631  1
3 -3.0301378  5.973256 -1
4 -1.3262624 -2.320463 -1
5 -0.6534166 -3.050822 -1
6 -2.0051728 -4.118190 -1

Then I fit the LDA model and try the predict function:

modelo.lda = lda(y_train~xx[1:350,1]+xx[1:350,2])
predict.lda = predict(modelo.lda, newdata=xx[351:N,])

Note: the xx values are stated in that way following this answer for the same problem.

But that is where I get:

Warning message: 'newdata' had 150 rows but variables found have 350 rows

I thought that keeping the same xx[init:end,] form would fix the problem, as the answer to that question stated, but it seems it doesn't.

What could it be?

Thanks in advance.

Can you refer to the variable names directly in the formula in lda() and use the "data" argument for the training dataset, rather than using extract brackets to pass the variables? That would be a very common way to use formula-based model functions in R. - aosmith

1 Answer

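As a suggestion: if you have train and test sets, keep them as data frames with the same column names and pass them through the data and newdata arguments. That way predict() can match the predictors by name, and you avoid pitfalls like this warning. Try this: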

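library(MASS)
# Simulated data
N <- 500
x1 <- rnorm(N,0,1)
x2 <- rnorm(N,1,5)
y <- round(runif(N,0,1),0)
xx = data.frame(x1,x2,y)
# Train/test split; both data frames keep the columns x1, x2 and y
x_train = xx[1:350,]
x_test = xx[351:N,]
# Model: refer to the columns by name and pass the training set via data
modelo.lda = lda(y~x1+x2,data = x_train)
# predict() looks up x1 and x2 by name in newdata
predict.lda = predict(modelo.lda, newdata=x_test)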

No warnings will be produced.
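The reason the original call warns is that terms such as xx[1:350,1] are evaluated literally when predict() runs, so they are not looked up in newdata and the 350 training rows are used instead. As a rough check that the name-based approach really predicts on the test rows, here is a minimal sketch. The seed, the simulated -1/1 response, and the names train, test, fit, pred, conf_mat and accuracy are my own assumptions for illustration, not part of the question's data.

```r
library(MASS)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
N <- 500
x1 <- rnorm(N, 0, 1)
x2 <- rnorm(N, 1, 5)
# Hypothetical response coded -1/1, as in the question
y <- ifelse(x1 + rnorm(N) > 0, 1, -1)

xx <- data.frame(x1, x2, y)
train <- xx[1:350, ]
test <- xx[351:N, ]

# Fit on the training rows, referring to columns by name
fit <- lda(y ~ x1 + x2, data = train)

# Predict on the test rows; x1 and x2 are matched by name in newdata
pred <- predict(fit, newdata = test)

# One predicted class per test row (150 here, not 350)
n_pred <- length(pred$class)

# Predicted vs. actual classes on the test set
conf_mat <- table(predicted = pred$class, actual = test$y)
accuracy <- mean(as.character(pred$class) == as.character(test$y))
```

If n_pred equals nrow(test), the predictions come from the test set rather than from the training rows, which is exactly what the warning was complaining about.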