'newdata' had 10000 rows but variables found have 40000 rows

Question

I got the error:

'newdata' had 10000 rows but variables found have 40000 rows

train_frame=data$trainData[,-c(65,81)]
for (i in 1:98){
names(train_frame)[i]<-i
}

lda(data$trainLabel~ train_frame,prior=rep(1,10)/10,method='moment')->lda_equal_prior

prediction_frame=data.frame(data$testData[,-c(65,81)])
for (i in 1:98){
names(prediction_frame)[i]<-paste('train_frame',i,sep='')
}
predict(lda_equal_prior,data.frame(prediction_frame))->prediction_lda_equal_prior

This probably won't help with error, but the for loops are not needed; try names(train_frame) <- 1:98 or names(train_frame) <- seq_along(train_frame) — manotheshark
I have tried but it does work.names(train_frame) <- seq_along(train_frame) and names(prediction_frame) <- seq_along(prediction_frame) — Winnie Liu
train_frame=data$trainData[,-c(65,81)]. Why are you indexing the vector data$trainData by columns?? what is the structure of data? Is it a list or a dataframe? — acylam

acylam · Accepted Answer · 2017-04-30T20:47:47

First things first: it is good practice not to start variable names with a number. Second, I think the way the lda call is written is what causes the message (I believe it is a warning, not an error). The formula data$trainLabel ~ train_frame refers to objects in your workspace rather than to columns of a data frame passed through data = (I am still not sure whether data is a list or a data frame), so the only predictor the fitted model knows about is an object named train_frame. prediction_frame has columns named train_frame1 to train_frame98, but no variable named train_frame, so predict does not find the predictor in newdata and falls back to the train_frame object used for training, which has 40000 rows instead of 10000. Hence the warning. What you really want is for the model and newdata to share the same column names. Assuming data is a list, try:

# MASS provides lda()
library(MASS)

# Create training frame; name the label column explicitly so the formula can find it
train_frame = data.frame(trainLabel = data$trainLabel, data$trainData[,-c(65,81)])
names(train_frame)[-1] = paste0("V", 1:98)

# Run LDA with trainLabel on the rest
lda_equal_prior = lda(trainLabel ~ ., data = train_frame,
                      prior = rep(1,10)/10, method='moment')

# Create prediction frame
prediction_frame = data.frame(data$testData[,-c(65,81)])
names(prediction_frame) = paste0("V", 1:98)

# Predict using newdata
prediction_lda_equal_prior = predict(lda_equal_prior, newdata = prediction_frame)

Here I combined trainLabel and trainData, so lda refers to the correct dataframe with correct variable names.
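A quick way to confirm that the names line up before calling predict is to compare the predictors stored in the fitted model with the columns of the new data frame. The sketch below uses small simulated data as a stand-in for the real data (the sizes, the 3 classes and the 4 predictors are assumptions for illustration only) and deliberately gives the new data the wrong names:

```r
library(MASS)

# Simulated stand-in for the real data (hypothetical: 3 classes, 4 predictors)
set.seed(1)
train_frame <- data.frame(trainLabel = factor(rep(1:3, each = 20)),
                          matrix(rnorm(60 * 4), ncol = 4))
names(train_frame)[-1] <- paste0("V", 1:4)
fit <- lda(trainLabel ~ ., data = train_frame)

# Predictor names the model expects, taken from the terms stored in the fit
expected <- all.vars(delete.response(terms(fit)))

# New data with mismatched names, as in the question
prediction_frame <- data.frame(matrix(rnorm(10 * 4), ncol = 4))
names(prediction_frame) <- paste0("train_frame", 1:4)

# Any name listed here will not be found in newdata by predict()
missing_vars <- setdiff(expected, names(prediction_frame))
missing_vars
```

If missing_vars is empty, predict should be working from the columns of newdata rather than from objects left over in the workspace.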

Again, a reproducible example would be nice.
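For reference, here is a minimal sketch of what such an example could look like. The data are simulated (40 training rows, 10 test rows, 3 predictors and 2 classes are assumptions, not the real dimensions), and data$trainData is assumed to be a matrix. The first part follows the pattern from the question and should raise the same kind of warning; the second part follows the approach above.

```r
library(MASS)
set.seed(42)

# Simulated stand-in for `data` (hypothetical sizes and contents)
data <- list(
  trainData  = matrix(rnorm(40 * 3), ncol = 3),
  trainLabel = factor(rep(c("a", "b"), each = 20)),
  testData   = matrix(rnorm(10 * 3), ncol = 3)
)

# Pattern from the question: the formula points at workspace objects
train_frame <- data$trainData
bad_fit <- lda(data$trainLabel ~ train_frame)
prediction_frame <- data.frame(data$testData)

# Record the warning text instead of printing it
caught <- character(0)
bad_pred <- withCallingHandlers(
  predict(bad_fit, prediction_frame),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
caught                      # the "'newdata' had ... rows" message
nrow(bad_pred$posterior)    # row count follows train_frame, not the test data

# Pattern from the answer: one data frame, matching column names
train_df <- data.frame(trainLabel = data$trainLabel, data$trainData)
names(train_df)[-1] <- paste0("V", 1:3)
good_fit <- lda(trainLabel ~ ., data = train_df)

test_df <- data.frame(data$testData)
names(test_df) <- paste0("V", 1:3)
good_pred <- predict(good_fit, newdata = test_df)
nrow(good_pred$posterior)   # one row per test observation
```

Comparing the two row counts makes the problem visible: in the first case the predictions are silently computed on the training matrix.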
