I am having trouble classifying new cases in linear Discriminant Function Analysis (DFA) - specifically calculating the discriminant function values for each new test case from the raw variables, so I can then plot and overlay with the points from the training data.
For my training dataframe `ref`, I have performed a linear DFA on a dataset with 895 rows and 14 measurement variables using `z <- lda(ref$species ~ ref$v1 + ref$v2 + ref$v3 ...etc... + ref$v14)`. I get seven LD functions. Then I use `predict` to obtain the discriminant function scores for each of the 895 individuals on the discriminant function axes with `z1 <- predict(z)$x`.
Now I want to classify thousands of new cases using the first two discriminant functions (let's say the file has just three rows, as a short example). I read the new file into R as `test`; it has vectors (raw measurements) with the same names as those in `ref`. Then I call `z2 <- predict(z, test)` but get a warning message:
"Warning message: 'newdata' had 3 rows but variables found have 895 rows"
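To make the problem reproducible, here is a minimal sketch of the same sequence of calls. It assumes the built-in `iris` data as a stand-in for `ref` (four variables and three species instead of 14 variables, so only two LD functions) and its first three rows as a stand-in for `test`; none of these values come from the real data.

```r
library(MASS)  # provides lda()

# Stand-in data (assumption): iris replaces the real `ref`,
# and its first three rows play the role of `test`
ref  <- iris
test <- iris[1:3, ]

# Same call style as above: every term is written as ref$...
z <- lda(ref$Species ~ ref$Sepal.Length + ref$Sepal.Width +
           ref$Petal.Length + ref$Petal.Width)

# Capture the warning text so it can be inspected rather than only printed
msg <- NULL
z2 <- withCallingHandlers(
  predict(z, test),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(msg)
print(dim(z2$x))  # the rows correspond to `ref`, not to the 3 rows of `test`
```

In this sketch the score matrix has as many rows as `ref`, which suggests the new data are being ignored rather than scored.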
How can I simply append new vectors with these scores to my test dataframe `test`? Or at least produce a matrix of these scores?
I have not managed to sort myself out despite reading and trying things from these two good sites: here and here. For the second link scroll to "Loadings for the Discriminant Functions".
Surely it is something simple I am missing... perhaps the `ref` and `test` dataframes are not matching somehow...