Error in some group is too small for 'qda'

Question

# load the library and data
library('MASS')
library('sqldf')
data(fgl, package = 'MASS')
df <- data.frame(fgl)
# (a) select the chosen glass types
# adf <- sqldf("select * from df where type='WinF' or type='WinNF' or type='Veh' or type='Head'")
adf <- subset(df, type=="WinF"|type=="WinNF"|type=="Veh"|type=="Head")

traindata <- adf[1:128,]
testdata <- adf[129:192,]
# typetesting <- adf$type[129:192]
# QDA
# fit the qda model on the training data
qdamodel = qda(type~RI+Na+Mg+Al+Si+K+Ca+Ba+Fe, data=traindata)

I get this error:

Error in qda.default(x, grouping, ...) : 
  some group is too small for 'qda'

I tried selecting the rows with both sqldf and subset, but I get the same error either way. Thanks.

Hong Ooi · Accepted Answer · 2017-03-06T00:19:41

The variable type is a factor with 6 levels: "WinF", "Veh", "Head", "WinNF", "Con" and "Tabl". When you do this:

adf <- subset(df, type=="WinF"|type=="WinNF"|type=="Veh"|type=="Head")

You keep the rows for 4 of those levels, but the variable itself still has 6 levels. The 2 remaining levels have no observations in your sample, so qda sees two empty groups, which is what it complains about.

You can get around this by converting type back into a character variable:

adf$type <- as.character(adf$type)

and then doing the rest of the analysis.
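
As an alternative to as.character, here is a minimal sketch that uses droplevels() from base R, which removes the unused levels while keeping type a factor. The names keep, levels_before and levels_after are only illustrative and not part of the original code. It may also be worth tabulating type in the training rows: taking the first 128 rows is not a random split, so if the data are ordered by type, some of the four types could be missing from the training set.

```r
library(MASS)

data(fgl, package = "MASS")

# Keep only the four glass types of interest.
# subset() filters rows but leaves the factor levels untouched.
keep <- c("WinF", "WinNF", "Veh", "Head")
adf <- subset(fgl, type %in% keep)
levels_before <- nlevels(adf$type)  # still includes the unused levels
print(table(adf$type))              # unused levels show up with a count of 0

# Drop the unused levels while keeping type as a factor.
adf <- droplevels(adf)
levels_after <- nlevels(adf$type)
print(table(adf$type))

# Check which types actually occur in the rows used for training.
traindata <- droplevels(adf[1:128, ])
print(table(traindata$type))
```

If the last table shows that a type is absent or has very few rows, a random or stratified split of adf is probably a better basis for the qda fit than the first 128 rows.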
