4
votes

I'm getting the following error and I don't know what went wrong. I'm using RStudio with R 3.1.3 on Windows 8.1, and the caret package for data mining.

I have the following training data:

str(training)

'data.frame':   212300 obs. of  21 variables:
 $ FL_DATE_MDD_MMDD     : int  101 101 101 101 101 101 101 101 101 101 ...
 $ FL_DATE              : int  1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 ...
 $ UNIQUE_CARRIER       : Factor w/ 13 levels "9E","AA","AS",..: 11 10 2 5 8 9 11 10 10 10 ...
 $ DEST                 : Factor w/ 150 levels "ABE","ABQ","ALB",..: 111 70 82 8 8 31 110 44 53 80 ...
 $ DEST_CITY_NAME       : Factor w/ 148 levels "Akron, OH","Albany, NY",..: 107 61 96 9 9 29 106 36 97 78 ...
 $ ROUNDED_TIME         : int  451 451 551 551 551 551 551 551 551 551 ...
 $ CRS_DEP_TIME         : int  500 520 600 600 600 600 600 600 602 607 ...
 $ DEP_DEL15            : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 2 1 1 ...
 $ CRS_ARR_TIME         : int  746 813 905 903 855 815 901 744 901 841 ...
 $ Conditions           : Factor w/ 28 levels "Blowing Snow",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Dew.PointC           : num  -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 ...
 $ Events               : Factor w/ 10 levels "","Fog","Fog-Rain",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Gust.SpeedKm.h       : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Humidity             : int  68 68 71 71 71 71 71 71 71 71 ...
 $ Precipitationmm      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Sea.Level.PressurehPa: num  1021 1021 1022 1022 1022 ...
 $ TemperatureC         : num  -9.4 -9.4 -10 -10 -10 -10 -10 -10 -10 -10 ...
 $ VisibilityKm         : num  16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 ...
 $ Wind.Direction       : Factor w/ 18 levels "Calm","East",..: 9 9 7 7 7 7 7 7 7 7 ...
 $ WindDirDegrees       : int  320 320 330 330 330 330 330 330 330 330 ...
 $ Wind.SpeedKm.h       : num  20.4 20.4 13 13 13 13 13 13 13 13 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:22539] 3 32 45 87 94 325 472 548 949 1333 ...
  .. ..- attr(*, "names")= chr [1:22539] "3" "32" "45" "87" ...

and when I execute the following command:

ldaModel <- train(DEP_DEL15~.,data=training,method="lda",preProc=c("center","scale"),na.remove=TRUE)

I get:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :1     NA's   :1
Error in train.default(x, y, weights = w, ...) : Stopping


2 Answers

7
votes

This is probably caused by the outcome factor having the levels "0" and "1", which are not valid R variable names.

A specific warning is issued when this happens: "At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X0, X1"

It seems that people uniformly ignore warnings so I'm going to make this throw an error in the next version.
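
One way to avoid the level-name problem is to relabel the outcome before calling train. The following is only a minimal sketch in base R: the toy data frame and the labels "on_time" and "delayed" are assumptions made for illustration, not something taken from the question's data.

```r
# Toy stand-in for the question's outcome column (hypothetical values).
training <- data.frame(
  DEP_DEL15 = factor(c("0", "1", "0", "0", "1")),
  Humidity  = c(68, 71, 71, 65, 80)
)

# "0" and "1" are not syntactically valid R names; make.names() shows
# what they would be converted to.
converted <- make.names(levels(training$DEP_DEL15))
print(converted)

# Relabel the levels with valid names (labels chosen for illustration only).
training$DEP_DEL15 <- factor(training$DEP_DEL15,
                             levels = c("0", "1"),
                             labels = c("on_time", "delayed"))

print(levels(training$DEP_DEL15))
```

After relabelling, the same train(DEP_DEL15 ~ ., ...) call can be retried; whether that alone resolves the error depends on the rest of the data.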

0
votes

If the variables Gust.SpeedKm.h and Precipitationmm contain only NAs, try omitting them from your data before running the model. If they are only partially NA and you think they could have predictive value as features, use imputation instead. See the caret documentation on pre-processing, which covers imputation.
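
As a rough illustration of both suggestions, the sketch below drops columns that are entirely NA and median-imputes the remaining numeric ones using base R only. The toy values are made up, and the assumption that one column is all NA while the other is partially NA is just for demonstration. caret's preProcess also offers imputation methods such as "medianImpute" and "knnImpute", which may be preferable inside a train workflow.

```r
# Toy data frame (hypothetical values) showing the two NA patterns.
training <- data.frame(
  Gust.SpeedKm.h  = c(NA_real_, NA_real_, NA_real_, NA_real_),
  Precipitationmm = c(NA, 0.5, NA, 1.5),
  Humidity        = c(68, 71, 71, 65)
)

# Drop columns that contain nothing but NA.
all_na   <- vapply(training, function(col) all(is.na(col)), logical(1))
training <- training[, !all_na, drop = FALSE]

# Median-impute the remaining numeric columns that still have NAs.
for (nm in names(training)) {
  col <- training[[nm]]
  if (is.numeric(col) && any(is.na(col))) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    training[[nm]] <- col
  }
}

print(training)
```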