
I am using the Bioconductor package MLSeq on Ubuntu with R version 3.1.2. I have tried running through the example provided by the package, and that works just fine. However, I want to use the bagsvm method for the classify function, so at chunk 14 I changed the code from

svm <- classify(data = data.trainS4, method = "svm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T") 

to

bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

which produced the error:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa   
 Min.   : NA   Min.   : NA 
 1st Qu.: NA   1st Qu.: NA 
 Median : NA   Median : NA 
 Mean   :NaN   Mean   :NaN 
 3rd Qu.: NA   3rd Qu.: NA 
 Max.   : NA   Max.   : NA 
 NA's   :1     NA's   :1   
Error in train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit,  :
  Stopping
In addition: There were 17 warnings (use warnings() to see them)

The warnings were:

 Warning messages:
1: executing %dopar% sequentially: no parallel backend registered
2: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars,  :
  task 1 failed - "could not find function "lev""

Warning 2 was then repeated 14 times, followed by:

17: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

traceback() produced

4: stop("Stopping")
3: train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, 
       predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, 
       ...)
2: train(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, 
       predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, 
       ...)
1: classify(data = data.trainS4, method = "bagsvm", normalize = "deseq", 
       deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

I thought the problem might be that the kernlab library, which I believe the MLSeq code uses, hadn't been loaded, so I tried

library(kernlab)
bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

which resulted in the same error, but the warnings changed to:

Warning messages:
    1: In eval(expr, envir, enclos) :
      model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars,  :
      task 1 failed - "no applicable method for 'predict' applied to an object of class "c('ksvm', 'vm')""

This was repeated 15 times, followed by:

16: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

I don't believe this problem is specific to MLSeq, as I tried running caret's train function directly:

ctrl <- trainControl(method = "repeatedcv", number = 5, 
    repeats = 3)
train <- train(counts, conditions, method = "bag", B = 100, 
           bagControl = bagControl(fit = svmBag$fit, predict = svmBag$pred, 
                                   aggregate = svmBag$aggregate), trControl = ctrl)

where counts is a data frame with the RNA-Seq count data and conditions is a factor with the class labels, and I got exactly the same results. Any help is much appreciated.


2 Answers

1 vote

I confess I have not tried to reproduce all of your steps. However, all you're attempting to do is go from an "SVM", which works, to a "bagging ensemble of SVMs". I'm not sure if you know entirely what that means, but here it is in a nutshell:

Instead of just making 1 model using all of the (training) data, you are:

  • making several models
  • where each model is using a randomly chosen subset of the training data ("bagging")
  • and the quality of each of these models is validated by seeing how it performs on the unused portion of the training data.
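
To make that concrete, here is a rough conceptual sketch of bagging an SVM. This is illustration only, not what caret or MLSeq does internally; x, y, n_models, and predict_bag are made-up names, and it assumes kernlab's ksvm with x as a numeric feature matrix and y as a factor of class labels:

library(kernlab)

n_models <- 10                        # number of bagged sub-models
models <- vector("list", n_models)

for (i in seq_len(n_models)) {
  # draw a bootstrap sample of the training rows; the rows that are left
  # out ("out-of-bag") are what each sub-model can be validated against
  idx <- sample(nrow(x), replace = TRUE)
  models[[i]] <- ksvm(x[idx, , drop = FALSE], y[idx])
}

# aggregate the sub-models by majority vote over their predictions
predict_bag <- function(models, newdata) {
  preds <- sapply(models, function(m) as.character(predict(m, newdata)))
  apply(preds, 1, function(p) names(which.max(table(p))))
}

In your case caret's bag() with the svmBag fit/predict/aggregate functions (which is what classify calls, per your traceback) takes care of all of this; the sketch is only to show the moving parts.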

Because that is the case, and because that is the only change you had made, I suspect:

  • you have either too little data, or too many entries that are empty or NA, such that some of these mini-SVM models within the bagging cannot be fitted.

It looks like the bagging step fits B = 100 mini-SVM models by default (see the B = 100 default option within classify), each on a bootstrap sample drawn from the training data. If there is a likelihood that, for instance, one of these bootstrap samples ends up with an entirely blank / NA feature, then the bagging model will fail.


How to fix it?

  • First, I would try bumping the B value up to something significantly larger, say 1000. I would also inspect the number of missing values in each feature with something like table(is.na(feature_oi)); a short sketch of these checks follows this list.

  • Next, if the model does work with any of the fixes above, I would see if you can fix the data itself somewhat, by either (a) seeing if the missing values could be recovered somehow, or (b) seeing if some of the observations with missing values are of such low quality that you might want to consider removing the observation entirely.

  • Of course, if the model does work with those fixes, another option is simply to keep using it that way: leave B at 1000 or something similarly large. Keep in mind that if this is something you intend to run in production, you are still building something rickety that can crash at times.

  • Finally, if the original fixes did not make the model work, then I am not sure of the problem. It could be that the implementation of bagsvm itself has a bug in it. Hopefully someone more familiar with the library can chime in with more advice on that front.
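
For example (a rough sketch only: it reuses counts, conditions, and data.trainS4 from your question, assumes rows of counts are observations as in your direct train call, uses feature_oi as a placeholder name, and passes B to classify based on your traceback, so check ?classify for the exact argument):

# how many missing values does each feature have?
colSums(is.na(counts))

# or check a single feature of interest ("feature_oi" is a placeholder)
table(is.na(counts[["feature_oi"]]))

# if some observations are too incomplete to rescue, drop them, keeping
# counts and conditions in sync (data.trainS4 would then need to be
# rebuilt from the cleaned counts before calling classify again)
keep <- complete.cases(counts)
counts_clean <- counts[keep, ]
conditions_clean <- conditions[keep]

# re-run with many more bagged sub-models
bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
                   deseqTransform = "vst", cv = 5, rpt = 3, B = 1000, ref = "T")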

1 vote

I was trying to debug my problem and seem to have inadvertently found a solution. Since the problem seemed to be in the predict function, I stored the svmBag$pred function in a variable, predfunct, so I could see where it was failing:

# verbatim copy of caret's svmBag$pred, stored as an ordinary function
predfunct <- function(object, x) {
  if (is.character(lev(object))) {
    out <- predict(object, as.matrix(x), type = "probabilities")
    colnames(out) <- lev(object)
    rownames(out) <- NULL
  }
  else out <- predict(object, as.matrix(x))[, 1]
  out
}

and then calling

train <- train(counts, conditions, method = "bag", B = 100, 
       bagControl = bagControl(fit = svmBag$fit, predict = predfunct, 
                               aggregate = svmBag$aggregate), trControl = ctrl)

as in the last code block of the problem description, with predfunct replacing svmBag$pred. Somehow this fixed the problem and everything runs just fine. If anyone can figure out why this worked, and preferably find a solution that isn't such a kluge, I will accept your response as the answer.