When using multiple classifiers - How to measure the ensemble's performance? [SciKit Learn]

Question

I have a classification problem (predicting whether a sequence belongs to a class or not), for which I decided to use multiple classification methods, in order to help filter out the false positives.

(The problem is in bioinformatics: classifying protein sequences as neuropeptide precursor sequences. Here's the original article if anyone's interested, and the code used to generate features and to train a single predictor.)

Now, the classifiers have roughly similar performance metrics (83-94% accuracy/precision/etc. on the training set for 10-fold CV), so my 'naive' approach was to simply use multiple classifiers (Random Forests, ExtraTrees, SVM (linear kernel), SVM (RBF kernel) and GRB), and to use a simple majority vote.

My question is: how can I get the performance metrics for the different classifiers and/or their combined vote predictions? That is, I want to see whether using multiple classifiers improves my performance at all, and if so, which combination of them does.

My intuition is maybe to use the ROC score, but I don't know how to "combine" the results and to get it from a combination of classifiers. (That is, to see what the ROC curve is just for each classifier alone [already known], then to see the ROC curve or AUC for the training data using combinations of classifiers).

(I currently filter the predictions using the predicted probabilities from the Random Forests and ExtraTrees methods, arbitrarily discarding results with a predicted score below 0.85. An additional layer of filtering is how many classifiers agree on a protein's positive classification.)

Thank you very much!!

(The website implementation, where we're using the multiple classifiers - http://neuropid.cs.huji.ac.il/ )

The whole shebang is implemented using scikit-learn and Python. Citations and all!

That's a somewhat off-topic question, but did you find a ready-made set of functions and classes for multiple classifier systems in sklearn, or did you code it manually (especially for something like fusion by learning)? — Hady Elsahar
I coded it manually; surprisingly, there's nothing built in for classifier stacking/fusion (beyond already existing models such as AdaBoost, forest ensembles, etc.). — GrimSqueaker
Stacking or voting isn't hard to do naively though, e.g. stackoverflow.com/questions/21506128/… — GrimSqueaker

Simon Simon · Accepted Answer · 2014-03-18T17:01:44

To evaluate the performance of the ensemble, follow the same approach as you normally would. However, get the 10-fold partitions of the data set first. For each fold, train every member of the ensemble on that fold's training portion, measure the ensemble's accuracy on the held-out portion, repeat for the other folds, and then average the results to get the accuracy of the ensemble. So the key difference is that, when evaluating the ensemble, you do not run k-fold cross-validation separately for each individual algorithm; all of them share the same folds. The important thing is not to let the ensemble see the test data, either directly or by letting one of its algorithms see the test data.
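The procedure above can be sketched roughly as follows. This is only a minimal illustration, not the asker's actual pipeline: the data is synthetic (`make_classification`), only three of the classifiers are included, the hyperparameters are arbitrary, and labels are assumed to be binary 0/1. Every model is fitted on the same training portion of each fold, and both the individual models and their majority vote are scored on the same held-out portion, so the numbers are directly comparable.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with the real feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Illustrative subset of classifiers with arbitrary settings (assumption).
models = {
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "et": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "svm_linear": SVC(kernel="linear"),
}

# One set of folds, shared by every model and by the ensemble.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {name: [] for name in models}
scores["majority_vote"] = []

for train_idx, test_idx in cv.split(X, y):
    fold_preds = []
    for name, model in models.items():
        # Fit a fresh copy on this fold's training portion only.
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        pred = fitted.predict(X[test_idx])
        fold_preds.append(pred)
        scores[name].append(accuracy_score(y[test_idx], pred))
    # Majority vote over 0/1 labels: positive if more than half agree.
    vote_share = np.mean(fold_preds, axis=0)
    vote_pred = (vote_share > 0.5).astype(int)
    scores["majority_vote"].append(accuracy_score(y[test_idx], vote_pred))

mean_scores = {name: float(np.mean(vals)) for name, vals in scores.items()}
for name, value in mean_scores.items():
    print(f"{name}: {value:.3f}")
```

Swapping `accuracy_score` for another metric, or trying different subsets of `models`, shows which combination (if any) beats the best single classifier. With an even number of voters a tie-breaking rule would also be needed.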

Note also that RF and Extra Trees are already ensemble algorithms in their own right.

An alternative approach (again making sure the ensemble never sees the test data) is to take the probabilities and/or labels output by your classifiers and feed them into another classifier (say a decision tree, RF, SVM, or whatever) that produces a prediction by combining the best guesses from these other classifiers. This is termed "stacking".
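For what it's worth, newer scikit-learn releases (0.22 and later) ship a `StackingClassifier`, so stacking no longer has to be coded by hand. Below is a minimal sketch of how it might be evaluated. The synthetic data, the choice of base models, the logistic regression meta-classifier and all hyperparameters are assumptions made for illustration. The inner `cv=5` means the meta-classifier is trained on out-of-fold predictions of the base models, and the outer cross-validation scores the whole stack on data that none of its parts has seen.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with the real feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("svm_rbf", SVC(kernel="rbf")),
    ],
    # Meta-classifier that combines the base models' outputs (assumption).
    final_estimator=LogisticRegression(),
    # Inner folds used to produce out-of-fold base predictions.
    cv=5,
)

# Outer folds: each test portion is hidden from the entire stack.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(stack, X, y, cv=outer_cv, scoring="roc_auc")
print(f"stacked ROC AUC: {auc_scores.mean():.3f} +/- {auc_scores.std():.3f}")
```

Running `cross_val_score` with the same `outer_cv` on each base model alone gives per-classifier AUC values to compare against the stacked result.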
