
I am trying to run 5-fold cross-validation on WEKA using a FilteredClassifier with SMOTE.

To my knowledge, I should apply SMOTE in each of the CV folds to obtain my CV error.

Does anyone have documentation or background on how WEKA performs CV with a FilteredClassifier when it is called as follows?

Evaluation().crossvalidate_model(INPUTS)

I am using Python with the weka-wrapper.

Thank you!


1 Answer


Weka treats the FilteredClassifier meta-classifier just like any other classifier, since it implements the same weka.classifiers.Classifier interface as every other classifier.

If you're performing 5-fold CV, the data gets split into 5 train/test pairs. For each pair, the classifier is trained on the training fold and then evaluated on the test fold. The weka.classifiers.Evaluation class accumulates the statistics obtained from the test data of all folds.
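To make that loop concrete, here is a minimal pure-Python sketch of the idea. It is not Weka's code: the names k_fold_indices, cross_validate and build_majority are made up for illustration, the "classifier" is a toy that always predicts the majority class, and the folds are not stratified (as far as I know, Weka stratifies them when the class is nominal).

```python
import random

def k_fold_indices(n, k, seed=1):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(build, X, y, k=5, seed=1):
    """Train on k-1 folds, test on the held-out fold, pool the test results."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        predict = build(train_X, train_y)  # a fresh model for every fold
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def build_majority(train_X, train_y):
    """Toy stand-in for a classifier: always predicts the majority training class."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority

# Imbalanced toy data: 15 rows of class 0, 5 rows of class 1.
X = [[float(i)] for i in range(20)]
y = [0] * 15 + [1] * 5
accuracy = cross_validate(build_majority, X, y, k=5)
```

The point to notice is that the model is rebuilt from scratch inside the loop and that only predictions on held-out rows count towards the final statistic.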

In your case, for each train/test pair, the FilteredClassifier uses the training fold to initialize the SMOTE filter, passes that training data through the filter, and then builds the base classifier on the filtered data.
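As a rough illustration of that order of operations, here is a small pure-Python sketch. Everything in it is hypothetical and only mimics the behaviour described above: FilteredSketch is not Weka's FilteredClassifier, smote_like is a simplified SMOTE that interpolates towards the single nearest minority neighbour (the real filter has more options), and build_1nn is a toy 1-nearest-neighbour base classifier.

```python
import random

def smote_like(X, y, minority, n_new, seed=1):
    """Return copies of X, y with n_new synthetic minority rows appended.
    Each synthetic row lies between a minority row and its nearest minority neighbour."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    new_X, new_y = list(X), list(y)
    if len(pts) < 2:
        return new_X, new_y  # nothing to interpolate between
    for _ in range(n_new):
        a = rng.choice(pts)
        b = min((p for p in pts if p is not a),
                key=lambda p: sum((u - v) ** 2 for u, v in zip(a, p)))
        t = rng.random()
        new_X.append([u + t * (v - u) for u, v in zip(a, b)])
        new_y.append(minority)
    return new_X, new_y

def build_1nn(X, y):
    """Toy base classifier: predict the label of the closest training row."""
    def predict(x):
        j = min(range(len(X)),
                key=lambda i: sum((u - v) ** 2 for u, v in zip(X[i], x)))
        return y[j]
    return predict

class FilteredSketch:
    """Hypothetical filter + base classifier pair (illustration only)."""
    def __init__(self, filt, build_base):
        self.filt = filt
        self.build_base = build_base
        self.seen_by_base = None

    def build(self, train_X, train_y):
        # The filter is set up and applied using the training rows only.
        fX, fy = self.filt(train_X, train_y)
        self.seen_by_base = len(fX)
        self.predict_one = self.build_base(fX, fy)

    def predict(self, x):
        return self.predict_one(x)

# One train/test pair, as it would occur inside a single CV fold.
train_X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [2.0], [2.2]]
train_y = [0, 0, 0, 0, 0, 0, 1, 1]
test_X, test_y = [[0.25], [2.1]], [0, 1]

model = FilteredSketch(lambda X, y: smote_like(X, y, minority=1, n_new=4), build_1nn)
model.build(train_X, train_y)
predictions = [model.predict(x) for x in test_X]
```

In a CV run, build() would be called once per fold with that fold's training rows, so the synthetic minority rows are regenerated each time from training data alone.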

So the answer is yes, your SMOTE filter gets initialized and applied in each of the CV folds.

The official place for Weka questions is the Weka mailing list.