Weka - cross validation based on nominal values

Question

I have data that I want to test classifiers on. It has many attributes and a binary True/False target class. I also know that the data comes from 32 sources, numbered 1 to 32, and this information is present in the ARFF file.

So I have an ARFF file:

@attribute <MANY ATTRIBUTES>
@attribute <MANY ATTRIBUTES>
@attribute class {True,False}
@attribute source {1,2,3,4,5,6,7,8,9,...,30,31,32}

In the Weka Explorer, on the Classify tab, I can choose 4-fold cross-validation, for example. But then Weka assigns the instances randomly to the 4 folds. What I want is for Weka to use 24 sources for training and 8 for testing, so that each source is entirely in either the training set or the test set, never in both.
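What the question asks for amounts to assigning whole sources, not individual instances, to folds (often called group-wise cross-validation). The following is a minimal plain-Java sketch of that assignment, independent of Weka's API. The class and method names are hypothetical, and the fixed shuffle seed is an assumption made for reproducibility. With 32 sources and 4 folds, each fold ends up with 8 test sources and 24 training sources.

```java
import java.util.*;

public class GroupedFolds {
    // Hypothetical helper: assigns each source (group) to one of k folds, so
    // every instance sharing a source lands in the same fold. Sources are
    // shuffled with a fixed seed (an assumption) and dealt out round-robin.
    static Map<String, Integer> assignFolds(List<String> sources, int k, long seed) {
        List<String> shuffled = new ArrayList<>(sources);
        Collections.shuffle(shuffled, new Random(seed));
        Map<String, Integer> foldOf = new HashMap<>();
        for (int i = 0; i < shuffled.size(); i++) {
            foldOf.put(shuffled.get(i), i % k);
        }
        return foldOf;
    }

    public static void main(String[] args) {
        List<String> sources = new ArrayList<>();
        for (int s = 1; s <= 32; s++) sources.add(String.valueOf(s));
        Map<String, Integer> foldOf = assignFolds(sources, 4, 42L);
        // For fold f, the sources mapped to f are the test set; all others train.
        for (int f = 0; f < 4; f++) {
            List<String> test = new ArrayList<>();
            for (String s : sources) if (foldOf.get(s) == f) test.add(s);
            System.out.println("fold " + f + ": test sources " + test);
        }
    }
}
```

Each instance then takes the fold of its `source` value, so no source is ever split between training and test data.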

Is that possible with the built-in methods?

My answer to this question may be helpful: stackoverflow.com/questions/47683638/… — zbicyclist
Since it only supports preserving the order, it would help (perhaps) only if every "source" (in my case) had exactly the same number of instances. That's not the case. — rom

zbicyclist · Accepted Answer · 2017-12-14T02:12:41

If you don't need 4-fold cross-validation and only want to use 24 sources for training and 8 for testing, you can split the file into two (the 24 training sources in one file, the 8 test sources in the other). Load the 24-source file in the Preprocess tab. Then, in the Classify tab, instead of Cross-validation, select the Supplied test set radio button and give it the 8-source file.
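The split described above can be scripted rather than done by hand. Below is a minimal sketch in plain Java (not part of Weka). It assumes the ARFF data section is dense and comma-separated, contains no quoted commas, and has `source` as the last attribute, as in the question's header. `SplitArffBySource` and `split` are hypothetical names.

```java
import java.util.*;

public class SplitArffBySource {
    // Hypothetical helper: splits ARFF lines into [train, test].
    // Header lines (everything up to and including @data) are copied to both
    // outputs; each data row goes to test if its LAST comma-separated value is
    // in testSources, otherwise to train. Assumes dense rows, source as the
    // last attribute, and no quoted commas.
    static List<List<String>> split(List<String> lines, Set<String> testSources) {
        List<String> train = new ArrayList<>();
        List<String> test = new ArrayList<>();
        boolean inData = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!inData) {
                train.add(line);
                test.add(line);
                if (trimmed.equalsIgnoreCase("@data")) inData = true;
                continue;
            }
            // Skip blank lines and % comments inside the data section.
            if (trimmed.isEmpty() || trimmed.startsWith("%")) continue;
            String source = trimmed.substring(trimmed.lastIndexOf(',') + 1).trim();
            (testSources.contains(source) ? test : train).add(line);
        }
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<String> arff = List.of(
            "@relation demo",
            "@attribute x numeric",
            "@attribute class {True,False}",
            "@attribute source {1,2,3}",
            "@data",
            "0.5,True,1",
            "1.2,False,2",
            "0.7,True,3");
        List<List<String>> parts = split(arff, Set.of("3"));
        System.out.println("train: " + parts.get(0));
        System.out.println("test:  " + parts.get(1));
    }
}
```

For real files, the lines could be read with `Files.readAllLines` and each part written back with `Files.write` (the file names are up to you). The two outputs then serve as the training file loaded in Preprocess and the supplied test set. Running the split again with a different group of 8 test sources gives another source-disjoint train/test pair.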
