I am new to the WEKA tool. Can I combine classification and clustering, i.e. first cluster the data and then classify the instances cluster-wise? What steps do I need to follow for this?

Thanks in advance.

1 Answer

Yes, you can. It is really easy with the ClassificationViaClustering meta-classifier (class weka.classifiers.meta.ClassificationViaClustering).

Steps in Java pseudocode (the snippets need imports from weka.clusterers, weka.classifiers.meta, weka.core and java.io):
1. Create a SimpleKMeans clusterer

SimpleKMeans skm = new SimpleKMeans();
skm.setNumClusters(5); // 5 clusters in this example; for a useful model this should match the number of class labels

2. Read the dataset and set class index

BufferedReader reader = new BufferedReader(new FileReader("[path].arff")); // replace [path] with the path to your dataset
Instances data = new Instances(reader);
reader.close();
data.setClassIndex([your class index]); // e.g. 0 if the first attribute is your class

3. Create the classifier

ClassificationViaClustering cvc = new ClassificationViaClustering();
cvc.setClusterer(skm); // let your classifier use the SimpleKMeans clusterer
cvc.buildClassifier(data);

Then, when you want to classify a new instance:

Instance instanceToClassify = (Instance) data.firstInstance().copy(); // copy of an existing instance, used here as a stand-in for a new one
instanceToClassify.setDataset(data); // the instance to be classified has to have access to the dataset
double predictedClass = cvc.classifyInstance(instanceToClassify); // index of the predicted class, based on the cluster the instance belongs to
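
To make it clearer what "classify based on the cluster" means, here is a minimal sketch in plain Java of one simple way to turn clusters into class predictions: give each cluster the majority class of its training instances. This is only an illustration under my own assumptions. The class ClusterMajorityVote and the method mapClustersToClasses are hypothetical names, not part of Weka, and Weka's own cluster-to-class mapping may work differently.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Weka): maps each cluster to its majority class. */
final class ClusterMajorityVote {

    /**
     * @param clusterOf   cluster index assigned to each training instance
     * @param classOf     class index of each training instance (same order)
     * @param numClusters number of clusters
     * @param numClasses  number of class labels
     * @return array where entry c is the most frequent class in cluster c,
     *         or -1 if cluster c has no training instances
     */
    static int[] mapClustersToClasses(int[] clusterOf, int[] classOf,
                                      int numClusters, int numClasses) {
        // counts[c][k] = number of training instances in cluster c with class k
        int[][] counts = new int[numClusters][numClasses];
        for (int i = 0; i < clusterOf.length; i++) {
            counts[clusterOf[i]][classOf[i]]++;
        }

        int[] clusterToClass = new int[numClusters];
        for (int c = 0; c < numClusters; c++) {
            int best = -1;      // stays -1 for an empty cluster
            int bestCount = 0;
            for (int k = 0; k < numClasses; k++) {
                if (counts[c][k] > bestCount) { // ties keep the lower class index
                    bestCount = counts[c][k];
                    best = k;
                }
            }
            clusterToClass[c] = best;
        }
        return clusterToClass;
    }

    public static void main(String[] args) {
        int[] clusterOf = {0, 0, 0, 1, 1, 2}; // made-up cluster assignments
        int[] classOf   = {1, 1, 0, 0, 0, 2}; // made-up class labels
        int[] mapping = mapClustersToClasses(clusterOf, classOf, 3, 3);
        System.out.println(Arrays.toString(mapping)); // prints [1, 0, 2]
    }
}
```

To predict a new instance with such a mapping, you would ask the clusterer for its cluster and look up that cluster's entry in the returned array. This also shows why the number of clusters matters: with fewer clusters than classes, some classes can never be predicted.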