I have a number of huge distributed datasets, each used to train a classifier. All the datasets have identical attributes, and training is done with a single algorithm, J48. The problem I am facing is how to combine these classifiers into a single classifier that can be used for testing and predicting on new data. I am using the Weka tool; I have converted the Weka JAR to a DLL and am working in C#. Any help in C# or Java would be of great help. If any additional information is needed, feel free to ask. Thanks.
2 Answers
It is perfectly possible to do what you are asking. You can build N different classifiers from N different but compatible datasets and combine their outputs to form a new dataset of higher order. This is a hierarchical way of combining classifiers, usually called 'ensembling' or building a 'classifier ensemble', and there is a great variety of ways to do it; a large number of technical articles detail how.
One approach would be:

1. Train/get N different classifiers.
2. Build a new dataset from a known set of instances: one row per instance, one set of columns per classifier holding its output probabilities, plus the known (correct) class.
3. Throw away the old attributes and keep only the calculated output probabilities and the known class.
4. Train a new model/classifier on this higher-order dataset (you don't need the whole data; a moderate subsample is enough).
5. For every new instance, get the lower-level probabilities from the N classifiers, as before, and apply the higher-level classifier to the newly constructed instance.
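The dataset-building step (2–3) can be sketched in plain Java. This is a minimal sketch, not Weka code: each base classifier is modeled as a function from a feature vector to a class-probability distribution, standing in for a trained J48 queried via `distributionForInstance()`. The classifiers and data here are toy stand-ins, invented for illustration. (Note that Weka also ships a ready-made `weka.classifiers.meta.Stacking` meta-classifier that automates exactly this scheme.)

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class StackingSketch {

    // Build the higher-order dataset: for each known instance, concatenate the
    // probability outputs of all base classifiers into one meta-feature row.
    // The original attributes are discarded; only the probabilities remain.
    static double[][] buildMetaFeatures(List<Function<double[], double[]>> baseClassifiers,
                                        double[][] instances) {
        double[][] meta = new double[instances.length][];
        for (int i = 0; i < instances.length; i++) {
            double[] row = new double[0];
            for (Function<double[], double[]> clf : baseClassifiers) {
                double[] probs = clf.apply(instances[i]);
                double[] merged = Arrays.copyOf(row, row.length + probs.length);
                System.arraycopy(probs, 0, merged, row.length, probs.length);
                row = merged;
            }
            meta[i] = row;
        }
        return meta;
    }

    public static void main(String[] args) {
        // Two toy base classifiers for a 2-class problem (stand-ins for J48 models).
        Function<double[], double[]> clfA = x -> x[0] > 0.5 ? new double[]{0.9, 0.1}
                                                            : new double[]{0.2, 0.8};
        Function<double[], double[]> clfB = x -> x[1] > 0.5 ? new double[]{0.7, 0.3}
                                                            : new double[]{0.4, 0.6};

        double[][] data = { {0.8, 0.2}, {0.1, 0.9} };
        double[][] meta = buildMetaFeatures(Arrays.asList(clfA, clfB), data);

        // Each meta-row has 2 classifiers x 2 class probabilities = 4 columns;
        // appending the known class gives the training set for the higher-level model.
        System.out.println(Arrays.deepToString(meta));
        // → [[0.9, 0.1, 0.4, 0.6], [0.2, 0.8, 0.7, 0.3]]
    }
}
```

In your setting, step 4 would then train one more J48 (or any other Weka classifier) on these meta-rows, and step 5 would push each unseen instance through the N base models first and then through that higher-level model.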
Hope this helps.
I don't think it is possible to create N classifiers on N training sets and then merge the N classifiers into a single one: the data are different, so the models will be different. Instead, if I were happy with the N results, I would combine all N datasets and train a single model on the combined data to test and predict unseen data.
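The pooling alternative is simple to sketch. This is plain Java with datasets as row lists, purely illustrative: with Weka you would instead append the compatible `Instances` objects (identical attribute headers) and build one J48 on the merged set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeAndTrain {

    // Concatenate N compatible datasets row-wise. All datasets are assumed to
    // share identical attributes, as stated in the question, so rows can be
    // pooled directly. One model is then trained once on the merged data.
    static List<double[]> mergeDatasets(List<List<double[]>> datasets) {
        List<double[]> merged = new ArrayList<>();
        for (List<double[]> d : datasets) {
            merged.addAll(d);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Toy rows from two sites (last column playing the role of the class).
        List<double[]> site1 = Arrays.asList(new double[]{1.0, 0}, new double[]{2.0, 1});
        List<double[]> site2 = Arrays.asList(new double[]{3.0, 0});

        List<double[]> all = mergeDatasets(Arrays.asList(site1, site2));
        System.out.println(all.size()); // → 3 rows feed a single training run
    }
}
```

The trade-off versus the ensemble answer above: pooling needs all the raw data in one place, while stacking only needs each site's model outputs, which matters when the datasets are truly distributed.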