I am using 10 folds cross validations technique to train 200K records. The target class index is like
Status {PASS,FAIL}
Pass has ~144K and Fail has ~6K instances.
while training the model using J48. Its not able to find the failures. The accuracy is 95% but most the cases its predicting just success. where as in our case, we need to find the failure which are actually happening.
So my question is mainly hypothetical analysis.
Does it really matter the distribution among class instances during training(in my case PASS,FAIL).
What could be possible values in weka J48 tree to train better as i see 2% failure in every 1000 records i pass. So, there will be increase in success if we increase the Success scenarios.
What should be the ratio among them in order to better train them.
There is nothing i could find in the API as far as ratio is concerned.
I am not adding the code because this is happening both with Java API as well as using weka GUI tool.
Many Thanks.