I have a dataset of 204 observations with 6 attributes.

enter image description here

When I build the model on all of the data with this script, model = C5.0(dataset1[,-7], dataset1[,7]), the resulting tree has no nodes (no splits at all), as in the picture below.

enter image description here

But if I use only the first 100 rows with this script, model = C5.0(dataset1[1:100,-7], dataset1[1:100,7]), the result is a proper decision tree, as in the picture below.

enter image description here

What is the problem? Is it something in the data? Thank you.

1 Answer

Examining the display of your trees, it is easy to see what happened. The second model, built from only 100 points, is NOT a better model than the first. When you gave C5.0 more data, it correctly determined that a simpler model was superior. Look at the results.

The first tree (with all 204 points) predicts that everything is Lancar, giving an error rate of 27% (55 errors out of 204).

What is the error rate for the second tree?

Node 2 predicts Lancar for 55 points with 25.5% errors (14 errors).
Node 4 predicts Lancar for 25 points with 28.0% errors ( 7 errors).
Node 6 predicts Macet for 8 points with 50.0% errors ( 4 errors).
Node 7 predicts Macet for 12 points with 41.7% errors ( 5 errors).
Total: 30 errors out of 100, or 30.0%, which is worse than the 27% error rate of the simpler model. C5.0 simply determined that the best model available was to predict that all points are in the majority class (Lancar).
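
If you want to check the arithmetic yourself, here is a minimal sketch in base R. The counts are copied by hand from the tree printouts above, and the names (nodes, tree_error_pct, majority_error_pct) are only illustrative. Nothing here is read from your actual model object.

```r
# Per-leaf counts copied from the 100-point tree printout quoted above.
nodes <- data.frame(
  node      = c(2, 4, 6, 7),
  predicted = c("Lancar", "Lancar", "Macet", "Macet"),
  n         = c(55, 25, 8, 12),
  errors    = c(14, 7, 4, 5)
)

# Error rate of each leaf, in percent, rounded as in the printout.
nodes$error_pct <- round(100 * nodes$errors / nodes$n, 1)

# Overall error rate of the 100-point tree.
tree_error_pct <- 100 * sum(nodes$errors) / sum(nodes$n)

# Error rate of the single-node tree fitted to all 204 points
# (55 errors out of 204, as reported in the first printout).
majority_error_pct <- 100 * 55 / 204

print(nodes)
cat(sprintf("100-point tree:   %.1f%% errors\n", tree_error_pct))
cat(sprintf("Single-node tree: %.1f%% errors\n", majority_error_pct))
```

Keep in mind that the two rates are measured on different sets of rows (the first 100 versus all 204), so this is only the rough comparison made above, not a proper validation of either model.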