I am trying to build a decision tree for a prediction model on the following dataset:
And here is my code:
# rpart provides the tree; rattle provides fancyRpartPlot()
library(rpart)
library(rattle)

fitTree = rpart(classLabel ~ from_station_id + start_day + start_time
                + gender + age, method = "class", data = d)
fancyRpartPlot(fitTree)
But the resulting decision tree uses only one of the attributes (from_station_id) as the splitting attribute and ignores the values of the other attributes (start_day, start_time, gender, age). Here is the result:
What am I doing wrong?
from_station_id == 131
from_station_id is simply much more predictive of classLabel == 2 than any of the other variables, and from your output it appears the same holds for the other station IDs. What you are doing wrong is assuming that rpart will use all variables even when they are less predictive. - IRTFM
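To see this for yourself, you can inspect which variables rpart actually found useful, and relax its pruning controls to see whether the other variables ever get used. Below is a minimal sketch; since the original dataset is not shown, it fabricates a toy data frame `d` with the same column names, where classLabel is driven mostly by from_station_id (an assumption made purely for illustration):

```r
library(rpart)

# Hypothetical stand-in for the question's data frame `d`
set.seed(1)
n <- 500
d <- data.frame(
  from_station_id = factor(sample(c(131, 142, 155, 160), n, replace = TRUE)),
  start_day  = factor(sample(1:7, n, replace = TRUE)),
  start_time = runif(n, 0, 24),
  gender     = factor(sample(c("M", "F"), n, replace = TRUE)),
  age        = sample(18:70, n, replace = TRUE)
)
# classLabel depends almost entirely on the station, mimicking the question
d$classLabel <- factor(ifelse(d$from_station_id %in% c(131, 142), 2, 1))

fitTree <- rpart(classLabel ~ from_station_id + start_day + start_time
                 + gender + age, method = "class", data = d)

# Ranking of how much each variable contributed to the splits;
# weakly predictive variables will rank low or be absent entirely
print(fitTree$variable.importance)

# Relaxing the complexity penalty and minimum split size lets rpart
# grow a deeper tree that may (over)use the weaker variables
fitDeep <- rpart(classLabel ~ from_station_id + start_day + start_time
                 + gender + age, method = "class", data = d,
                 control = rpart.control(cp = 0, minsplit = 2))
```

Lowering `cp` and `minsplit` this aggressively will usually overfit; the point is only to demonstrate that rpart's default pruning, not a bug in your call, is why the weaker attributes never appear in the tree.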