We are fitting a decision tree using both continuous and binary inputs to analyze the effect of weather on biking behavior. A linear regression suggests that "rain" has a large impact on bike counts. Our rain variable is binary and indicates, for each hour, whether it rained.
When we build the tree with rpart, "rain" does not appear as a split node, although we expect it to be a strong predictor of the number of bikes. This might be due to how the rain variable is coded. rpart seems to prefer continuous variables (like temperature) for decision nodes.
Is there anything we should know about how rpart decides whether to use a continuous or a binary variable as a decision node? Is it possible to control this selection of variables?
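library(rpart)
# Use bare column names so every predictor is taken from training.data (this assumes it holds temp, weekday and rain).
# bikecount is a count, so method = "anova" is used here; method = "class" would treat each distinct count as a category.
fit <- rpart(bikecount ~ temp + weekday + rain, data = training.data, method = "anova")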
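To make the question easier to reproduce, here is a minimal sketch on simulated data. Only the column names come from our data; the values, the effect sizes, and the names `sim`, `fit_sim`, `fit_deep` and `used` are invented for illustration. It shows one way to check which variables rpart actually split on, and how the `cp` setting in `rpart.control` changes the size of the tree.

```r
library(rpart)

set.seed(1)  # make the simulated data reproducible
n <- 500

# Hypothetical data: column names mirror the question, values are invented
sim <- data.frame(
  temp    = runif(n, 0, 30),
  weekday = factor(sample(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
                          n, replace = TRUE)),
  rain    = factor(rbinom(n, 1, 0.3), levels = c(0, 1), labels = c("no", "yes"))
)

# Assumed effect sizes, chosen only so that temp and rain both matter
sim$bikecount <- rpois(n, lambda = exp(3 + 0.04 * sim$temp - 1.0 * (sim$rain == "yes")))

# Regression tree with rpart's default control settings
fit_sim <- rpart(bikecount ~ temp + weekday + rain, data = sim, method = "anova")

# Variables that were used for primary splits ("<leaf>" marks terminal nodes)
used <- setdiff(unique(as.character(fit_sim$frame$var)), "<leaf>")
print(used)

# Importance scores also credit variables that only act as surrogate splits
print(fit_sim$variable.importance)

# A lower complexity parameter lets weaker splits into the tree
fit_deep <- rpart(bikecount ~ temp + weekday + rain, data = sim, method = "anova",
                  control = rpart.control(cp = 0.001))
```

Calling `summary(fit_sim)` additionally lists, for each node, the competing primary splits together with their improvement, which is where a binary variable such as rain would show up if it narrowly lost to temperature.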