I am using latitude and longitude as well as Landsat data as predictors in my random forest model, which aims to predict presence or absence of black spruce trees across a landscape. Latitude shows up as having high importance, and you can see its impact as sharp latitude lines in the mapped predictions. Also, using tuneRF, the optimal mtry comes out as only 2 predictors, latitude being one of them. Is my model underfitting?
1 Answer
Choosing mtry=2 does not mean you're completely dropping a third predictor. In fact, the opposite. As a start, you should include all reasonable predictors when training the RF model. Lowering mtry means that only a random subset of the variables, rather than all of them, is tested at each node. This allows less dominant variables to contribute more to the final prediction. Low mtry is kinda analogous to ridge (regularized) regression. Regularization increases bias but lowers variance. Sometimes rough and robust is just better, sometimes not. You would know by cross-validation.
It sounds like you have ~500 samples (plenty) and only 3-6 variables (few). I would start out lazy and simply try every possible mtry value manually (there are only 3-6 of them) and look at the reported OOB-CV value for each.
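A minimal sketch of that manual sweep, assuming the randomForest package. The data frame `dat`, its column names (`lat`, `lon`, `band3` to `band5`, `spruce`) and the simulated response are hypothetical stand-ins for the real data, not anything from the question:

```r
library(randomForest)

set.seed(1)

# Hypothetical stand-in data: ~500 samples, 5 predictors.
n <- 500
dat <- data.frame(
  lat   = runif(n, 60, 65),
  lon   = runif(n, -150, -145),
  band3 = rnorm(n),
  band4 = rnorm(n),
  band5 = rnorm(n)
)

# Simulated presence/absence; the response must be a factor
# so that randomForest runs in classification mode.
p <- plogis(2 * (dat$lat - 62.5) + dat$band4)
dat$spruce <- factor(ifelse(runif(n) < p, "present", "absent"))

predictors <- setdiff(names(dat), "spruce")

# Fit one forest per possible mtry (1 .. number of predictors)
# and keep the OOB error rate after the last tree.
oob_error <- sapply(seq_along(predictors), function(m) {
  rf <- randomForest(x = dat[, predictors], y = dat$spruce,
                     mtry = m, ntree = 500)
  rf$err.rate[rf$ntree, "OOB"]
})

data.frame(mtry = seq_along(predictors), oob_error = oob_error)
```

With this few variables the differences between mtry values are often small, so it may be worth repeating the sweep with a few different seeds before reading much into which value wins.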