I am new to data science and would like to ask for help with model selection.
I have built 8 models to predict salary from years of experience, position name, and location, and compared them by RMSE. However, I am still not sure which model I should select. (I lean toward model 8: in testing, random forest gave better results than regression, so I then used the whole data set to build a final version. The drawback is that it is harder to interpret than regression coefficients.) Which model would you prefer, and why? And in practice, do data scientists follow a process like this, or is there an automated way to handle it?
1. RMSElm1: model: linear regression; data: train 80%, test 20%; no imputation = 22067.58
2. RMSElm2: model: linear regression; data: train 80%, test 20%; imputation of some locations that I think indicate similar salary levels = 22115.64
3. RMSElm3: model: linear regression + stepwise selection; data: train 80%, test 20%; no imputation = 22081.06
4. RMSEdeep1: model: deep learning (H2O package, activation = 'Rectifier', hidden = c(5,5), epochs = 100); data: train 80%, test 20%; no imputation = 16265.13
5. RMSErf1: model: random forest (ntree = 10); data: train 80%, test 20%; no imputation = 14669.92
6. RMSErf2: model: random forest (ntree = 500); data: train 80%, test 20%; no imputation = 14669.92
7. RMSErf3: model: random forest (ntree = 10); data: 10-fold cross-validation; no imputation = 14440.82
8. RMSErf4: model: random forest (ntree = 10); data: entire data set; no imputation = 13532.74
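
To make the comparison concrete, here is a minimal sketch of the three evaluation schemes used in the list: an 80/20 hold-out split, 10-fold cross-validation, and fitting and scoring on the entire data set. It is not the code behind the numbers above. Those models were built in R, while this sketch uses plain Python with a one-feature least-squares line as a stand-in model and synthetic data. All function names (`rmse`, `fit_line`, `holdout_rmse`, `kfold_rmse`, `full_data_rmse`) are hypothetical. If the figure for model 8 was computed on the same rows the model was trained on, it would correspond to `full_data_rmse` here and would not be directly comparable with the held-out figures.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def _score(xs, ys, train, test):
    """Fit on the train indices, return RMSE on the test indices."""
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return rmse([ys[i] for i in test], predict(model, [xs[i] for i in test]))


def holdout_rmse(xs, ys, test_fraction=0.2, seed=0):
    """Single random split: fit on the larger part, score on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    return _score(xs, ys, idx[n_test:], idx[:n_test])


def kfold_rmse(xs, ys, k=10, seed=0):
    """Average RMSE over k folds; each row is held out exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        scores.append(_score(xs, ys, train, fold))
    return sum(scores) / k


def full_data_rmse(xs, ys):
    """Fit and score on the same rows (no held-out data)."""
    everything = list(range(len(xs)))
    return _score(xs, ys, everything, everything)


# Synthetic data for illustration only; not the salary data from this post.
rng = random.Random(42)
years = [rng.uniform(0, 20) for _ in range(200)]
salary = [30000 + 4000 * y + rng.gauss(0, 15000) for y in years]

print("hold-out 80/20:", round(holdout_rmse(years, salary), 2))
print("10-fold CV:    ", round(kfold_rmse(years, salary), 2))
print("full data:     ", round(full_data_rmse(years, salary), 2))
```

Only the first two functions score rows the model has not seen during fitting, so only their results estimate error on new data.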