I am running an H2O random forest with the following parameter settings:

model_rf <- h2o.randomForest(x = predictors, y = labels,
                         training_frame = train_data, classification = T,
                         importance = T,
                         verbose = T, type = "BigData", ntree = 50)

After running I am getting the following output.

Model Details:
==============

H2ORegressionModel: drf
Model ID:  DRFModel__906d074da6ebf8057525b2b61c1c4c87 
Model Summary:
  number_of_trees model_size_in_bytes min_depth max_depth mean_depth      min_leaves  max_leaves mean_leaves
1       50.000000      2708173.000000 20.000000 20.000000   20.00000     4200.000000 5241.000000  4720.70000


H2ORegressionMetrics: drf
** Reported on training data. **
Description: Metrics reported on Out-Of-Bag training samples

MSE:  0.0006302392
R2 :  -0.03751038

Following are my questions.

1) What do MSE and R2 mean?

2) If they are mean squared error or similar, why am I getting these metrics in a classification setting?

3) How do I get other metrics like gini or auc?

4) Can I say that if these 2 metrics decrease with a different parameter setting, my model performance has improved?

1 Answer

Here are the answers to your questions: 1. MSE stands for mean squared error. Essentially, it is the average squared difference between the predicted values and the actual values. R2 measures how well the model fits the data.
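To make the two definitions concrete, here is a minimal sketch in plain R using the usual textbook formulas. The helper names `mse` and `r2` and the toy vectors are made up for illustration and are not part of the h2o package. It also shows how R2 can go negative, as in the output above: that happens when the predictions do worse than simply predicting the mean of the response.

```r
# Hypothetical helpers, not h2o functions: textbook MSE and R2.
mse <- function(actual, predicted) {
  mean((actual - predicted)^2)
}

r2 <- function(actual, predicted) {
  ss_res <- sum((actual - predicted)^2)       # residual sum of squares
  ss_tot <- sum((actual - mean(actual))^2)    # total sum of squares
  1 - ss_res / ss_tot
}

# Toy data, invented for illustration only.
actual <- c(1, 2, 3)

perfect   <- c(1, 2, 3)   # exact predictions
mean_only <- c(2, 2, 2)   # always predict the mean of `actual`
reversed  <- c(3, 2, 1)   # worse than predicting the mean

mse(actual, perfect)      # no error at all
r2(actual, perfect)       # best possible fit
r2(actual, mean_only)     # no better than the mean
r2(actual, reversed)      # negative: worse than the mean
```

An R2 slightly below zero, like the one in the question, therefore suggests the model is explaining essentially none of the variance in the response.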

  2. Using MSE, you can judge how often your model misclassified data.

  3. If you are using Flow, click on Inspect and then output-training_metrics to see MSE, R2, AUC, gini, etc.

  4. Sorry, I'm not sure I understand this question. Are you asking whether a decreased gini or AUC equates to improved model performance?
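If you are working from R rather than Flow, something along these lines may help. The output in the question is headed H2ORegressionModel, which suggests the response column was numeric when the model was trained; H2O picks regression or classification from the type of the response column. This is a rough sketch that assumes a recent h2o R package and a running local cluster. The data frame, the column names `x1`, `x2`, `label`, and the random data are made up for illustration.

```r
library(h2o)
h2o.init()

# Toy data, invented for illustration: a 0/1 label stored as a number.
set.seed(1)
df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
df$label <- as.integer(df$x1 + rnorm(200) > 0)

train_data <- as.h2o(df)

# Convert the response to a factor so H2O trains a classifier
# instead of a regression model.
train_data$label <- as.factor(train_data$label)

model_rf <- h2o.randomForest(x = c("x1", "x2"), y = "label",
                             training_frame = train_data,
                             ntrees = 50)

# Training metrics for the fitted model.
perf <- h2o.performance(model_rf, train = TRUE)

h2o.auc(perf)       # area under the ROC curve
h2o.giniCoef(perf)  # Gini coefficient
h2o.mse(perf)       # MSE is still reported for classifiers
```

With a factor response the printed model should be a binomial model rather than a regression model. Keep in mind that AUC and gini are better when they are higher, while MSE is better when it is lower.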

Avni