H2O issue: classification model with gradient boosting and random forest

Question

I am working on the Costa Rican Household Poverty Level Prediction problem. The "Target" column has 4 levels, and I have already converted it to a factor. However, I cannot look up my AUC or do a grid search. I keep encountering this error:

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Invalid argument for sort_by specified. Must be one of: [r2, mean_per_class_accuracy, max_per_class_error, err, total_rows, rmse, accuracy, err_count, logloss, mse, mean_per_class_error]

It seems that my model was somehow set up as a regression model, not a classification model. The entire code:

class(train3.na$Target)  # expected to be "factor"
gradientboost <- as.h2o(train3.na)

# 60/20/20 split into training, validation, and test frames.
# Each frame needs its own key; reusing "valid.hex" would overwrite the earlier frames.
split <- h2o.splitFrame(gradientboost, c(.6, .2), seed = 1234)
train_gb1 <- h2o.assign(split[[1]], "train.hex")
valid_gb1 <- h2o.assign(split[[2]], "valid.hex")
test_gb1 <- h2o.assign(split[[3]], "test.hex")

gbm_params <- list(learn_rate = c(0.01, 0.1),
                   max_depth = c(3, 5, 9),
                   sample_rate = c(0.8, 1.0),
                   col_sample_rate = c(0.2, 0.5, 1.0))

gbm_grid1 <- h2o.grid("gbm", training_frame = train_gb1, validation_frame = valid_gb1,
                      x = 1:51, y = 52,
                      grid_id = "gbm_grid1", hyper_parameters = gbm_params,
                      ntrees = 30, seed = 2000000)

# This is the call that raises the error shown above
gbm_gridperf1 <- h2o.getGrid(grid_id = "gbm_grid1", sort_by = "auc",
                             decreasing = TRUE)

Lauren Lauren · Accepted Answer · 2018-12-06T17:22:40

AUC is only available for binary classification. If you are interested in a multi-class classification metric, you can try logloss, for example.

Here's a description for AUC from the docs (you can also use this link to learn more about what metrics can be used for multi-class classification problems):

AUC (Area Under the ROC Curve): This model metric is used to evaluate how well a binary classification model is able to distinguish between true positives and false positives. An AUC of 1 indicates a perfect classifier, while an AUC of .5 indicates a poor classifier, whose performance is no better than random guessing. H2O uses the trapezoidal rule to approximate the area under the ROC curve. (Tip: AUC is usually not the best metric for an imbalanced binary target because a high number of True Negatives can cause the AUC to look inflated. For an imbalanced binary target, we recommend AUCPR or MCC.)
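
Applied to the grid in the question, sorting by logloss could look roughly like the sketch below. It assumes it runs in the same R session as the question's code, so that the grid "gbm_grid1" and the frame test_gb1 already exist; the names best_gbm1 and perf_gbm1 are introduced here only for illustration. Note also that the allowed sort_by values in the error message include logloss and mean_per_class_error, which suggests the grid was trained as a multi-class classifier rather than as a regression model.

```r
library(h2o)
h2o.init()  # connects to the already running H2O cluster, if there is one

# Assumption: the grid "gbm_grid1" and the frame test_gb1 from the question exist.
# Lower logloss is better, so sort in increasing order.
gbm_gridperf1 <- h2o.getGrid(grid_id = "gbm_grid1",
                             sort_by = "logloss",
                             decreasing = FALSE)
print(gbm_gridperf1)

# With this ordering, the first model ID belongs to the lowest-logloss model.
best_gbm1 <- h2o.getModel(gbm_gridperf1@model_ids[[1]])

# Multi-class metrics on the held-out test frame.
perf_gbm1 <- h2o.performance(model = best_gbm1, newdata = test_gb1)
h2o.logloss(perf_gbm1)
h2o.mean_per_class_error(perf_gbm1)
h2o.confusionMatrix(perf_gbm1)
```

Any other value from the list in the error message can be passed as sort_by in the same way; for error-style metrics such as mean_per_class_error, decreasing = FALSE again puts the best model first.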
