I have a fairly small dataset (15 columns, 3,500 rows) and I am consistently seeing that a standalone H2O XGBoost model trains better than H2O AutoML. I am using H2O 3.26.0.2 and the Flow UI.
H2O XGBoost finishes in a matter of seconds, while AutoML runs as long as it needs (about 20 minutes) and always gives me worse performance.
I admit the dataset might not be perfect, but I would expect AutoML with its grid search to be at least as good as standalone H2O XGBoost. My thinking is that AutoML trains multiple XGBoost models and grid-searches their hyperparameters, so the results should be comparable, right?
For both AutoML and XGBoost I use the same training frame and the same response column.
The code for the XGBoost experiment:
import csv  # was missing; needed for csv.writer below
import h2o
from h2o.estimators.xgboost import H2OXGBoostEstimator

h2o_frame = h2o.import_file(path="myFile.csv")
feature_columns = h2o_frame.columns
label_column = "responseColumn"
feature_columns.remove(label_column)

xgb = H2OXGBoostEstimator(nfolds=10, seed=1)
xgb.train(x=feature_columns, y=label_column, training_frame=h2o_frame)

# Now export the metrics to a file
MRD = xgb.mean_residual_deviance()
RMSE = xgb.rmse()
MSE = xgb.mse()
MAE = xgb.mae()
RMSLE = xgb.rmsle()

header = ['model', 'mean_residual_deviance', 'rmse', 'mse', 'mae', 'rmsle']
with open('metrics.out', mode='w') as result_file:
    writer = csv.writer(result_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerow(['H2O_XGBoost', MRD, RMSE, MSE, MAE, RMSLE])
The code for the AutoML experiment:
import h2o
from h2o.automl import H2OAutoML

h2o_frame = h2o.import_file(path="myFile.csv")
feature_columns = h2o_frame.columns
label_column = "responseColumn"
feature_columns.remove(label_column)

aml = H2OAutoML(seed=1, nfolds=10, exclude_algos=["StackedEnsemble"], max_models=20)
aml.train(x=feature_columns, y=label_column, training_frame=h2o_frame)

# Now export the leaderboard to a file
h2o.export_file(aml.leaderboard, "metrics.out", force=True, parts=1)
I have tried different nfolds values, more models for AutoML, and more early-stopping rounds. I also tried excluding all algorithms except XGBoost from AutoML, and I still get the same results.
Here are the differences in results:
H2O XGBoost:
model xgboost-5a8f9766-940c-4e5c-b57d-62b186f4c058
model_checksum 7409831159060775248
frame train_set_v01.hex
frame_checksum 6864971999838167226
description ·
model_category Regression
scoring_time 1566296468447
predictions ·
MSE 252.265021
RMSE 15.882853
nobs 3476
custom_metric_name ·
custom_metric_value 0
r2 0.726871
mean_residual_deviance 252.265021
mae 10.709369
rmsle NaN
XGBoost native params for xgboost-5a8f9766-940c-4e5c-b57d-62b186f4c058:
name value
silent true
eta 0.3
colsample_bylevel 1
objective reg:linear
min_child_weight 1
nthread 8
seed -1058380797
max_depth 6
colsample_bytree 1
lambda 1
gamma 0
alpha 0
booster gbtree
grow_policy depthwise
nround 50
subsample 1
max_delta_step 0
tree_method auto
H2O AutoML (winning model):
model StackedEnsemble_AllModels_AutoML_20190819_235446
model_checksum -6727284429527535576
frame automl_training_train_set_v01.hex
frame_checksum 6864971999838167226
description ·
model_category Regression
scoring_time 1566256209073
predictions ·
MSE 332.146239
RMSE 18.224880
nobs 3476
custom_metric_name ·
custom_metric_value 0
r2 0.640383
mean_residual_deviance 332.146239
mae 12.927023
rmsle 1.225650
residual_deviance 1154540.326762
null_deviance 3210476.302359
AIC 30070.640602
null_degrees_of_freedom 3475
residual_degrees_of_freedom 3464
And the best-rated XGBoost model from the same AutoML run (third on the leaderboard):
model XGBoost_grid_1_AutoML_20190819_235446_model_5
model_checksum 8047828446507408480
frame automl_training_train_set_v01.hex
frame_checksum 6864971999838167226
description ·
model_category Regression
scoring_time 1566255442068
predictions ·
MSE 616.910151
RMSE 24.837676
nobs 3476
custom_metric_name ·
custom_metric_value 0
r2 0.332068
mean_residual_deviance 616.910151
mae 17.442629
rmsle 1.325149
XGBoost native params (for XGBoost_grid_1_AutoML_20190819_235446_model_5 in AutoML):
name value
silent true
normalize_type tree
eta 0.05
objective reg:linear
colsample_bylevel 0.8
nthread 8
seed 940795529
min_child_weight 15
rate_drop 0
one_drop 0
sample_type uniform
max_depth 20
colsample_bytree 1
lambda 100
gamma 0
alpha 0.1
booster dart
grow_policy depthwise
skip_drop 0
nround 120
subsample 0.8
max_delta_step 0
tree_method auto