I used H2O to train some machine learning models for a binary classification problem, and the test results looked pretty good. But then I noticed something odd. Out of curiosity, I printed the model's predictions for the test set and found that the model actually predicts 0 (negative) all the time, yet the AUC is around 0.65 and the precision is not 0.0. I then used scikit-learn to compare the metric scores, and (as expected) they're different: scikit-learn yields 0.0 precision and a 0.5 AUC score, which I think is correct. Here's the code I used:
import h2o
import sklearn.metrics

# Load the trained model and score the test set (Test_data is an H2OFrame)
model = h2o.load_model(model_path)
predictions = model.predict(Test_data).as_data_frame()

# H2O's AUC on the test set
auc = model.model_performance(Test_data).auc()

# scikit-learn's AUC, using the 'predict' column (y_true holds the true 0/1 labels)
auc_sklearn = sklearn.metrics.roc_auc_score(y_true, predictions['predict'].tolist())
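For what it's worth, here is roughly the sanity check I ran on the predictions. This is a sketch that assumes H2O's usual predict/p0/p1 output columns for a binomial model and that y_true holds the true 0/1 labels for the test set:

# Sketch of the sanity check; p0/p1 are (I believe) H2O's per-class probabilities
print(predictions['predict'].value_counts())   # only class 0 appears
print(predictions[['p0', 'p1']].head())

# Precision computed the same way as the sklearn AUC above
print(sklearn.metrics.precision_score(y_true, predictions['predict']))  # gives 0.0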
Any thoughts on why the scores differ? Thanks in advance!