Binary classification predict() method : sklearn vs keras

Question

I am trying to migrate my sklearn code to Keras on a basic binary classification example. I have a question about the Keras predict() method, which returns something different from sklearn's.

sklearn

print("X_test:")
print(X_test)
y_pred = model.predict(X_test)
print("y_pred:")
print(y_pred)

XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, nthread=-1, objective='binary:logistic', reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True, subsample=1)

--- Predict Sklearn ---

X_test:
[[ 1. 90. 62. ..., 27.2 0.58 24. ]
 [ 7. 181. 84. ..., 35.9 0.586 51. ]
 [ 13. 152. 90. ..., 26.8 0.731 43. ]
 ...,
 [ 4. 118. 70. ..., 44.5 0.904 26. ]
 [ 7. 152. 88. ..., 50. 0.337 36. ]
 [ 7. 168. 88. ..., 38.2 0.787 40. ]]

y_pred: [ 0. 1. 1. 0. 1. 1. 0. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. ... 0. 0. 1. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 1. 0. 0. 0. 0. 1. 1. 1.]

Return value of predict(): a binary vector with one entry per row of X_test (OK).

Keras

score = aTSSeqModel.evaluate(X_test, Y_test2, batch_size=32)

Score: [1.4839521383676004, 0.6338582667778796]

What do those 2 values represent?

print("--- Predict Keras ---")
print("X_test:")
print(X_test)
Y_pred2 = aTSSeqModel.predict(X_test, batch_size=32)
print("Y_pred:")
print(Y_pred2)

keras.models.Sequential object at 0x7fae3a60b438

--- Predict Keras ---

X_test:
[[ 1. 90. 62. ..., 27.2 0.58 24. ]
 [ 7. 181. 84. ..., 35.9 0.586 51. ]
 [ 13. 152. 90. ..., 26.8 0.731 43. ]
 ...,
 [ 4. 118. 70. ..., 44.5 0.904 26. ]
 [ 7. 152. 88. ..., 50. 0.337 36. ]
 [ 7. 168. 88. ..., 38.2 0.787 40. ]]

Y_pred: [[ 9.07712865e-21] [ 0.00000000e+00] [ 1.27839347e-25] [ 2.38120656e-22] [ 5.51314650e-20] [ 1.99869346e-22] [ 1.54212393e-19]...

Is this the correct way to use predict() with a Keras model?

I would expect a binary vector, as sklearn returns, containing the predictions for the X_test data set. What do that 2D array and its values represent?

Thanks for answers.

You need to include your Keras model to get an answer for both questions. — Dr. Snoopy

Daniele Grattarola · Accepted Answer · 2018-02-06T13:34:23

This is a seriously ill-posed question, but I'll try to address your issues. Please check the guidelines next time.

What do those 2 values represent?

Assuming you compiled your model with the metrics flag set as

model.compile(optimizer='...', loss='...', metrics=['acc'])

then the call to model.evaluate(X, Y) will return a list in which the first value is the loss computed between model.predict(X) and Y, and the second value is the accuracy on the same data.
If you pass other metrics to compile(), their values follow in the same order.
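To make the two numbers concrete, here is a minimal NumPy sketch of what they mean. It assumes the loss is binary cross-entropy and that accuracy uses a 0.5 threshold; the question does not show the actual compile() call, so both are assumptions. The function names, the clipping epsilon, and the toy data are made up for illustration, and this is not the code Keras itself runs.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip so that log(0) never occurs; eps is an illustrative choice.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(per_sample))

def accuracy(y_true, y_prob, threshold=0.5):
    # Turn probabilities into labels, then count the matches.
    y_label = (y_prob > threshold).astype(int)
    return float(np.mean(y_label == y_true))

# Hypothetical targets and predicted probabilities.
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.8, 0.4, 0.3])

# Same layout as the list returned by evaluate(): [loss, accuracy]
score = [binary_crossentropy(y_true, y_prob), accuracy(y_true, y_prob)]
print(score)
```

Read this way, the score in the question would mean a loss of about 1.48 and an accuracy of about 63%, provided the model was compiled with an accuracy metric.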

Is this the correct way to use predict() with a Keras model?

It is.
Scikit-learn's predict() returns an array of shape (n_samples,), whereas Keras' returns an array of shape (n_samples, 1). The two arrays are equivalent for your purposes, but the one from Keras is a bit more general, as it extends more easily to the multi-dimensional output case. To convert the Keras output to scikit-learn's shape, simply call y_pred.reshape(-1).
The values from scikit-learn look rounded because a classifier's predict() returns class labels rather than probabilities (those are available from predict_proba()), whereas Keras' predict() returns the raw output of the last layer. If you wish, you can round the values from Keras like this:

y_pred[y_pred <= 0.5] = 0.
y_pred[y_pred > 0.5] = 1.
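The reshape and the thresholding can also be combined in one step that leaves the original probabilities untouched. This is only a sketch: the array below is a made-up stand-in for the output of predict(), and the names y_prob and y_label are hypothetical.

```python
import numpy as np

# Hypothetical Keras-style output: shape (n_samples, 1), values in [0, 1].
y_prob = np.array([[0.02], [0.97], [0.50], [0.61]])

# Flatten to shape (n_samples,), then threshold at 0.5.
# As in the snippet above, a value of exactly 0.5 maps to class 0.
y_label = (y_prob.reshape(-1) > 0.5).astype(int)

print(y_label.shape)  # (4,)
print(y_label)        # [0 1 0 1]
```

Because the comparison creates a new array, y_prob keeps the probabilities, which is handy if you later want to try a threshold other than 0.5.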

Cheers
