How to compute "y_train_true, y_train_prob, y_test_true, y_test_prob"?

Question

I have computed X_train, X_test, y_train and y_test, but I cannot work out how to compute y_train_true, y_train_prob, y_test_true and y_test_prob.

How can I compute y_train_true, y_train_prob, y_test_true and y_test_prob from the following code?

N.B.

y_train_true: True binary labels (0 or 1) in the training dataset

y_train_prob: Probability in the range [0, 1] predicted by the model for the training dataset

y_test_true: True binary labels (0 or 1) in the testing dataset

y_test_prob: Probability in the range [0, 1] predicted by the model for the testing dataset

Code :

# Split test and train data
import numpy as np
from sklearn.model_selection import train_test_split
X = np.array(dataset.iloc[:, 1:10])  # .ix has been removed from pandas; .iloc selects columns by position
y = np.array(dataset['benign_malignant'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Define the classifier and fit it on the training data
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
# knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(X_train, y_train)

# Predicting the test set results
y_pred = knn.predict(X_test)

Dat Tran · Accepted Answer · 2017-08-24T17:10:08

In your case, y_train and y_test already are y_train_true and y_test_true. To get y_train_prob and y_test_prob, you need a fitted model that can output probabilities. I don't know which dataset you're using, but it looks like a binary classification problem, so logistic regression would be one option. The KNeighborsClassifier you already have also provides predict_proba, so you can simply use that:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn.fit(X_train, y_train)
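# predict_proba returns one column per class (ordered as in knn.classes_);
# column 1 is the probability of the positive class when the labels are 0/1
y_train_prob = knn.predict_proba(X_train)[:, 1]
y_test_prob = knn.predict_proba(X_test)[:, 1]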
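
For reference, here is a minimal end-to-end sketch of the same idea. Since I don't have your dataset, it assumes a synthetic stand-in generated with make_classification (9 features, labels 0/1); the ROC AUC at the end is only an illustration of what the four arrays are typically used for, not something your question asked for.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic binary data standing in for the asker's dataset.
X, y = make_classification(n_samples=500, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
knn.fit(X_train, y_train)

# The "true" labels are simply the labels produced by the split.
y_train_true, y_test_true = y_train, y_test

# predict_proba returns one column per class, ordered as in knn.classes_.
# Look up the column of the positive class (label 1) instead of hard-coding it.
pos_col = list(knn.classes_).index(1)
y_train_prob = knn.predict_proba(X_train)[:, pos_col]
y_test_prob = knn.predict_proba(X_test)[:, pos_col]

print("train AUC:", roc_auc_score(y_train_true, y_train_prob))
print("test AUC:", roc_auc_score(y_test_true, y_test_prob))
```

Note that with n_neighbors=5 and the default uniform weights, KNN probabilities can only take the values 0, 0.2, 0.4, 0.6, 0.8 and 1, because each one is the fraction of the 5 nearest neighbours that belong to the positive class.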