sklearn version = 0.23.2
MRE:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"studied": np.random.uniform(0, 10, 100),
                   "sleep": np.random.uniform(0, 10, 100)})
def PassFail(study, sleep):
    if study + sleep >= 10:
        return 1
    else:
        return 0
df["pass"] = df.apply(lambda x: PassFail(x["studied"], x["sleep"]), axis=1)
You pass if studied + sleep >= 10, so the two classes are separated by a diagonal line.
After training on the data above and printing coef_:
y = df["pass"].values
X = df.drop(columns=["pass"]).values
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.1)
lr = LogisticRegression().fit(X_train,y_train)
lr.coef_
>>> array([[2.0648521 , 1.89582556]])
From my understanding, coef_ contains the weights of the x variables studied and sleep. However, when I try to make a prediction by hard-coding it:
def z_value(x, w):
    """x, w are arrays of shape [[x1, x2]]"""
    return np.dot(x.reshape(-1), w.reshape(-1))

def SigmoidFunction(z):
    return 1.0 / (1 + np.exp(-z))
new_x = np.array([[0.2, 1]])
z = z_value(new_x, lr.coef_)
p = SigmoidFunction(z)
This gives p ≈ 0.90, which doesn't make sense to me: such low values for the x variables should be classified as "Fail" (0), so p should be very low (close to 0). Is coef_ from sklearn's LogisticRegression the weights of the x variables?
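One thing I noticed while debugging: the fitted model also has an lr.intercept_ attribute, so the full linear term sklearn uses is z = w·x + b, not just w·x. A minimal numpy sketch of the difference (the intercept value here is made up for illustration, not taken from my fitted model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# coefficients printed above
w = np.array([2.0648521, 1.89582556])
x = np.array([0.2, 1.0])

# without any intercept (what my hard-coded computation does)
p_no_intercept = sigmoid(np.dot(w, x))

# with a hypothetical negative intercept b (lr.intercept_ in sklearn);
# b = -20.0 is an arbitrary value chosen only to show the effect
b = -20.0
p_with_intercept = sigmoid(np.dot(w, x) + b)

print(p_no_intercept)    # ~0.90, matching what I got above
print(p_with_intercept)  # very close to 0
```

So maybe the probability I computed is off because I left out the intercept, but I'm not sure whether that fully explains it.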