
sklearn version = 0.23.2

MRE:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

studied = np.random.uniform(0, 10, 100)
slept = np.random.uniform(0, 10, 100)

df = pd.DataFrame({"studied": studied, "sleep": slept})

def PassFail(study, sleep):
    # label 1 (pass) when the combined hours reach 10, otherwise 0 (fail)
    if study + sleep >= 10:
        return 1
    else:
        return 0

df["pass"] = df.apply(lambda x: PassFail(x["studied"], x["sleep"]), axis=1)

You pass if the sum of studied and sleep is greater than or equal to 10, so the two classes are separated by a diagonal line.

After training on the above data and printing coef_:

y = df["pass"].values
X = df.drop(columns=["pass"]).values

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.1)
lr = LogisticRegression().fit(X_train,y_train)

lr.coef_
>>> array([[2.0648521 , 1.89582556]])

From my understanding, coef_ holds the weights of each x variable (studied, sleep). However, when I try to make a prediction by hard-coding it:

def z_value(x, w):
    """x, w are arrays of shape (1, 2), i.e. [[x1, x2]]."""
    return np.dot(x.reshape(-1), w.reshape(-1))

def SigmoidFunction(z):
    return 1.0 / (1 + np.exp(-z))

new_x = np.array([[0.2, 1]])
z = z_value(new_x, lr.coef_)
p = SigmoidFunction(z)

p = 0.90..., which doesn't make sense to me, because such low values for the x variables should be classified as "Fail" (0), so p should be very low (close to 0). Is coef_ from sklearn's LogisticRegression the weights of the x variables?
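Written out with the printed coefficients, the hand computation above comes to (illustrative numbers only, since the data is random):

z = 2.0648521 * 0.2 + 1.89582556 * 1    # ≈ 2.31
p = 1.0 / (1 + np.exp(-z))              # sigmoid(2.31) ≈ 0.91, i.e. the value quoted above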


1 Answer


You missed the intercept when computing z_value. LogisticRegression fits an intercept by default (fit_intercept=True), so the linear term is z = w·x + b; add lr.intercept_ to your dot product.
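A minimal sketch of the corrected hand computation, reusing lr, new_x, and SigmoidFunction from the question (the exact numbers depend on the random data):

def z_value(x, w, b):
    """x, w are arrays of shape (1, 2); b is the fitted intercept."""
    return np.dot(x.reshape(-1), w.reshape(-1)) + b

z = z_value(new_x, lr.coef_, lr.intercept_[0])
p = SigmoidFunction(z)

# p now agrees with sklearn's own probability for the positive class
print(p, lr.predict_proba(new_x)[0, 1])

For this data the fitted intercept should come out strongly negative (the true boundary is studied + sleep = 10, i.e. roughly b ≈ -10·w for a shared slope w), so for small inputs z turns negative and p falls toward 0.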