1 vote

I have written my own logistic regression code in Python and compared its results with scikit-learn's LogisticRegression. The latter performs worse on the simple one-dimensional sample data shown below:

My logistic regression:

import pandas as pd
import numpy as np

# Predicted probabilities: sigmoid of the linear score for each row
def findProb(xBias, beta):
    z = []
    for i in range(len(xBias)):
        z.append(xBias.iloc[i,0]*beta[0] + xBias.iloc[i,1]*beta[1])
    prob = [(1/(1+np.exp(-i))) for i in z]
    return prob

# Gradient of the average log loss with respect to each coefficient
def calDerv(xBias, y, beta, prob):
    derv = []
    for i in range(len(beta)):
        helpVar1 = 0
        for j in range(len(xBias)):
            helpVar2 = prob[j]*xBias.iloc[j,i] - y[j]*xBias.iloc[j,i]
            helpVar1 = helpVar1 + helpVar2
        derv.append(helpVar1/len(xBias))
    return derv

# One gradient-descent step
def updateBeta(beta, alpha, derv):
    for i in range(len(beta)):
        beta[i] = beta[i] - derv[i]*alpha
    return beta

# Total negative log-likelihood of the predictions
def calCost(y, prob):
    cost = 0
    for i in range(len(y)):
        if y[i] == 1: eachCost = -y[i]*np.log(prob[i])
        else: eachCost = -(1-y[i])*np.log(1-prob[i])
        cost = cost + eachCost
    return cost

# Gradient descent on [intercept, slope], starting from zeros
def myLogistic(x, y, alpha, iters):
    beta = [0 for i in range(2)]
    bias = [1 for i in range(len(x))]
    xBias = pd.DataFrame({'bias': bias, 'x': x})
    for i in range(iters):
        prob = findProb(xBias, beta)
        derv = calDerv(xBias, y, beta, prob)
        beta = updateBeta(beta, alpha, derv)
    return beta
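
For reference, the same gradient-descent update can also be written with vectorised NumPy operations (a sketch making the same assumptions as above: a bias column of ones, zero-initialised coefficients, and plain gradient descent on the average log loss; the name myLogisticVectorised is just illustrative):

def myLogisticVectorised(x, y, alpha, iters):
    # Design matrix with a bias column of ones, shape (n, 2)
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))   # sigmoid of the linear score
        grad = X.T @ (prob - y) / len(y)         # gradient of the average log loss
        beta = beta - alpha * grad               # gradient-descent step
    return beta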

Comparing results on a small sample dataset:

input = list(range(1, 11))
labels = [0,0,0,0,0,1,1,1,1,1]

print("\nmy logistic")
learningRate = 0.01
iterations = 10000
beta = myLogistic(input, labels, learningRate, iterations)
print("coefficients: ", beta)
print("decision boundary is at x = ", -beta[0]/beta[1])
decision = -beta[0]/beta[1]
predicted = [0 if i < decision else 1 for i in input]
print("predicted values: ", predicted)

Output: 0, 0, 0, 0, 0, 1, 1, 1, 1, 1

print("\npython logistic")
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
input = np.reshape(input, (-1,1))
lr.fit(input, labels)
print("coefficient = ", lr.coef_)
print("intercept = ", lr.intercept_)
print("decision = ", -lr.intercept_/lr.coef_)
predicted = lr.predict(input)
print(predicted)

Output: 0, 0, 0, 1, 1, 1, 1, 1, 1, 1


2 Answers

3 votes

Your implementation has no regularisation term. The LogisticRegression estimator applies L2 regularisation by default, with inverse regularisation strength C = 1.0. As you set C to higher values, i.e. weaken the regularisation, the decision boundary moves closer to 5.5:

inp = np.reshape(list(range(1, 11)), (-1, 1))   # the question's input, reshaped for scikit-learn
labels = [0,0,0,0,0,1,1,1,1,1]

for C in [1.0, 1000.0, 1e+8]:
    lr = LogisticRegression(C=C)
    lr.fit(inp, labels)
    print(f'C = {C}, decision boundary @ {(-lr.intercept_/lr.coef_[0])[0]}')

Output:

C = 1.0, decision boundary @ 3.6888430562595116
C = 1000.0, decision boundary @ 5.474229032805065
C = 100000000.0, decision boundary @ 5.499634348989383
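
For completeness, the hand-written gradient can be made to match scikit-learn's default objective by adding the L2 penalty term. A sketch, assuming C is the inverse regularisation strength, the bias column sits at index 0, and the intercept is left unpenalised (scikit-learn also does not penalise the intercept); calDervRegularised is just an illustrative name:

def calDervRegularised(xBias, y, beta, prob, C=1.0):
    # Unpenalised gradient of the average log loss, from the question's calDerv
    derv = calDerv(xBias, y, beta, prob)
    n = len(xBias)
    for i in range(1, len(beta)):                 # skip the bias/intercept term
        derv[i] = derv[i] + beta[i] / (C * n)     # gradient of beta[i]**2 / (2*C*n)
    return derv

Using this in place of calDerv with C = 1.0 should move the hand-written decision boundary towards the value LogisticRegression() reports above, and with a very large C it should move back towards roughly 5.5.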
0 votes

A self-written logistic regression function like this one depends on the following:

  1. the learning rate (alpha)
  2. the number of iterations

Hence, with some adjustment of the learning rate and the number of iterations, one can obtain approximately equal weights; a sketch of how to monitor this is given below.
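
One way to check whether alpha and the number of iterations are adequate is to monitor the cost during training, for example by reusing the findProb, calDerv, updateBeta and calCost helpers from the question (a sketch; the name myLogisticWithCost and the print interval of 1000 iterations are just illustrative):

def myLogisticWithCost(x, y, alpha, iters):
    beta = [0 for i in range(2)]
    bias = [1 for i in range(len(x))]
    xBias = pd.DataFrame({'bias': bias, 'x': x})
    for i in range(iters):
        prob = findProb(xBias, beta)
        derv = calDerv(xBias, y, beta, prob)
        beta = updateBeta(beta, alpha, derv)
        if i % 1000 == 0:
            # The cost should keep decreasing; if it plateaus early, more
            # iterations or a different alpha will not change the result much.
            print("iteration", i, "cost =", calCost(y, prob))
    return beta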

For further analysis, you can refer to this link: https://medium.com/@martinpella/logistic-regression-from-scratch-in-python-124c5636b8ac