I am trying to implement logistic regression from scratch. Initially the weight is a single random value, but as training proceeds, the final result contains multiple weights (one per data point in the training set). I am confused: the predictions work properly, yet it makes no sense to have multiple weights for a single feature. I have marked the problem spots in the code below.

import numpy as np

np.random.seed(100)

class LogisticRegression:

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def cost_function(self, X, y, weights):
        z = X * weights
        predict_1 = y * np.log(self.sigmoid(z))
        predict_0 = (1 - y) * np.log(1 - self.sigmoid(z))
        return -sum(predict_1 + predict_0) / len(X)

    def fit(self, X, y, epochs=250, lr=0.05):
        loss = []
        weights = np.random.rand()    # Initially, weights here is a single number...
        N = len(X)

        for _ in range(epochs):
            # Gradient descent
            y_hat = self.sigmoid(X * weights)
            weights -= lr * X * (y_hat - y) / N    # ...but on this line the number
                                                   # of weights becomes equal to the
                                                   # number of data points...
            # Saving progress
            loss.append(self.cost_function(X, y, weights))

        self.weights = weights
        self.loss = loss
        print('weights:', weights)    # ...which gives a different weight for each
                                      # data point. How can I plot the final
                                      # logistic curve if I end up with multiple
                                      # final weights?

    def predict(self, X):
        # Predict probabilities with the sigmoid function
        z = X * self.weights
        # Return binary results
        return [1 if i > 0.5 else 0 for i in self.sigmoid(z)]

clf = LogisticRegression()
clf.fit(X, y)
clf.predict(X)

1 Answer


The weight update should include a sum over the data points. Take a look at this page for more details on the derivation of the gradient.
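To spell out the reasoning: for a single feature, the gradient of the cross-entropy cost with respect to the scalar weight w is itself a scalar (standard derivation, using the notation from the code in the question):

\frac{\partial J}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} x_i \left( \sigma(w \, x_i) - y_i \right)

Dropping the sum leaves a length-N vector, and NumPy broadcasting then silently turns weights into an array with one entry per data point, which is exactly the symptom described in the question.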

So the weights should be updated with something like:

weights -= lr * sum(X*(y_hat - y)) / N

instead of:

weights -= lr * X*(y_hat - y) / N

With this change, weights stays a single scalar, as expected.
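Here is a minimal end-to-end sketch of the corrected update, using made-up 1-D data (the uniform/normal data generation below is purely for illustration, since the question does not show X and y). With a single scalar weight, plotting the fitted curve is also straightforward, as sketched in the trailing comments:

import numpy as np

np.random.seed(100)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical 1-D data: positive x tends to get label 1
X = np.random.uniform(-3, 3, 100)
y = (X + np.random.normal(0, 0.5, 100) > 0).astype(int)

weights = np.random.rand()    # scalar start, as in the question
N = len(X)
for _ in range(250):
    y_hat = sigmoid(X * weights)
    weights -= 0.05 * np.sum(X * (y_hat - y)) / N    # summed gradient is a scalar

print('final weight:', weights)    # a single number, not an array

# To plot the fitted logistic curve (requires matplotlib.pyplot as plt):
# xs = np.linspace(X.min(), X.max(), 200)
# plt.plot(xs, sigmoid(xs * weights))

Note that np.sum is used rather than the built-in sum; on a 1-D NumPy array they give the same result, but np.sum is the idiomatic choice and behaves predictably for other array shapes.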