
I have an assignment that says to implement logistic regression in C++ using gradient descent. Part of the assignment is to make the gradient descent stop when the magnitude of the gradient falls below 1e-7.

I have to minimize:

L(w) = \frac{1}{N} \sum_{i=1}^{N} \log\left(1 + \exp(-y_i w^T x_i)\right)
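For reference, differentiating each term gives the gradient that my Gradient function below is meant to compute:

\nabla L(w) = -\frac{1}{N} \sum_{i=1}^{N} \frac{y_i x_i}{1 + \exp(y_i w^T x_i)}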

However, my gradient descent keeps stopping because the maximum number of iterations is surpassed. I have tried various max-iteration thresholds, and it maxes out every time. I think there is something wrong with my code: logistic regression is supposedly an easy task for gradient descent due to the concave nature of its cost function, so gradient descent should easily find the minimum.

I am using the armadillo library for matrices and vectors.

#include "armadillo.hpp"

using namespace arma;

double Log_Likelihood(Mat<double>& x, Mat<int>& y, Mat<double>& w)
    {
        double L_sum = 0;
        for (uword i = 0; i < x.n_rows; i++)
        {
            // log(1 + exp(-y_i * w * x_i^T)) evaluates to a 1x1 matrix
            Mat<double> L = log(1 + exp(-y[i] * w * x.row(i).t()));
            L_sum += as_scalar(L);
        }

        return L_sum / x.n_rows;
    }

Mat<double> Gradient(Mat<double>& x, Mat<int>& y, Mat<double>& w)
    {
        Mat<double> grad(1, x.n_cols);
        for (int i = 0; i < x.n_rows; i++)
        {
            grad = grad + (y[i] * (1 / (1 + exp(y[i] * w * x.row(i).t()))) * x.row(i));
        }

        return -grad / x.n_rows;

    }

void fit(Mat<double>& x, Mat<int>& y, double alpha = 0.05, double threshold = pow(10, -7), int maxiter = 10000)
    {
        w.set_size(1, x.n_cols);
        w = x.row(0);
        int iter = 0;
        double log_like = 0;
        while (true)
        {
            log_like = Log_Likelihood(x, y, w);

            if (iter % 1000 == 0)
            {
                std::cout << "Iter: " << iter << " -Log likelihood = " << log_like << " ||dL/dw|| = " << norm( Gradient(x, y, w), 2) << std::endl;
            }
            iter++;

            if ( norm( Gradient(x, y, w), 2) < threshold)
            {
                std::cout << "Magnitude of gradient below threshold." << std::endl;
                break;
            }

            if (iter == maxiter)
            {
                std::cout << "Max iterations surpassed." << std::endl;
                break;
            }


            w = w - (alpha * Gradient(x, y, w));

        }
    }
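For completeness, this is roughly how I call it; the tiny dataset here is made up purely for illustration:

int main()
    {
        // Four made-up samples with two features each; labels are {1, -1}
        Mat<double> x = { {  1.0,  2.0 },
                          {  2.0,  1.5 },
                          { -1.0, -2.0 },
                          { -2.0, -1.0 } };
        Mat<int> y = { 1, 1, -1, -1 };

        fit(x, y);
        return 0;
    }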

I want the gradient descent to stop because the magnitude of the gradient falls below 1e-7.

My labels are {1, -1}.

pow(10,-7) ought to be written as 1.0e-7. - Pete Becker
The log-likelihood (which you want to maximize) is concave, the cost function which is negative log-likelihood is convex. - Juan Carlos Ramirez
What's the difference between writing pow(10, -7) and 1.0e-7? - Kasper Fischer-Rasmussen
After optimisations, probably nothing. pow(10, -7) is a function call involving two literals; 1.0e-7 is a single literal. - Caleth

1 Answer


Verify that your log-likelihood is increasing towards convergence by recording or plotting its value at every iteration, and also check that the norm of the gradient is going towards 0. You should be doing gradient ascent, so add the gradient instead of subtracting it. If the norm of the gradient consistently increases, you are not moving in a direction towards the optimum. If, on the other hand, the norm of the gradient "jumps around" but doesn't go to 0, you should reduce your step size / learning rate alpha and try again.

Plotting and analyzing these values will help you debug and analyze your algorithm.
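As a minimal sketch of the recording idea, assuming the Log_Likelihood and Gradient functions from your question (fit_with_trace and the file name trace.csv are just examples):

#include <armadillo>
#include <fstream>

using namespace arma;

// Run plain gradient descent and write one CSV row per iteration, so the
// loss and gradient-norm curves can be plotted afterwards (e.g. with gnuplot).
void fit_with_trace(Mat<double>& x, Mat<int>& y, double alpha = 0.05, int maxiter = 10000)
    {
        std::ofstream trace("trace.csv");
        trace << "iter,loss,grad_norm\n";

        Mat<double> w(1, x.n_cols, fill::zeros);
        for (int iter = 0; iter < maxiter; iter++)
        {
            Mat<double> grad = Gradient(x, y, w);
            trace << iter << "," << Log_Likelihood(x, y, w) << "," << norm(grad, 2) << "\n";
            // descent on the loss, as in the question; for ascent on the
            // log-likelihood you would add its gradient instead
            w = w - (alpha * grad);
        }
    }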