
I have implemented logistic regression with gradient descent in Java. It doesn't seem to work well: it does not classify records properly, and the predicted probability of y = 1 comes out very high for almost every record. I don't know whether my implementation is correct. I have gone through the code several times and I am unable to find any bug. I have been following Andrew Ng's machine learning tutorials on Coursera. My Java implementation has three classes, namely:

  1. DataSet.java : reads the data set
  2. Instance.java : has two members, double[] x (the features) and double label
  3. Logistic.java : the main class that implements logistic regression with gradient descent

This is my cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[\, y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
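
A quick way to sanity-check the implementation is to compute this cost directly and verify that it decreases on every iteration. A minimal sketch (this method is my own addition, not in the code below; it assumes the same Instance accessors getX()/getLabel() and the classify() method shown further down):

    // Sketch (not in the original code): computes J(theta) over the data set.
    // Assumes classify(x) returns h_theta(x) strictly between 0 and 1.
    private double cost(List<Instance> instances) {
        double sum = 0;
        for (Instance instance : instances) {
            double h = classify(instance.getX()); // h_theta(x)
            double y = instance.getLabel();
            sum += y * Math.log(h) + (1 - y) * Math.log(1 - h);
        }
        return -sum / instances.size();
    }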

For the above cost function, this is my gradient descent update:

Repeat (

$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$$

(simultaneously update all $\theta_j$)

)

import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

public class Logistic {

    /** the learning rate */
    private double alpha;

    /** the weight to learn */
    private double[] theta;

    /** the number of iterations */
    private static final int ITERATIONS = 3000;

    public Logistic(int n) {
        this.alpha = 0.0001;
        theta = new double[n];
    }

    private double sigmoid(double z) {
        return (1 / (1 + Math.exp(-z)));
    }

    public void train(List<Instance> instances) {

        double[] temp = new double[theta.length];

        // gradient descent: repeatedly step against the gradient of J(theta)
        for (int i = 1; i <= ITERATIONS; i++) {
            for (int j = 0; j < theta.length; j++) {
                temp[j] = theta[j] - (alpha * sum(j, instances));
            }

            // simultaneous update of all theta_j
            for (int j = 0; j < theta.length; j++) {
                theta[j] = temp[j];
            }
            System.out.println(Arrays.toString(theta));
        }
    }

    /** (1/m) * sum over all instances of (h_theta(x^(i)) - y^(i)) * x_j^(i) */
    private double sum(int j, List<Instance> instances) {
        double sum = 0;
        for (Instance instance : instances) {
            double[] x = instance.getX();
            double y = instance.getLabel();
            double prediction = classify(x);
            sum += (prediction - y) * x[j];
        }
        return sum / instances.size();
    }

    /** the hypothesis h_theta(x) = sigmoid(theta^T x) */
    private double classify(double[] x) {
        double logit = 0.0;
        for (int i = 0; i < theta.length; i++) {
            logit += theta[i] * x[i];
        }
        return sigmoid(logit);
    }


    public static void main(String... args) throws FileNotFoundException {

        // DataSet is a class with a static method readDataSet which reads the data set.
        // Instance is a class with two members: double[] x (the features) and double label (y).
        List<Instance> instances = DataSet.readDataSet("data.txt");

        // 3 : number of theta parameters corresponding to the features x (x0 is always 1)
        Logistic logistic = new Logistic(3);
        logistic.train(instances);

        // test data
        double[] x = {1, 45, 85};

        System.out.println("Prob: " + logistic.classify(x));
    }
}

Can anyone tell me what I am doing wrong? Thanks in advance! :)

I think you need to begin by deciding whether you have a Java problem or a machine learning problem. Does your Java program implement the intended function, regardless of whether it is the right function? You should be able to tell that from unit tests. – Patricia Shanahan

You implemented gradient ascent, not descent. Also, you need to divide the sum by the number of instances you processed; that's the reason why your weights are exploding. – Thomas Jungblut

@Thomas Sorry for that. I was trying out different things and I forgot to change it back to minus. I did the edit. Even if it's minus, it's not working as expected. – Srinidhi Shankar

But you did read my previous comment until the end, right? :P – Thomas Jungblut

Maybe you want to have a look at some more idiomatic Python code: stackoverflow.com/questions/17784587/… – Thomas Jungblut

1 Answer


As I am studying logistic regression, I took the time to review your code in detail.

TLDR

In fact, it appears the algorithm is correct.

The reason you had so many false negatives or false positives is, I think, the hyperparameters you chose.

The model was under-trained, so the hypothesis was under-fitting.

Details

I had to create the DataSet and Instance classes because you did not publish them, and I set up a training data set and a test data set based on the Cryotherapy dataset. See http://archive.ics.uci.edu/ml/datasets/Cryotherapy+Dataset+.
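
My reconstruction of the missing Instance class looked roughly like this (an assumption on my part; only the two members described in the question are known):

    // My own reconstruction of the unpublished Instance class (an assumption).
    public class Instance {
        private final double[] x;   // features, with x[0] = 1 for the bias term
        private final double label; // y, either 0 or 1

        public Instance(double[] x, double label) {
            this.x = x;
            this.label = label;
        }

        public double[] getX() { return x; }

        public double getLabel() { return label; }
    }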

Then, using your exact same code (for the logistic regression part), with a learning rate alpha of 0.001 and 100000 iterations, I got a precision of 80.64516129032258 percent on the test data set, which is not so bad.

I tried to get a better precision rate by manually tweaking those hyperparameters but could not obtain any better result.
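
For reference, the precision above was computed along these lines (a sketch, not the answer's actual code; the 0.5 decision threshold and the testInstances variable are my own assumptions):

    // Sketch: fraction of test instances whose thresholded prediction
    // matches the label, using 0.5 as the cutoff (an assumption).
    double correct = 0;
    for (Instance instance : testInstances) {
        double predicted = logistic.classify(instance.getX()) >= 0.5 ? 1 : 0;
        if (predicted == instance.getLabel()) {
            correct++;
        }
    }
    System.out.println("Precision: " + (100.0 * correct / testInstances.size()) + " percent");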

At this point, an enhancement would be to implement regularization, I suppose.
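
For the record, the regularized update from the same course keeps $\theta_0$ unpenalized and shrinks the remaining weights, with $\lambda$ the regularization strength:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} + \frac{\lambda}{m}\,\theta_j \right] \quad (j \ge 1)$$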

Gradient descent formula

In Andrew Ng's video about the cost function and gradient descent, it is correct that the 1/m term is omitted. A possible explanation is that the 1/m term is absorbed into the alpha term. Or maybe it's just an oversight. See https://www.youtube.com/watch?v=TTdcc21Ko9A&index=36&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&t=6m53s at 6m53s.
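
Indeed, since alpha is a free hyperparameter, dropping the 1/m factor only rescales the learning rate, so the two updates are equivalent:

$$\theta_j - \alpha \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} \;=\; \theta_j - \alpha' \,\frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} \quad \text{with } \alpha' = m\,\alpha$$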

But if you watch Andrew Ng's video about regularization and logistic regression you'll notice that the term 1/m is clearly present in the formula. See https://www.youtube.com/watch?v=IXPgm1e0IOo&index=42&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&t=2m19s at 2m19s.