2
votes

I've been trying to finish Andrew Ng's Machine Learning course, and I'm now at the part about logistic regression. I'm trying to find the parameters and compute the cost without using the MATLAB function fminunc. However, I am not converging to the results posted by other students who finished the assignment using fminunc. Specifically, my problems are:

  • the parameters theta are incorrect
  • my cost seems to be blowing up
  • I get many NaNs in my cost vector (I just create a vector of the costs to keep track)

I attempted to find the parameters via gradient descent as I understood it from the lectures, but my implementation still seems to give incorrect results.

% Load the data: feature columns followed by a 0/1 label column
dataset = load('dataStuds.txt');
x = dataset(:,1:end-1);
y = dataset(:,end);
m = length(x);   % number of training examples

% Pad with a column of 1's (the intercept term, as they call it?)
x = [ones(length(x),1), x];
thetas = zeros(size(x,2),1);

% Setting the learning rate to 0.1
alpha = 0.1;


for i = 1:100000

    % theta transpose x (though in MATLAB it has to be written the other
    % way round? :)
    ttrx = x * thetas;
    % the hypothesis function h_x = g(z) = 1 ./ (1 + exp(-z))
    h_x = 1 ./ (1 + exp(-ttrx));

    % prediction error for every training example (named "err" so it does
    % not shadow MATLAB's built-in error function)
    err = h_x - y;

    % the gradient (aka the derivative of J(\theta) aka the derivative
    % term)
    for j = 1:length(thetas)
        gradient = 1/m * err' * x(:,j);
        % Updating the parameters theta
        thetas(j) = thetas(j) - alpha * gradient;
    end

    % Calculating the cost, just to keep track...
    cost(i) = 1/m * ( -y' * log(h_x) - (1-y)' * log(1-h_x) );
end

% Displaying the final thetas that I obtained
thetas

The parameters theta that I get are:

thetas =

-482.8509
3.7457
2.6976

The results below are from one example solution that I downloaded, but its author used fminunc.

Cost at theta found by fminunc: 0.203506
theta: 
-24.932760 
0.204406 
0.199616 

The data:

34.6236596245170    78.0246928153624    0
30.2867107682261    43.8949975240010    0
35.8474087699387    72.9021980270836    0
60.1825993862098    86.3085520954683    1
79.0327360507101    75.3443764369103    1
45.0832774766834    56.3163717815305    0
61.1066645368477    96.5114258848962    1
75.0247455673889    46.5540135411654    1
76.0987867022626    87.4205697192680    1
84.4328199612004    43.5333933107211    1
95.8615550709357    38.2252780579509    0
75.0136583895825    30.6032632342801    0
82.3070533739948    76.4819633023560    1
69.3645887597094    97.7186919618861    1
39.5383391436722    76.0368108511588    0
53.9710521485623    89.2073501375021    1
69.0701440628303    52.7404697301677    1
67.9468554771162    46.6785741067313    0
70.6615095549944    92.9271378936483    1
76.9787837274750    47.5759636497553    1
67.3720275457088    42.8384383202918    0
89.6767757507208    65.7993659274524    1
50.5347882898830    48.8558115276421    0
34.2120609778679    44.2095285986629    0
77.9240914545704    68.9723599933059    1
62.2710136700463    69.9544579544759    1
80.1901807509566    44.8216289321835    1
93.1143887974420    38.8006703371321    0
61.8302060231260    50.2561078924462    0
38.7858037967942    64.9956809553958    0
61.3792894474250    72.8078873131710    1
85.4045193941165    57.0519839762712    1
52.1079797319398    63.1276237688172    0
52.0454047683183    69.4328601204522    1
40.2368937354511    71.1677480218488    0
54.6351055542482    52.2138858806112    0
33.9155001090689    98.8694357422061    0
64.1769888749449    80.9080605867082    1
74.7892529594154    41.5734152282443    0
34.1836400264419    75.2377203360134    0
83.9023936624916    56.3080462160533    1
51.5477202690618    46.8562902634998    0
94.4433677691785    65.5689216055905    1
82.3687537571392    40.6182551597062    0
51.0477517712887    45.8227014577600    0
62.2226757612019    52.0609919483668    0
77.1930349260136    70.4582000018096    1
97.7715992800023    86.7278223300282    1
62.0730637966765    96.7688241241398    1
91.5649744980744    88.6962925454660    1
79.9448179406693    74.1631193504376    1
99.2725269292572    60.9990309984499    1
90.5467141139985    43.3906018065003    1
34.5245138532001    60.3963424583717    0
50.2864961189907    49.8045388132306    0
49.5866772163203    59.8089509945327    0
97.6456339600777    68.8615727242060    1
32.5772001680931    95.5985476138788    0
74.2486913672160    69.8245712265719    1
71.7964620586338    78.4535622451505    1
75.3956114656803    85.7599366733162    1
35.2861128152619    47.0205139472342    0
56.2538174971162    39.2614725105802    0
30.0588224466980    49.5929738672369    0
44.6682617248089    66.4500861455891    0
66.5608944724295    41.0920980793697    0
40.4575509837516    97.5351854890994    1
49.0725632190884    51.8832118207397    0
80.2795740146700    92.1160608134408    1
66.7467185694404    60.9913940274099    1
32.7228330406032    43.3071730643006    0
64.0393204150601    78.0316880201823    1
72.3464942257992    96.2275929676140    1
60.4578857391896    73.0949980975804    1
58.8409562172680    75.8584483127904    1
99.8278577969213    72.3692519338389    1
47.2642691084817    88.4758649955978    1
50.4581598028599    75.8098595298246    1
60.4555562927153    42.5084094357222    0
82.2266615778557    42.7198785371646    0
88.9138964166533    69.8037888983547    1
94.8345067243020    45.6943068025075    1
67.3192574691753    66.5893531774792    1
57.2387063156986    59.5142819801296    1
80.3667560017127    90.9601478974695    1
68.4685217859111    85.5943071045201    1
42.0754545384731    78.8447860014804    0
75.4777020053391    90.4245389975396    1
78.6354243489802    96.6474271688564    1
52.3480039879411    60.7695052560259    0
94.0943311251679    77.1591050907389    1
90.4485509709636    87.5087917648470    1
55.4821611406959    35.5707034722887    0
74.4926924184304    84.8451368493014    1
89.8458067072098    45.3582836109166    1
83.4891627449824    48.3802857972818    1
42.2617008099817    87.1038509402546    1
99.3150088051039    68.7754094720662    1
55.3400175600370    64.9319380069486    1
74.7758930009277    89.5298128951328    1

1 Answer

5
votes

I ran your code and it does work fine. However, the tricky thing about gradient descent is ensuring that your costs don't diverge to infinity. If you look at your cost vector, you will see that the costs definitely diverge, which is why you are not getting the correct results.
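You can see this quickly by inspecting the cost history from your loop, for example (assuming the cost vector from your script is still in the workspace):

>> plot(cost)          % inspect how the cost evolves over the iterations
>> sum(isnan(cost))    % count how many cost entries are NaN, which happens once h_x saturates to exactly 0 or 1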

The best way to eliminate this in your case is to reduce the learning rate. Through experimentation, I found that a learning rate of alpha = 0.003 works best for your problem. I also increased the number of iterations to 200000.
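Concretely, those are the only two settings that change. Here is a sketch of your descent loop with them in place, reusing x, y, m and the zero-initialized thetas from your script (I've also collapsed the inner loop over j into one vectorized line, which computes the same update):

alpha = 0.003;                 % smaller learning rate so the cost no longer diverges
num_iters = 200000;            % more iterations to compensate for the smaller steps
cost = zeros(num_iters, 1);    % preallocate the cost history

for i = 1:num_iters
    h_x = 1 ./ (1 + exp(-(x * thetas)));    % sigmoid of x * theta for every example
    gradient = (1/m) * x' * (h_x - y);      % gradient of J(theta) for all parameters at once
    thetas = thetas - alpha * gradient;     % simultaneous parameter update
    cost(i) = (1/m) * ( -y' * log(h_x) - (1-y)' * log(1-h_x) );
end

Changing these two things gives me the following parameters and associated cost: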

>> format long g;
>> thetas

thetas =

         -17.6287417780435
         0.146062780453677
         0.140513170941357

>> cost(end)

ans =

         0.214821863463963

This is more or less in line with the magnitudes of the parameters you see when using fminunc. However, fminunc arrives at slightly different parameters and a slightly different cost because of the minimization method itself: it uses a quasi-Newton method (a BFGS-type algorithm) that finds the solution in far fewer iterations.
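For reference, the fminunc route in the assignment looks roughly like this (a sketch, assuming a helper costFunction(theta, x, y) that you would write to return both the cost J and its gradient):

% costFunction is a hypothetical helper returning [J, grad] for logistic regression
options = optimset('GradObj', 'on', 'MaxIter', 400);
initial_theta = zeros(size(x, 2), 1);
[theta_fmin, J_fmin] = fminunc(@(t) costFunction(t, x, y), initial_theta, options);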

What matters most is the actual classification accuracy. Remember that to decide whether an example belongs to label 0 or 1, you take the weighted sum of the parameters and the example's features, run it through the sigmoid function, and threshold at 0.5. The accuracy is then the fraction of examples whose predicted label matches the true label.

Using the parameters we found with gradient descent gives us the following accuracy:

>> ttrx = x * thetas;
>> h_x = 1 ./ (1 + exp(-ttrx)) >= 0.5;
>> mean(h_x == y)

ans =

                      0.89

This means that we've achieved 89% classification accuracy. Using the parameters found by fminunc also gives:

>> thetas2 = [-24.932760; 0.204406; 0.199616];
>> ttrx = x * thetas2;
>> h_x = 1 ./ (1 + exp(-ttrx)) >= 0.5;
>> mean(h_x == y)

ans =

                      0.89

So we can see that the accuracy is the same. I wouldn't worry too much about the magnitude of the parameters; what matters is that the costs and the accuracies of the two implementations are comparable.

As a final note, I would suggest looking at this post of mine for some tips on making logistic regression work in the long run. I would definitely recommend normalizing your features before finding the parameters, to make the algorithm converge faster. The post also explains why you were getting the wrong parameters (namely, the cost blowing up): Cost function in logistic regression gives NaN as a result.
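For example, a z-score normalization of the feature columns could look like this (a minimal sketch; it skips the intercept column of ones, and you would apply the same mu and sigma to any new example before classifying it):

% Normalize each feature column to zero mean and unit standard deviation
mu = mean(x(:, 2:end));
sigma = std(x(:, 2:end));
x(:, 2:end) = bsxfun(@rdivide, bsxfun(@minus, x(:, 2:end), mu), sigma);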