8
votes

I am working on a website where I collect the results of chess games that people have played. Looking at the ratings of the player and the difference between their rating and that of their opponent, I plot a graph with dots representing win (green), draw (blue), and loss (red).

With this information, I also implemented a logistic regression algorithm to categorize the cutoffs for winning and win/drawing. Using the rating and the difference as my two features, I get a classifier, and then draw the boundaries on the chart for where the classifier changes its prediction.

My code for gradient descent, the cost function, and the sigmoid function are below.

  def gradient_descent()
    oldJ = 0    
    newJ = J()
    alpha = 1.0     # Learning rate
    run = 0
    while (run < 100) do
      tmpTheta = Array.new
      for j in 0...numFeatures do
        sum = 0
        for i in 0...m do
          sum += ((h(training_data[:x][i]) - training_data[:y][i][0]) * training_data[:x][i][j])
        end
        tmpTheta[j] = Array.new
        tmpTheta[j][0] = theta[j, 0] - (alpha / m) * sum  # Alpha * partial derivative of J with respect to theta_j
      end
      self.theta = Matrix.rows(tmpTheta)
      oldJ = newJ
      newJ = J()
      run += 1
      if (run == 100 && (oldJ - newJ > 0.001)) then run -= 20 end   # Do 20 more if the error is still going down a fair amount.
      if (oldJ < newJ)
        alpha /= 10
      end
    end
  end

  def J()
    sum = 0
    for i in 0...m
      sum += ((training_data[:y][i][0] * Math.log(h(training_data[:x][i]))) 
          + ((1 - training_data[:y][i][0]) * Math.log(1 - h(training_data[:x][i]))))
    end
    return (-1.0 / m) * sum
  end

  def h(x)
    if (x.class != 'Matrix')    # In case it comes in as a row vector or an array
      x = Matrix.rows([x])      # [x] because if it's a row vector we want [[a, b]] to get an array whose first row is x.
    end
    x = x.transpose   # x is supposed to be a column vector, and theta^ a row vector, so theta^*x is a number.
    return g((theta.transpose * x)[0, 0])  # theta^ * x gives [[z]], so get [0, 0] of that for the number z.
  end

  def g(z)
    tmp = 1.0 / (1.0 + Math.exp(-z))   # Sigmoid function
    if (tmp == 1.0) then tmp = 0.99999 end    # These two things are here because ln(0) DNE, so we don't want to do ln(1 - 1.0) or ln(0.0)
    if (tmp == 0.0) then tmp = 0.00001 end
    return tmp
  end

When I test this on the data set representing my own chess profile, I get reasonable results that I can be happy with:

12842311:  Correct result

For a while, I was happy. All the examples I tried gave interesting charts. Then I tried a player, Kevin Cao, who had over 250 tournaments to his name, and therefore 1000+ games, for a very large training set. The result was obviously incorrect:

12905349:  Incorrect result

Well, that was no good. So I increased the initial learning rate from 1.0 to 100.0 as my first idea. That got what looks like the right results for Kevin:

12905349:  Correct result

Unfortunately, when I then tried it on myself and my smaller data set, I got the strange phenomenon that it just gave a flat line at 0 for one of the predictions:

12842311:  Incorrect result

I checked theta, and it said it was [[2.3707682771730836], [21.22408286825226], [-19081.906528679192]]. The third training variable (really second, since x_0 = 1) is the difference in ratings, so when the difference is just the tiniest bit positive, the formula for logistic regression goes way negative, and the sigmoid function predicts y = 0. When the difference is just the slightly bit positive, similarly, it jumps way up and predicts y = 1.

I reduced the initial learning rate back to 1.0 from 100.0, and decided to instead try reducing it more slowly. So instead of reducing it by a factor of ten when the cost function increases, I reduced it by a factor of two.

Unfortunately, this didn't change the result for me at all. Even if I increased the number of loops of gradient descent from 100 to 1000, it still kept predicting that wrong outcome.

I'm still quite the beginner to logistic regression (I just finished the machine learning class on coursera and this is my first time attempting to implement any of the algorithms I learned there), so I've reached about the extent of my intuition. If somebody would help me figure out what is going wrong here, what I am doing wrong, and how I can fix it I would be extremely grateful.

EDIT: I also tried it on another data set, which had about 300 data points, and got, once again, a flat green line and a normal blue line. The algorithm is basically the same for both, just some different results for y because I'm doing multi-class classification.

EDIT: Since people have asked for it, here is J, Alpha, and Theta for each iteration of gradient descent for the line that flatlines:

J: 1.7679949412730092  Alpha: 1.0  Theta: Matrix[[-0.004477611940298508], [0.2835820895522388], [-123.63880597014925]]
J: 0.6873432218114784  Alpha: 0.1  Theta: Matrix[[-0.008057848266678727], [-8.033992854843122], [-118.62571350649955]]
J: 2.7493579020963597  Alpha: 0.1  Theta: Matrix[[0.0035837099422764904], [10.036108977992713], [-114.29679460799208]]
J: 2.5431564907845736  Alpha: 0.01  Theta: Matrix[[0.002061352330336195], [7.255061503962862], [-113.88091708799209]]
J: 2.268221136398013  Alpha: 0.01  Theta: Matrix[[0.0008076454646645536], [4.923257856798684], [-113.43169704202194]]
J: 2.02765281325063  Alpha: 0.01  Theta: Matrix[[-0.00014755931145485107], [3.0843409102315205], [-112.95644762679805]]
J: 1.821451342237053  Alpha: 0.01  Theta: Matrix[[-0.0008639634905593289], [1.6548476959031622], [-112.46627318829059]]
J: 1.8214513720879484  Alpha: 0.01  Theta: Matrix[[-0.0013117163263802246], [0.6758826956046561], [-111.9660989569473]]
J: 1.8214513720879484  Alpha: 0.001  Theta: Matrix[[-0.0013535066248876874], [0.5834935043210742], [-111.91600392423089]]
J: 1.7870844304014568  Alpha: 0.001  Theta: Matrix[[-0.0013952969233951501], [0.49110431303749225], [-111.86590889151448]]
J: 1.7870844304014568  Alpha: 0.001  Theta: Matrix[[-0.0014341021771264934], [0.40365238581361185], [-111.81578997843985]]
J: 1.7870844304014568  Alpha: 0.001  Theta: Matrix[[-0.0014729074308578367], [0.31620045858973145], [-111.76567106536523]]
J: 1.752717488714965  Alpha: 0.001  Theta: Matrix[[-0.0015115010626209136], [0.22904945780472585], [-111.71555130580272]]
J: 1.752717488714965  Alpha: 0.001  Theta: Matrix[[-0.001544336226800018], [0.15110191314800955], [-111.66540851236988]]
J: 1.770809597429665  Alpha: 0.001  Theta: Matrix[[-0.0015771713909791226], [0.07315436849129325], [-111.61526571893704]]
J: 1.7297985323807161  Alpha: 0.0001  Theta: Matrix[[-0.00158045491336022], [0.06535960382896211], [-111.61025143962061]]
J: 1.718350722631126  Alpha: 0.0001  Theta: Matrix[[-0.0015837319880072584], [0.05757622586497872], [-111.60523715385645]]
J: 1.7183505768797593  Alpha: 0.0001  Theta: Matrix[[-0.0015867170175074515], [0.05030859963032436], [-111.60022257604714]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0015897020324328638], [0.04304099913473299], [-111.59520799822326]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0015926870473582369], [0.03577339863921061], [-111.59019342039937]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.00159567206228361], [0.028505798143688237], [-111.58517884257549]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.001598657077208983], [0.02123819764816586], [-111.5801642647516]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.001601642092134356], [0.013970597152643486], [-111.57514968692772]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.001604627107059729], [0.006702996657121109], [-111.57013510910383]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016076121219851022], [-0.0005646038384012671], [-111.56512053127994]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016105971369104752], [-0.007832204333923645], [-111.56010595345606]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016135821518358483], [-0.01509980482944602], [-111.55509137563217]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016165671667612213], [-0.022367405324968396], [-111.55007679780829]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016195521816865944], [-0.02963500582049077], [-111.5450622199844]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016225371966119674], [-0.03690260631601315], [-111.54004764216052]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016255222115373405], [-0.04417020681153553], [-111.53503306433663]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016285072264627136], [-0.05143780730705791], [-111.53001848651274]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016314922443731613], [-0.05870541239661013], [-111.52500390868587]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016344772622834192], [-0.06597301748587016], [-111.519989330859]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016374622664495802], [-0.07324060142296517], [-111.51497475304588]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.001640217664533409], [-0.08015482159935092], [-111.50996040483884]]
J: 1.7183505768793688  Alpha: 0.0001  Theta: Matrix[[-0.0016455906875599943], [-0.0937712290880118], [-111.49993184619791]]
J: 1.994702022407994  Alpha: 0.0001  Theta: Matrix[[-0.0016482771980077554], [-0.10057943119248941], [-111.49491756687851]]
J: 1.9789198631246232  Alpha: 1.0e-05  Theta: Matrix[[-0.0016485458502465615], [-0.10126025363935508], [-111.49441613894419]]
J: 1.948354991984789  Alpha: 1.0e-05  Theta: Matrix[[-0.0016490831547241735], [-0.10262189853308641], [-111.49341328307554]]
J: 1.9331013621188657  Alpha: 1.0e-05  Theta: Matrix[[-0.0016493518069629796], [-0.10330272097995208], [-111.49291185514122]]
J: 1.9178620371528292  Alpha: 1.0e-05  Theta: Matrix[[-0.0016496204592017856], [-0.10398354342681772], [-111.49241042720689]]
J: 1.902623825636303  Alpha: 1.0e-05  Theta: Matrix[[-0.0016498891114405914], [-0.10466436587368326], [-111.49190899927257]]
J: 1.8873858680247269  Alpha: 1.0e-05  Theta: Matrix[[-0.0016501577636793972], [-0.10534518832054848], [-111.49140757133824]]
J: 1.8721478527437034  Alpha: 1.0e-05  Theta: Matrix[[-0.0016504264159182024], [-0.10602601076741257], [-111.49090614340392]]
J: 1.8569098083540256  Alpha: 1.0e-05  Theta: Matrix[[-0.0016506950681570054], [-0.10670683321427255], [-111.4904047154696]]
J: 1.8416717846532462  Alpha: 1.0e-05  Theta: Matrix[[-0.0016509637203958004], [-0.10738765566111781], [-111.48990328753527]]
J: 1.8264337702403803  Alpha: 1.0e-05  Theta: Matrix[[-0.0016512323726345674], [-0.10806847810791036], [-111.48940185960095]]
J: 1.8111957469624462  Alpha: 1.0e-05  Theta: Matrix[[-0.0016515010251717409], [-0.1087493010703349], [-111.48890043166602]]
J: 1.7959577228777213  Alpha: 1.0e-05  Theta: Matrix[[-0.001651769677708553], [-0.10943012403208266], [-111.4883990037311]]
J: 1.7807196990939538  Alpha: 1.0e-05  Theta: Matrix[[-0.0016520383302440706], [-0.11011094699140556], [-111.48789757579618]]
J: 1.7654816767669712  Alpha: 1.0e-05  Theta: Matrix[[-0.0016523069827749494], [-0.11079176994204029], [-111.48739614786128]]
J: 1.7197677244765115  Alpha: 1.0e-05  Theta: Matrix[[-0.0016531129399852717], [-0.11283423807786983], [-111.4858918640573]]
J: 1.7045300185036796  Alpha: 1.0e-05  Theta: Matrix[[-0.0016533815914621833], [-0.11351505905442376], [-111.48539043612449]]
J: 1.689293134633683  Alpha: 1.0e-05  Theta: Matrix[[-0.0016536502402002386], [-0.11419587490110002], [-111.48488900819716]]
J: 1.674059195452273  Alpha: 1.0e-05  Theta: Matrix[[-0.001653918879126327], [-0.1148766723699622], [-111.48438758028945]]
J: 1.6588357959146847  Alpha: 1.0e-05  Theta: Matrix[[-0.0016541874829120791], [-0.11555740402097447], [-111.48388615245203]]
J: 1.6436500186219352  Alpha: 1.0e-05  Theta: Matrix[[-0.0016544559609891405], [-0.1162379002196091], [-111.48338472486603]]
J: 1.6285972611659707  Alpha: 1.0e-05  Theta: Matrix[[-0.001654723991174496], [-0.11691755751707966], [-111.4828832981758]]
J: 1.6139994752963014  Alpha: 1.0e-05  Theta: Matrix[[-0.0016549904481917704], [-0.11759426827073645], [-111.48238187463193]]
J: 1.600799606845299  Alpha: 1.0e-05  Theta: Matrix[[-0.0016552516449943116], [-0.11826112664220582], [-111.48188046160847]]
J: 1.5908244528084288  Alpha: 1.0e-05  Theta: Matrix[[-0.0016554977759847996], [-0.1188997667477244], [-111.48137907871664]]
J: 1.5851960976828814  Alpha: 1.0e-05  Theta: Matrix[[-0.0016557144987826046], [-0.11948332530842007], [-111.4808777546412]]
J: 1.5826817076400923  Alpha: 1.0e-05  Theta: Matrix[[-0.0016558999497352893], [-0.12000831170339445], [-111.48037649310945]]
J: 1.5816354848004566  Alpha: 1.0e-05  Theta: Matrix[[-0.0016560658987327093], [-0.12049677093659837], [-111.4798752705816]]
J: 1.581199878569286  Alpha: 1.0e-05  Theta: Matrix[[-0.0016562224426970157], [-0.12096761454376066], [-111.47937406686383]]
J: 1.5810169018926878  Alpha: 1.0e-05  Theta: Matrix[[-0.0016563748211790893], [-0.12143065620486218], [-111.47887287147701]]
J: 1.5809396242131868  Alpha: 1.0e-05  Theta: Matrix[[-0.0016565254040880424], [-0.1218903347622732], [-111.47837167968135]]
J: 1.5809069017613124  Alpha: 1.0e-05  Theta: Matrix[[-0.0016566752202995195], [-0.12234857730581448], [-111.47787048941908]]
J: 1.5808930296490606  Alpha: 1.0e-05  Theta: Matrix[[-0.001656824710233385], [-0.12280620875454971], [-111.47736929980935]]
J: 1.580887145848097  Alpha: 1.0e-05  Theta: Matrix[[-0.0016569740612930289], [-0.12326358014294572], [-111.47686811047738]]
J: 1.580884649719601  Alpha: 1.0e-05  Theta: Matrix[[-0.0016571233527736234], [-0.12372084005243131], [-111.47636692126457]]
J: 1.5808835906710963  Alpha: 1.0e-05  Theta: Matrix[[-0.0016572726175860411], [-0.12417805026085695], [-111.47586573210509]]
J: 1.5808831413239819  Alpha: 1.0e-05  Theta: Matrix[[-0.00165742186803091], [-0.12463523410670607], [-111.47536454297435]]
.........

For the one that creates a proper prediction:

J: 4.330234652497978  Alpha: 1.0  Theta: Matrix[[0.12388059701492538], [211.9910447761194], [-111.13731343283582]]
J: 4.330234652497978  Alpha: 0.1  Theta: Matrix[[0.08626965671641812], [152.3222144059701], [-118.07202388059702]]
J: 4.2958677406623815  Alpha: 0.1  Theta: Matrix[[0.048658716417910856], [92.65338403582082], [-125.0067343283582]]
J: 3.333594209265678  Alpha: 0.1  Theta: Matrix[[0.011644779104478219], [33.61767533134318], [-131.44443979104477]]
J: 0.4467735852246924  Alpha: 0.1  Theta: Matrix[[-0.014623104477611202], [-11.126378913433022], [-132.24166105074627]]
J: 3.333594209265678  Alpha: 0.1  Theta: Matrix[[0.01194378805970217], [31.177094038805805], [-126.89243925671643]]
J: 3.0930257965656063  Alpha: 0.01  Theta: Matrix[[0.009436400895523079], [26.892626149850567], [-126.92472924]]
J: 2.7493567080605392  Alpha: 0.01  Theta: Matrix[[0.007257365074627634], [23.13644550388053], [-126.8386038647761]]
J: 2.508788325211366  Alpha: 0.01  Theta: Matrix[[0.005466380895523164], [19.99261048238799], [-126.62851089164178]]
J: 2.405687589704577  Alpha: 0.01  Theta: Matrix[[0.004152999104478391], [17.61296913194023], [-126.28907722179103]]
J: 2.268219942362192  Alpha: 0.01  Theta: Matrix[[0.002959017910448543], [15.415473392238736], [-125.92224111492536]]
J: 2.1307522353180164  Alpha: 0.01  Theta: Matrix[[0.002093389253732125], [13.751072827761122], [-125.48597339134326]]
J: 2.027651529662123  Alpha: 0.01  Theta: Matrix[[0.0014367116417918252], [12.436814710149182], [-125.00961691402983]]
J: 1.9589177059909308  Alpha: 0.01  Theta: Matrix[[0.0009889847761201823], [11.44908667850739], [-124.49911195194028]]
J: 1.8558169406332465  Alpha: 0.01  Theta: Matrix[[0.0006606582089560022], [10.652638055522315], [-123.97004023522386]]
J: 1.8214500586485458  Alpha: 0.01  Theta: Matrix[[0.0004218823880604789], [9.988664770447688], [-123.42914782925371]]
J: 1.8214500884994413  Alpha: 0.01  Theta: Matrix[[0.0002428068653197179], [9.416182220312082], [-122.88082274064425]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.00023086931308091184], [9.369775500013574], [-122.82513353589798]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.00021893176084210577], [9.323368779715066], [-122.7694443311517]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.0002069942086032997], [9.276962059416558], [-122.71375512640543]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.00019505665636449364], [9.23055533911805], [-122.65806592165916]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.00018311910412568757], [9.184148618819542], [-122.60237671691289]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.0001711815518868815], [9.137741898521034], [-122.54668751216661]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.00015924399964807544], [9.091335178222526], [-122.49099830742034]]
J: 1.8214500884994413  Alpha: 0.001  Theta: Matrix[[0.00014730641755852312], [9.04492840598372], [-122.43530910670393]]
J: 1.8677695240029366  Alpha: 0.001  Theta: Matrix[[0.0001353688354689708], [8.998521633744915], [-122.37961990598751]]
J: 1.8462563443835032  Alpha: 0.0001  Theta: Matrix[[0.0001341750742749415], [8.993880951437452], [-122.374050986289]]
J: 1.8247430163841476  Alpha: 0.0001  Theta: Matrix[[0.00013298131308164604], [8.98924026913124], [-122.3684820665904]]
J: 1.803243007740144  Alpha: 0.0001  Theta: Matrix[[0.0001317875528781551], [8.984599588510665], [-122.36291314676808]]
J: 1.7875423426167685  Alpha: 0.0001  Theta: Matrix[[0.00013059512176735966], [8.979961171334951], [-122.35734406080917]]
J: 1.7870839229503594  Alpha: 0.0001  Theta: Matrix[[0.0001296573060241053], [8.97575636413016], [-122.35174314792931]]
J: 1.7870831481868632  Alpha: 0.0001  Theta: Matrix[[0.00012876197468911015], [8.971623907872633], [-122.34613692449842]]
J: 1.7870831468153818  Alpha: 0.0001  Theta: Matrix[[0.00012786672082037553], [8.967491583540149], [-122.34053069138426]]
J: 1.7870831468129538  Alpha: 0.0001  Theta: Matrix[[0.000126971467088789], [8.963359259441226], [-122.33492445825294]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.0001260762133574453], [8.959226935342718], [-122.3293182251216]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.00012518095962610202], [8.95509461124421], [-122.32371199199025]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.00012428570589475874], [8.950962287145702], [-122.3181057588589]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.00012339045216341546], [8.946829963047193], [-122.31249952572756]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.00012249519843207218], [8.942697638948685], [-122.30689329259621]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.00012159994470072888], [8.938565314850177], [-122.30128705946487]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.00012070469096938559], [8.934432990751668], [-122.29568082633352]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.0001198094372380423], [8.93030066665316], [-122.29007459320218]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.000118914183506699], [8.926168342554652], [-122.28446836007083]]
J: 1.7870831468129498  Alpha: 0.0001  Theta: Matrix[[0.00011801892977535571], [8.922036018456144], [-122.27886212693949]]
......

EDIT: I noticed that the first iteration the hypothesis is always predicting 0.5, since the theta is all 0s. But then after that it always predicts 1 or 0 (0.00001 or 0.99999 to avoid logarithms that don't exist in my code). That doesn't seem right to me - way way too confident - and is probably the key to why this isn't working.

4
Consider posting this question at stats.stackexchange.com.Jan
If a moderator could move it there, I would be grateful.Andrew Latham
I have flagged it as off-topic, so it can be seen and moved.Andrew Latham
The only thing i can tell as a off-topic - that's really bad ruby code. Use iterators like each and other collection methods like map, inject, etcSigurd
My suggestion to help you find the problem yourself is to compare your output against an implementation that is expected to be correct. In this case, consider mnrfit (mathworks.com/help/stats/mnrfit.html) from Matlab.mmgp

4 Answers

3
votes

There are a couple things about your implementation that are a little non-standard.

  1. First, the logistic regression objective is typically given as a minimization problem of

    lr(x[n],y[n])=log(1+exp(-y[n]*dot(w[n],x[n]))) where y[n] is either 1 or -1

    You seem to be using the equivalent maximization problem formulation of

    lr(x[n],y[n])=-y[n]*log(1+exp(-dot(w[n],x[n])))+(1-y[n])*(-dot(w[n],x[n])-log(1+exp(-dot(w[n],x[n])))

    where y[n] is either 0 or 1 (y[n]=0 in this formulation is the equivalent of y[n]=1 in the first formulation).

    So, you should make sure that in your dataset, your labels are 0 or 1 and not 1 or -1.

  2. Next, the LR objective is typically not divided by m (the size of the dataset). This scaling factor is incorrect when you view logistic regression as a probabilistic model.

  3. Finally, you may be having some numerical issues with your implementation (which you tried to correct in the g function). Leon Bottou's sgd code (http://leon.bottou.org/projects/sgd) has some more-stable computations of the loss function and derivative as follows (in C code - he uses the first LR formulation that I mention):

    /* logloss(a,y) = log(1+exp(-a*y)) */
    double loss(double a, double y)
    {
      double z = a * y;
      if (z > 18) {
        return exp(-z);
      }
      if (z < -18) {
        return -z;
      }
      return log(1 + exp(-z));
    }
    
    /*  -dloss(a,y)/da */
    double dloss(double a, double y)
    {
      double z = a * y;
      if (z > 18) {
        return y * exp(-z);
      }
      if (z < -18){
        return y;
      }
      return y / (1 + exp(z));
    }
    

You should also consider running a stock l-bfgs routine (I am not familiar with Ruby implementations) so you can focus on getting the objective and gradient computations correct and not have to worry about things like learning rates.

1
votes

a few thoughts:

  • I think it would be helpful if you could show the values of some iterations of J() and alpha.
  • do you include a constant (bias) as a feature ? If I remember correctly, if you don't do this, your (straight) line of h() == 0.5 will be forced to go through zero

  • Your function J() looks like it's returning the negative log likelihood (which you therefore want to minimize). Yet you decrease the learning rate if (oldJ < newJ), i.e. if J() became larger, i.e. worse.

0
votes

Float numbers:

Try this? Equal comparison between floats does not makes great sense to me.

def g(z)
    tmp = 1.0 / (1.0 + Math.exp(-z))   # Sigmoid function
    if (tmp >= 0.99999) then tmp = 0.99999 end    # These two things are here because ln(0) DNE, so we don't want to do ln(1 - 1.0) or ln(0.0)
    if (tmp <= 0.00001) then tmp = 0.00001 end
    return tmp
end

Feature scaling

You mentioned you are using two features, I assume they are the player's own rating and the rating diff. Is that correct?

Also consider using some feature scaling as a data pre-processing step, e.g.,

enter image description here. or you can do the standardization method by making the values of each feature in the data have zero-mean and unit-variance.

Questions:

  • What is the difference between your blue line and green line in your graphs?
  • Did you try to start with a very small learning rate, e.g, 0.01 or 0.001?
  • What would be the behavior if you just use a fixed learning rate? Try 0.001, 0.01, 0.1, 0.5, 1, 10 etc. Please post your result here.
0
votes

Instead of playing with your learning rate i think you need to normalize you initial data set using feature normalization((X- mu)/sigma) then perform the operation that you intend to do.

Without feature normalization, gradient descent becomes buggy for large data set at behaves abnormaly.