3 votes

I’m trying to optimize a function with one of the algorithms that require a gradient; essentially, I’m trying to learn how to do gradient-based optimization in Julia. I’m fairly confident that my gradient is specified correctly, because an identically defined Matlab function returns the same values as the Julia version for several test arguments, and the Matlab version optimizes the function fine with fminunc when given that gradient.
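
(For what it’s worth, a central-difference check is another way to sanity-check a gradient inside Julia itself, without going through Matlab. The helper below is only an illustrative sketch and not part of my actual script; here f is the cost function and theta a test point.)

# numerical gradient via central differences, for comparison with the analytic one
function numgrad(f, theta, eps=1e-6)
    g = zeros(length(theta))
    for k in 1:length(theta)
        e = zeros(length(theta))
        e[k] = eps
        g[k] = (f(theta + e) - f(theta - e)) / (2 * eps)  # central difference along coordinate k
    end
    return g
end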

However, when I run the Julia script, I get the following error:

julia> include("ex2b.jl")
ERROR: `g!` has no method matching g!(::Array{Float64,1}, ::Array{Float64,1})
while loading ...\ex2b.jl, in expression starting on line 64

I'm running Julia 0.3.2 on a 32-bit Windows 7 machine. Here is the code (basically a translation of some Matlab code to Julia):

using Optim
function mapFeature(X1, X2)
    degrees = 5
    out = ones(size(X1)[1])
    for i in range(1, degrees+1)
        for j in range(0, i+1)
            term  = reshape( (X1.^(i-j) .* X2.^(j)), size(X1.^(i-j))[1], 1)
            out   = hcat(out, term)
        end
    end
    return out
end

function sigmoid(z)
    return 1 ./ (1 + exp(-z))
end

function costFunc_logistic(theta, X, y, lam)
    m = length(y)
    regularization = sum(theta[2:end].^2) * lam / (2 * m)
    return sum( (-y .* log(sigmoid(X * theta)) - (1 - y) .* log(1 - sigmoid(X * theta))) ) ./ m + regularization
end

function costFunc_logistic_gradient!(theta, X, y, lam, m)
    grad= X' * ( sigmoid(X * theta) .- y ) ./ m
    grad[2:end] = grad[2:end] + theta[2:end] .* lam / m
    return grad
end

data = readcsv("ex2data2.txt")
X = mapFeature(data[:,1], data[:,2])
m, n = size(data)
y = data[:, end]
theta = zeros(size(X)[2])
lam = 1.0

f(theta::Array) = costFunc_logistic(theta, X, y, lam)
g!(theta::Array) = costFunc_logistic_gradient!(theta, X, y, lam, m)
optimize(f, g!, theta, method = :l_bfgs)

And here is some of the data:

0.051267,0.69956,1
-0.092742,0.68494,1
-0.21371,0.69225,1
-0.375,0.50219,1
-0.51325,0.46564,1
-0.52477,0.2098,1
-0.39804,0.034357,1
-0.30588,-0.19225,1
0.016705,-0.40424,1
0.13191,-0.51389,1
0.38537,-0.56506,1
0.52938,-0.5212,1
0.63882,-0.24342,1
0.73675,-0.18494,1
0.54666,0.48757,1
0.322,0.5826,1
0.16647,0.53874,1
-0.046659,0.81652,1
-0.17339,0.69956,1
-0.47869,0.63377,1
-0.60541,0.59722,1
-0.62846,0.33406,1
-0.59389,0.005117,1
-0.42108,-0.27266,1
-0.11578,-0.39693,1
0.20104,-0.60161,1
0.46601,-0.53582,1
0.67339,-0.53582,1
-0.13882,0.54605,1
-0.29435,0.77997,1
-0.26555,0.96272,1
-0.16187,0.8019,1
-0.17339,0.64839,1
-0.28283,0.47295,1
-0.36348,0.31213,1
-0.30012,0.027047,1
-0.23675,-0.21418,1
-0.06394,-0.18494,1
0.062788,-0.16301,1
0.22984,-0.41155,1
0.2932,-0.2288,1
0.48329,-0.18494,1
0.64459,-0.14108,1
0.46025,0.012427,1
0.6273,0.15863,1
0.57546,0.26827,1
0.72523,0.44371,1
0.22408,0.52412,1
0.44297,0.67032,1
0.322,0.69225,1
0.13767,0.57529,1
-0.0063364,0.39985,1
-0.092742,0.55336,1
-0.20795,0.35599,1
-0.20795,0.17325,1
-0.43836,0.21711,1
-0.21947,-0.016813,1
-0.13882,-0.27266,1
0.18376,0.93348,0
0.22408,0.77997,0

Let me know if you need additional details. By the way, this relates to a Coursera machine learning course, in case anyone is curious.

2  It seems odd to me that g! is defined on an array, but then is called with two arrays as input. Could that be part of the issue? – cd98

2 Answers

3 votes

The gradient argument should not be a function that computes and returns the gradient, but a function that stores it in its second argument (hence the exclamation mark in the function name, and the second ::Array in the error message).

The following seems to work.

function g!(theta::Array, storage::Array)
    storage[:] = costFunc_logistic_gradient!(theta, X, y, lam, m)  # write the gradient into storage in place
end
optimize(f, g!, theta, method = :l_bfgs)
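
A quick way to confirm that the new method matches what Optim expects is to call it once by hand with a preallocated array (just an illustrative check, using the theta already defined in the script):

storage = similar(theta)
g!(theta, storage)   # fills storage in place; no "no method matching" error any more
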
0 votes

The same approach using closures and currying (a version for those who are used to a single function that returns both the cost and the gradient):

function cost_gradient(θ, X, y, λ)
    m = length(y);
    return (θ::Array) -> begin 
        h = sigmoid(X * θ); #(m,n+1)*(n+1,1) -> (m,1)
        J = (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) + λ / (2 * m) * sum(θ[2:end] .^ 2);        
    end, (θ::Array, storage::Array) -> begin  
        h = sigmoid(X * θ); #(m,n+1)*(n+1,1) -> (m,1)
        storage[:] = (1 / m) * (X' * (h .- y)) + (λ / m) * [0; θ[2:end]];       
    end
end

Then, somewhere in the code:

initialθ = zeros(size(X, 2));   # one θ per column of the mapped feature matrix X
f, g! = cost_gradient(initialθ, X, y, λ);
res = optimize(f, g!, initialθ, method = :cg, iterations = your_iterations);
θ = res.minimum;
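
Here your_iterations is just a placeholder for whatever iteration limit you want. The point of returning a pair of closures is that X, y and λ are captured once rather than being read from global scope on every call (non-constant globals are slow in Julia); the g! closure still follows the two-argument (θ, storage) convention Optim expects, writing the gradient into storage in place.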