Gradient descent Matlab to Python

Question

I'm halfway through my logistic regression model program and I'm stuck on gradient descent function. I'm translating from Matlab to Python. This is the code for it I got in Matlab (working):

function [w func_values] = gradient_descent( obj_fun, w0, epochs, eta )
% Function optimizes obj_fun using gradient descent method. 
% Returns variable w, which minimizes objective function, and values of the
% objective function in all optimization steps (func_values)

% obj_fun - pointer to objective function, that is callable by: obj_fun(w)
% w0 - starting point (initial parameter values)
% epochs - number of epochs / number of algorithm iterations
% eta - learning rate

func_values = [];
w = zeros(size(w0));
w = w0;
for i=1:epochs
    [~, grad] = obj_fun(w);
    w = w-(eta*grad);
    [L, ~] = obj_fun(w);
    func_values(i, 1) = L;
end
end

And this is my translation to Python:

def gradient_descent(obj_fun, w0, epochs, eta):
    '''
    :param obj_fun: objective function that is going to be minimized (call val,grad = obj_fun(w)).
    :param w0: starting point Mx1
    :param epochs: number of epochs / iterations of an algorithm
    :param eta: learning rate
    :return: function optimizes obj_fun using gradient descent. It returns (w,func_values),
    where w is optimal value of w and func_valus is vector of values of objective function [epochs x 1] calculated for each epoch
    '''
    w = w0
    func_values = []
    for i in range (epochs):
        [val, grad] = obj_fun(w)
        w = w -(eta*grad)
        [val,grad] = obj_fun(w)
        val = func_values[i]
    return (w, func_values)

Apparently, I got the value of w correct, the func_values aren't fine though. I think the problem is in the [~, grad] etc. translation. I've been looking for it but haven't found anything on it yet; how could I properly translate ~ from Matlab to Python? I assume it's skipping rows and assigning columns only to grad, and then, in [L, ~] it's assigning rows to L and skipping columns. If I'm wrong, let me know! The error I get is:

    val = func_values[i]
IndexError: list index out of range

Why is it out of range? There was no range assigned to func_values yet. I've tried [i, 0] as well, producing this error:

    val = func_values[i, 1]
TypeError: list indices must be integers or slices, not tuple

Any ideas?

I'm afraid I don't remember Matlab to help with the translation overall, but you have a problem in Python itself. func_values is an empty list that you are trying to index (val = func_values[i] is the syntax for "get the value at the i'th index and assign it to val", not setting a value); that will fail. You possibly want something like func_values.append([val, grad]) but I can't understand why you are using the i index at all in your loop. — roganjosh
Well, in the Matlab solution I used i to fill the func_values with val a 100 times (100 was the number of epochs). After using your solution, I get an error ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I think it might have something to do with the fact that I'm just filling it once. Honestly, the entire func_values is confusing to me. Thank you so much for help though! — user7220560
Then this really is a case of "lost in translation" because nothing in your code adds anything to func_values at all; if somehow it didn't error, it would be unchanged at the end (still empty). You almost certainly want append somewhere to actually add values to the list, and possibly use the i index to retrieve those values from obj_fun for specific epochs. Actually, you seem to be using python almost in complete reverse. Can you give a short sample of what print(obj_fun(w)) gives? That reduces the dependency of people knowing both languages to answer. — roganjosh
Unfortunately I can't, because I'm only implementing functions and I can't access the main or test files, which I got provided for the assignment, and the only print I get is the one already written in main (so when I write print, I don't see the output anyway). I did manage to get that func_values should result in an array with just one int per row, so I suppose it's only gonna be val. Sorry if this isn't helping... I'm really trying to get this one figured out. I think func_values.append([val]) should make it work, but it doesn't. — user7220560

Krupip Krupip · Accepted Answer · 2017-05-02T17:22:33

You have not adequately understood non matlab languages enough to know what you are doing. You need to learn more about python before you attempt this problem, you make many mistakes that wouldn't be made if you understood the language well enough. To help I'll go through my parsing of your code and what is wrong with your python. Given that you know matlab, you should be able to merely go through this official python tutorial to understand all the constructs of the python language to a degree in which you could do this conversion properly. Going through this tutorial should only take a weekend, if that, and will teach you all the basics of the language. Its made for people who already know a language already.

function [w func_values] = gradient_descent( obj_fun, w0, epochs, eta )
% Function optimizes obj_fun using gradient descent method. 
% Returns variable w, which minimizes objective function, and values of the
% objective function in all optimization steps (func_values)

% obj_fun - pointer to objective function, that is callable by: obj_fun(w)
% w0 - starting point (initial parameter values)
% epochs - number of epochs / number of algorithm iterations
% eta - learning rate

func_values = [];
w = zeros(size(w0));
w = w0;
for i=1:epochs
    [~, grad] = obj_fun(w);
    w = w-(eta*grad);
    [L, ~] = obj_fun(w);
    func_values(i, 1) = L;
end
end

To go over what has happened here, you have made a matlab function, called gradient_descent which takes in four parameters and returns two values, w and func_values.

%creating an empty matrix
func_values = [];
% creating a matrix of zeros whos size is taken from w0's size, which we 
% don't know and could make this two dimensional or one dimensional or three 
% dimensional, there is max size for dimensionalilty in matlab. 
w = zeros(size(w0));
% matlab assignment is copy by value unless you inherit from handle, so you 
% are copying by value of the w0, and you don't even need the previous line, 
% but I digress.  
w = w0;
% you appear to be using the wrong nomenclature for w0, which is usually the 
% input node vector to a nueral network, youve used it as the 
% initial state of the wieghts in the graph.  

%for i = 1, 2, 3,... i = epochs
for i=1:epochs
    % get the gradient difference between expected output objective 
    % function, and the output of w. Assume objective returns gradients at 
    % each point in the graph, with sizeof(grad) == sizeof(w)
    [~, grad] = obj_fun(w);
    % multiply the gradient and learning rate and subtract both matricies 
    w = w-(eta*grad);
    % getting the real value of objective function when applied to graph?
    [L, ~] = obj_fun(w);
    % adding to list this objective function value. 
    func_values(i, 1) = L;
end
end

Now onto the python, you've defined a function gradient descent that takes four parameters, and returns w and func_values.

def gradient_descent(obj_fun, w0, epochs, eta):
    '''
    :param obj_fun: objective function that is going to be minimized (call val,grad = obj_fun(w)).
    :param w0: starting point Mx1
    :param epochs: number of epochs / iterations of an algorithm
    :param eta: learning rate
    :return: function optimizes obj_fun using gradient descent. It returns (w,func_values),
    where w is optimal value of w and func_valus is vector of values of objective function [epochs x 1] calculated for each epoch
    '''
    # assigned by object reference from the get go... so you will be 
    # modifying w0 if you modify w. Bad, you need a deepcopy
    w = w0
    # intializing empty list of func values
    func_values = []
    # for i = 0, 1, 2, 3... i = epochs-1.  Notice how i starts at 0 and ends 
    # at max-1?
    for i in range (epochs):
        # this isn't matlab, you can just do val,grad instead here. 
        # otherwise does what it did in matlab. 

        [val, grad] = obj_fun(w)
        # you can't subtract two lists, and multiplying lists results in the 
        # number of elements being repeated n times, not multiplying each 
        # element by n. This should give you an error. 
        w = w -(eta*grad)
        # you can also use _ to represent ~ in matlab
        [val,grad] = obj_fun(w)
        # what... doing backward order assignment? func_values doesn't even 
        # have anything in it either, it is just = [].
        # you can't do func_values[i] = val either, since the memory at 
        # position i doesn't exist.  You need to do func_values.append(val) 
        # this should give you an error.   
        val = func_values[i] 
    # python automatically converts this into a tuple, no need for the 
    # parenthesis. explicit return in python. 
    return (w, func_values)

You need to go over python's rules badly. You can't just copy and paste matlab and expect it to work. Fixing the issues I raised above will fix your program assuming your objective function, w0, ecpochs and eta are all proper. You'll need to implement the array based subtraction and element wise multiplication yourself unless you use Numpy.

You'll need to do something like this:

def gradient_descent(obj_fun, w0, epochs, eta):
    # shallow copy of w0, works for primitives like numbers, won't work for 
    # other things like multidimensional arrays since that is a list of 
    # lists
    w = w0[:]
    # to deep copy, you need to import copy first, then do 
    #    w = copy.deecopy(w0)
    func_values = []
    for i in range (epochs):
        val, grad = obj_fun(w)
        # list comprehension generates a list, here I'm multiplying eta for 
        # each element in grad, _, and that becomes a list.
        descent_list = [eta*_ for _ in grad]
        # you'll need to create your own subtraction function to subtract 
        # each element, you'll need to iterate through each list and 
        # subtract the values from eachother and return a list of those 
        # values
        w = subtract(w, descent_list)
        val,grad = obj_fun(w)
        func_values.append(val)
    return w, func_values

Gradient descent Matlab to Python

2 Answers