7
votes

I'm doing Andrew Ng's course on Machine Learning and I'm trying to wrap my head around the vectorised implementation of gradient descent for multiple variables which is an optional exercise in the course.

This is the algorithm in question (taken from here):

[Image of the algorithm: theta_j := theta_j - alpha/m * SUM_i (h_theta(x(i)) - y(i)) * xj(i), updating every theta_j simultaneously]

I just cannot do this in Octave using sum, though. I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by all the variables xj(i). I tried different iterations of the following code, but to no avail (either the dimensions are not right or the answer is wrong):

theta = theta - alpha/m * sum(X * theta - y) * X;

The correct answer, however, is entirely non-obvious (to a linear algebra beginner like me, anyway); it is from here:

theta = theta - (alpha/m *  (X * theta-y)' * X)';

Is there a rule of thumb for cases where sum is involved that governs transformations like the above?

And if so, is there an opposite version of the above (i.e. going from a sum-based solution to a general multiplication one)? I was able to come up with a correct implementation using sum for gradient descent for a single variable (albeit not a very elegant one):

temp0 = theta(1) - (alpha/m * sum(X * theta - y));
temp1 = theta(2) - (alpha/m * sum((X * theta - y)' * X(:, 2)));

theta(1) = temp0;
theta(2) = temp1;

Please note that this only concerns vectorised implementations; although there are several questions on SO about how this is done, my question is primarily concerned with implementing the algorithm in Octave using sum.

4
Look at the duplicate link - specifically the second approach with sum. – rayryeng

4 Answers

4
votes

The general "rule of thumb" is as follows: if you encounter something of the form

SUM_i f(x_i, y_i, ...) g(a_i, b_i, ...)

then you can easily vectorize it (and this is what is done in the above) through

f(x, y, ...)' * g(a, b, ...)

This is just the ordinary dot product, which in mathematics (in a finite-dimensional Euclidean space) looks like

<A, B> = SUM_i A_i B_i = A'B

thus

(X * theta - y)' * X

is just

<X * theta - y, X> = <H_theta(X) - y, X> = SUM_i (H_theta(X_i) - y_i) X_i

As you can see, this works both ways, since it is just the mathematical definition of the dot product.
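
To make the rule concrete, here is a small Octave sketch (the X, y and theta values are made up purely for illustration) showing that the sum form and the dot-product form produce the same gradient vector:

% Toy data, invented only to illustrate the equivalence
X = [1 2; 1 3; 1 4];          % m = 3 examples, first column is the bias term
y = [7; 6; 5];
theta = [0.1; 0.2];

% Sum form: SUM_i (h_theta(x(i)) - y(i)) * xj(i), computed one j at a time
grad_sum = zeros(size(theta));
for j = 1:length(theta)
    grad_sum(j) = sum((X * theta - y) .* X(:, j));
end

% Dot-product form: (X * theta - y)' * X computes every j at once
grad_vec = ((X * theta - y)' * X)';

disp([grad_sum grad_vec])     % the two columns are identical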

1
votes

Referring to this part of your question specifically - "I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by all the variables xj(i)."

In Octave you can multiply xj(i) by all of the predictions element-wise using the "." operators, so it can be written as:

m = size(X, 1);                      % number of training examples
predictions = X * theta;             % hypothesis h_theta(x) for every example
sqrErrors = (predictions - y).^2;    % squared errors, element-wise via .^
J = 1 / (2*m) * sum(sqrErrors);      % the cost for this theta
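
As a quick sanity check (the values below are hypothetical, purely to show the snippet in use), a theta that fits the data exactly gives J = 0:

X = [1 1; 1 2; 1 3];     % toy design matrix: bias column plus one feature
y = [1; 2; 3];
theta = [0; 1];          % fits this toy data exactly
% Running the four lines above with these inputs prints J = 0.
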
1
votes

The vector multiplication automatically performs the sum of the products, so you don't have to call the sum() function. By wrapping it in sum() you convert a vector into a scalar, which throws away the per-parameter gradient values.
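
A tiny Octave sketch of what this means (the numbers are made up): the matrix product already performs the per-example summation, and wrapping it in sum() collapses the per-parameter gradient into one scalar:

a = [1; 2; 3];
b = [4; 5; 6];
a' * b             % 32: the multiplication already sums the products
sum(a .* b)        % also 32, written with an explicit sum

B = [4 7; 5 8; 6 9];
a' * B             % [32 50]: one dot product per column (per theta_j)
sum(a' * B)        % 82: a single scalar, the per-theta information is gone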

0
votes

You actually don't want to use summation here, because what you are calculating are the individual values for all the thetas, not the overall cost J. If you do this in one line of code and sum it, you end up with a single value (the sum over all the thetas). Summation was correct, though unnecessary, when you computed the theta values one by one in the previous exercise. This works just the same:

temp0 = theta(1) - (alpha/m * (X * theta - y)' * X(:, 1));
temp1 = theta(2) - (alpha/m * (X * theta - y)' * X(:, 2));

theta(1) = temp0;
theta(2) = temp1;
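
Putting it together, here is a sketch of a complete gradient-descent loop using the fully vectorised update from the question (alpha, num_iters, X and y are assumed to be supplied by the caller):

% X, y, alpha and num_iters are assumed to exist in the workspace.
theta = zeros(size(X, 2), 1);    % one parameter per feature column
m = size(X, 1);

for iter = 1:num_iters
    % All theta_j are updated simultaneously; no sum() is needed because
    % the matrix product already adds up the per-example contributions.
    theta = theta - (alpha/m * (X * theta - y)' * X)';
end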