I'm doing Andrew Ng's course on Machine Learning and I'm trying to wrap my head around the vectorised implementation of gradient descent for multiple variables which is an optional exercise in the course.
This is the algorithm in question (taken from here):
I just cannot do this in octave using sum
though, I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by the all variables xj(i). I tried different iterations of the following code but to no avail (either the dimensions are not right or the answer is wrong):
theta = theta - alpha/m * sum(X * theta - y) * X;
The correct answer, however, is entirely non-obvious (to a linear algebra beginner like me anyway, from here):
theta = theta - (alpha/m * (X * theta-y)' * X)';
Is there a rule of thumb for cases where sum
is involved that governs transformations like the above?
And if so, is there the opposite version of the above (i.e. going from a sum
based solution to a general multiplication one) as I was able to come up with a correct implementation using sum
for gradient descent for a single variable (albeit not a very elegant one):
temp0 = theta(1) - (alpha/m * sum(X * theta - y));
temp1 = theta(2) - (alpha/m * sum((X * theta - y)' * X(:, 2)));
theta(1) = temp0;
theta(2) = temp1;
Please note that this only concerns vectorised implementations and although there are several questions on SO as to how this is done, my question is primarily concerned with the implementation of the algorithm in Octave using sum
.
sum
. – rayryeng