
I'm fairly new to linear algebra and I'm currently taking Andrew Ng's machine learning course. I'm struggling to understand how the two expressions below are the same. I'm working on vectorizing gradient descent for linear/logistic regression.

theta = theta - (alpha/m)*(X')*(X*theta - y)

theta = theta - (alpha/m)*sum((X*theta -y)*x(i))

My thought is that x(i) is a vector and in order to do vector multiplication, I need to transpose it, but when trying to mock up an example I didn't see how that was necessary. Any help or explanation would be greatly appreciated.

There's an operator missing in front of x(i). Please edit your question to make sure all code is correct. – Cris Luengo
What is X'? Is it X transpose? – Rajith Thennakoon
"Andrew Ng's machine learning course" doesn't provide as much information towards answering this question as you might think (and no, we're not going to go take the course just to figure out what alpha, m, theta, y, x, and X are in this particular case). Restate your question by saying exactly what the variables are and, far more importantly, what their dimensions are, and chances are the answer will pop up by itself. – Tasos Papastylianou

1 Answer


Assuming you are referring to the equation at the bottom of Lecture 4, slide 8, then you have the sum wrong. The term x(i) is meant to be inside the sum, not outside. And in the vectorised case, your input X contains all the individual observations x(i) as its rows (check with your code to be sure). Therefore the correct equivalent of the lower expression is:

theta = theta - (alpha/m)*sum((X*theta - y) .* X)

which is indeed equivalent to the vectorised expression at the top, since in general, for any two (column) vectors a and b, a.' * b is equal to sum(a .* b). Applied column by column to X, that turns X' * (X*theta - y) into the element-wise product summed over the rows.
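You can check the equivalence numerically. Here is a small sketch in Python/NumPy (a translation of the Octave expressions above; the random data and dimensions are made up for illustration):

```python
import numpy as np

# Hypothetical small example: m = 4 observations, n = 2 features.
rng = np.random.default_rng(0)
m, n = 4, 2
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
theta = rng.standard_normal(n)
alpha = 0.1

# Fully vectorised update: theta - (alpha/m) * X' * (X*theta - y)
update_a = theta - (alpha / m) * X.T @ (X @ theta - y)

# Sum form: scale each row x(i) by its residual, then sum over the rows.
residual = X @ theta - y                                  # shape (m,)
update_b = theta - (alpha / m) * np.sum(residual[:, None] * X, axis=0)

print(np.allclose(update_a, update_b))  # True
```

Both updates produce the same n-vector of parameters, because X.T @ r is exactly the column-wise sum of r[:, None] * X.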