
I've been working on homework 1 in Andrew Ng's machine learning course, but I'm stuck on my understanding of what he was describing when vectorizing multivariable gradient descent.

His update rule is presented as follows: theta := theta - alpha * f

f is supposed to be (1/m) * sum_i (h(x^(i)) - y^(i)) * x^(i), where i indexes the training samples.

Now here is where I get confused. I know that h(x^(i)) can be rewritten as x^(i) * theta, where x^(i) represents a row of feature values (1 x n) and theta represents a column (n x 1), producing a scalar from which I then subtract the individual value y^(i). I then multiply the result by X_i, where X_i represents a column of one feature's values?

So that would give me an m x 1 vector, which then has to be subtracted from an n x 1 vector?

Or is it that X_i represents a row of feature values? And if so, how can I do this without indexing over all of these rows?
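To make the shapes concrete, here is the looping version I have in my head (a NumPy sketch of my understanding; the course itself uses Octave, and the variable names here are my own):

import numpy as np

m, n = 5, 3                      # m samples, n features (including the x0 = 1 bias column)
X = np.random.rand(m, n)         # row i is the sample x^(i), 1 x n
y = np.random.rand(m)            # the m target values
theta = np.zeros(n)              # the n parameters, a column in the math
alpha = 0.01                     # learning rate

f = np.zeros(n)                  # gradient accumulator
for i in range(m):
    error = X[i] @ theta - y[i]  # scalar: h(x^(i)) - y^(i)
    f += error * X[i]            # scalar times the row x^(i)
f /= m

theta = theta - alpha * f        # the update theta := theta - alpha * f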

I'm specifically referring to the image of this vectorized equation from the lecture slides.


1 Answer


I'll explain it with the non-vectorized implementation.

So that would give me an m x 1 vector, which then has to be subtracted from an n x 1 vector?

Yes, it will give you an m x 1 vector, but instead of being subtracted from an n x 1 vector, it has to be subtracted from another m x 1 vector. How?
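To see it with concrete shapes, here is a minimal NumPy sketch (the variable names are mine, and the course itself uses Octave):

import numpy as np

m, n = 5, 3                  # m samples, n features
X = np.random.rand(m, n)     # design matrix, one sample per row
y = np.random.rand(m)        # the m targets
theta = np.random.rand(n)    # the n parameters

predictions = X @ theta      # (m x n) times (n x 1) -> m x 1
errors = predictions - y     # m x 1 minus m x 1 -> m x 1, elementwise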

I know that h(x^(i)) can be rewritten as x^(i) * theta, where x^(i) represents a row of feature values (1 x n) and theta represents a column (n x 1), producing a scalar

You have actually answered it yourself: x^(i) * theta produces a scalar, so when you have m samples it gives you an m x 1 vector. If you look carefully at the equation, the scalar result of h(x^(i)) - y^(i) is multiplied by another scalar, x^(i)_0, the value of feature 0 for sample i, so each term is a scalar, or an m x 1 vector if you stack all m samples.
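Putting it together, here is the non-vectorized gradient next to the vectorized one, so you can see the m scalars stack into an m x 1 error vector (a sketch with names of my own choosing; the course uses Octave, but the shapes carry over):

import numpy as np

m, n = 5, 3
X = np.random.rand(m, n)
y = np.random.rand(m)
theta = np.random.rand(n)
alpha = 0.01

# Non-vectorized: one scalar sum per parameter j.
grad = np.zeros(n)
for j in range(n):
    total = 0.0
    for i in range(m):
        error = X[i] @ theta - y[i]   # scalar: h(x^(i)) - y^(i)
        total += error * X[i, j]      # scalar times the scalar x^(i)_j
    grad[j] = total / m

# Vectorized: the m scalars stack into an m x 1 error vector,
# and X' times that vector does all n sums at once.
errors = X @ theta - y                # m x 1
grad_vec = X.T @ errors / m           # n x 1

assert np.allclose(grad, grad_vec)    # both routes give the same gradient

theta = theta - alpha * grad_vec      # the update from the question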