I've been working on homework 1 in Andrew Ng's machine learning course, but I'm stuck on understanding what he meant when he vectorized multivariable gradient descent.
His equation is presented as follows: theta := theta - alpha*f
f is supposed to be computed as 1/m * sum((h(xi) - yi) * Xi), where i is the index of the training example and the sum runs over i = 1..m.
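For reference, this is the per-parameter form of the update as I read it from the lecture notes, written out in LaTeX. The superscript (i) for the training example and the subscript j for the feature are my reading of the notation, so treat them as an assumption:

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
  \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)},
\qquad j = 1, \dots, n
```

My question is about how this turns into a single expression for the whole theta vector.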
Now here is where I get confused. I know that h(xi) can be written as xi*theta, where xi is a row of feature values (1xn) and theta is a column (nx1), which produces a scalar. I then subtract the individual value yi from that scalar, and multiply the result by Xi, where Xi represents a column of one feature's values?
That would give me an mx1 vector, which then has to be subtracted from an nx1 vector (theta)?
Or is it that Xi represents a row of feature values? And if so, how can I do this without looping over all of these rows?
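To make the question concrete, here is a rough sketch in Python/NumPy (not the Octave used in the course) of the second reading, where Xi is the i-th row of X. It assumes X is an m x n matrix with one training example per row, y has m entries, and theta has n entries. The function names and the data are made up for illustration. Under those assumptions the looped sum and the single matrix expression (1/m) * X' * (X*theta - y) appear to give the same n-element vector, so the shapes line up with theta:

```python
import numpy as np

def gradient_loop(X, y, theta):
    """Accumulate (h(xi) - yi) * xi over the rows of X, one example at a time."""
    m, n = X.shape
    grad = np.zeros(n)
    for i in range(m):
        xi = X[i]                   # i-th training example: n feature values
        error = xi @ theta - y[i]   # scalar: h(xi) - yi
        grad += error * xi          # scalar times an n-vector
    return grad / m

def gradient_vectorized(X, y, theta):
    """Same quantity without an explicit loop: (1/m) * X^T (X theta - y)."""
    m = X.shape[0]
    errors = X @ theta - y          # m entries, one error per example
    return X.T @ errors / m         # n entries, one per parameter

# Made-up data: m = 4 examples, n = 3 features (first column is all ones)
X = np.array([[1.0,  2.0,  3.0],
              [1.0,  0.5, -1.0],
              [1.0, -2.0,  4.0],
              [1.0,  3.0,  0.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
theta = np.zeros(3)
alpha = 0.1

f_loop = gradient_loop(X, y, theta)
f_vec = gradient_vectorized(X, y, theta)

# One gradient descent step: theta := theta - alpha * f
theta_new = theta - alpha * f_vec
```

If this is right, f has n entries rather than m, because the sum over i collapses the m examples and each term is a scalar times a row of n feature values.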