OK, so what exactly does this algorithm mean?
What I know:
i) alpha: the learning rate, i.e. how big each gradient descent step will be.
ii) ∑{ hTheta(x^(i)) - y^(i) }: the total error for the given values of Theta.
The error is the difference between the predicted value hTheta(x^(i)) and the actual value y^(i).
∑{ hTheta(x^(i)) - y^(i) } sums these errors over all training examples.
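If I have understood that part correctly, the error term could be computed like this (a rough numpy sketch; X, y, theta and all the numbers are made up by me, with a leading column of ones in X for the intercept):

```python
import numpy as np

# Made-up data: m = 4 training examples, a bias column of ones plus 2 features.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 0.0],
              [1.0, 4.0, 5.0],
              [1.0, 3.0, 2.0]])        # row i = x^(i)
y = np.array([7.0, 2.0, 13.0, 8.0])    # actual values y^(i)
theta = np.zeros(3)                     # current parameters

predictions = X @ theta                 # hTheta(x^(i)) for every i
errors = predictions - y                # hTheta(x^(i)) - y^(i)
total_error = errors.sum()              # the plain summation described above
```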
What does x_j^(i) at the end stand for?
Are we doing the following when implementing gradient descent for multivariate linear regression (see the sketch after the list)?
Theta(j) minus:
alpha
times 1/m
times:
{ error of the 1st training example multiplied by the jth feature of the 1st training example, PLUS
error of the 2nd training example multiplied by the jth feature of the 2nd training example, PLUS
...
PLUS error of the mth training example multiplied by the jth feature of the mth training example }
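If that is right, here is a minimal, self-contained numpy sketch of that update (same made-up X, y, theta and alpha as above; I am not claiming this is exactly the course's pseudocode):

```python
import numpy as np

# Made-up data: m = 4 training examples, bias column plus 2 features,
# so theta has 3 entries (j = 0, 1, 2).
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 0.0],
              [1.0, 4.0, 5.0],
              [1.0, 3.0, 2.0]])
y = np.array([7.0, 2.0, 13.0, 8.0])
theta = np.zeros(3)
alpha = 0.01
m = X.shape[0]

errors = X @ theta - y          # hTheta(x^(i)) - y^(i) for every example i
new_theta = theta.copy()        # update every theta(j) simultaneously
for j in range(theta.size):
    # sum over i of (error of ith example) * (jth feature of ith example)
    grad_j = (errors * X[:, j]).sum()
    new_theta[j] = theta[j] - alpha * (1.0 / m) * grad_j
theta = new_theta
```

The sum inside the loop is exactly the part in braces above; written in vectorized form, the whole update collapses to theta -= (alpha / m) * (X.T @ errors).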
j is the index of the parameter (and of the corresponding feature). x^(i) is the input feature vector of the ith training example, and x_j^(i) is feature j of the ith training example. – ARAT
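For example, with a tiny made-up feature matrix where rows are training examples and columns are features:

```python
import numpy as np

# rows = training examples, columns = features (made-up numbers)
X = np.array([[2.0, 3.0],
              [1.0, 0.0],
              [4.0, 5.0]])

x_2 = X[1]          # x^(2): the whole feature vector of the 2nd training example
x_1_of_2 = X[1, 0]  # x_1^(2): feature 1 of the 2nd training example
```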