The gradient descent is an iterative approach for minimizing the given function. We start with an initial guess of the solution and we take the gradient of the function at that point. We step the solution in the negative direction of the gradient and we repeat the process. The algorithm will eventually converge where the gradient is zero (which correspond to a local minimum). So your job is to find out the value of theta0 and theta1 that minimization the loss function [for example least squared error].
The term "converges" means you reached in the local minimum and further iteration does not affect the value of parameters i.e. value of theta0 and theta1 remains constant. Lets see an example Note: Assume it is in first quadrant for this explanation.
Lets say you have to minimize a function f(x) [cost function in your case]. For this you need to find out the value of x that minimizes the functional value of f(x). Here is the step by step procedure to find out the value of x using gradient descent method
- You choose the initial value of x. Lets say it is in point A in the figure.
- You calculate the gradient of f(x) with respect to x at A.
- This gives the slope of the function at point A. Since it the function is increasing at A, it will yield a positive value.
- You subtract this positive value from initial guess of x and update the value of x. i.e.
x = x - [Some positive value]
. This brings the x more closer to the D [i.e. the minimum] and reduces the functional value of f(x) [from figure]. Lets say after iteration 1, you reach to the point B.
- At point B, you repeat the same process as mention in step 4 and reach the point C, and finally point D.
- At point D, since it is local minimum, when you calculate gradient, you get 0 [or very close to 0]. Now you try to update value of x i.e.
x = x - [0]
. You will get same x [or very closer value to the previous x]. This condition is known as "Convergence".
The above steps are for increasing slope but are equally valid for decreasing slope. For example, the gradient at point G results into some negative value. When you update x i.e x = x - [ negative value] = x - [ - some positive value] = x + some positive value
. This increases the value of x and it brings x close to the point F [ or close to the minimum].
There are various approaches to solve this gradient descent. As @mattnedrich said, the two basic approaches are
Use fixed number of iteration N, for this pseudo code will be
iter = 0
while (iter < N) {
theta0 = theta0 - gradient with respect to theta0
theta1 = theta1 - gradient with respect to theta1
iter++
}
- Repeat until two consecutive values of theta0 and theta1 are almost identical. The pseudo code is given by @Gerwin in another answer.
Gradient descent is one of the approach to minimize the function in Linear regression. There exists direct solution too. Batch processing (also called normal equation) can be used to find out the values of theta0 and theta1 in a single step. If X is the input matrix, y is the output vector and theta be the parameters you want to calculate, then for squared error approach, you can find the value of theta in a single step using this matrix equation
theta = inverse(transpose (X)*X)*transpose(X)*y
But as this contains matrix computation, obviously it is more computationally expensive operation then gradient descent when size of the matrix X is large.
I hope this may answer your query. If not then let me know.