Knowing the difference between these methods helps in deciding when and where to use each one, so I'll try to shed some light on the concepts.
Gradient descent is a first-order optimization method and has traditionally been used for training neural networks, because second-order methods, such as Newton's method, are computationally infeasible. However, second-order methods show much better convergence characteristics than first-order methods, because they also take the curvature of the error surface into account. Additionally, first-order methods require a lot of tuning of the step size (learning rate), which is application specific. They also tend to get trapped in local optima and exhibit slow convergence.
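To make the curvature point concrete, here is a minimal sketch (assuming NumPy and a toy ill-conditioned quadratic rather than an actual network loss) contrasting a hand-tuned gradient step with a Newton step that uses the curvature directly:

```python
# Sketch: first-order vs. second-order update on f(w) = 0.5 * w^T A w.
import numpy as np

A = np.diag([1.0, 100.0])          # curvature differs a lot per direction
w = np.array([1.0, 1.0])           # starting point; the minimum is at the origin

def grad(w):
    return A @ w                   # gradient of the quadratic

# First-order update: the step size (learning rate) must be tuned by hand.
lr = 0.009                         # too large diverges, too small crawls
w_gd = w.copy()
for _ in range(100):
    w_gd = w_gd - lr * grad(w_gd)

# Second-order (Newton) update: uses the Hessian (here simply A)
# and lands on the minimum of a quadratic in a single step.
w_newton = w - np.linalg.solve(A, grad(w))

print("gradient descent:", w_gd)       # still well away from the optimum
print("Newton step:     ", w_newton)   # essentially [0, 0]
```

On a quadratic the Newton step hits the minimum immediately, while the gradient steps crawl along the low-curvature direction; that gap is what learning-rate tuning tries to compensate for.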
The reason Newton's method is infeasible is the computation of the Hessian matrix, which takes prohibitively long. To overcome this issue, "Hessian-free" learning was proposed, in which Newton-style updates are applied without ever computing the Hessian matrix explicitly.
I don't want to go into more detail, but as far as I know, for deep networks it is highly recommended either to use HF optimization (there are many improvements over the original HF approach as well), since it takes much less time for training, or to use SGD with momentum.
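For completeness, a minimal sketch of SGD with momentum (assuming NumPy; `grad_fn`, the learning rate, and the momentum coefficient are illustrative placeholders, and a toy full gradient stands in for a mini-batch gradient):

```python
# Sketch: SGD with (heavy-ball) momentum.
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.01, momentum=0.9, steps=100):
    velocity = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)                           # (mini-batch) gradient at w
        velocity = momentum * velocity - lr * g  # accumulate a running direction
        w = w + velocity                         # move along the smoothed direction
    return w

# Toy usage; in practice the gradient would come from mini-batches
# and the hyperparameters would need tuning per application.
A = np.diag([1.0, 100.0])
print(sgd_momentum(lambda w: A @ w, np.array([1.0, 1.0]), lr=0.009))
```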