I understand what Gradient Descent does. Basically it tries to move towards the local optimal solution by slowly moving down the curve. I am trying to understand what is the actual difference between the plan gradient descent and the newton's method?
From Wikipedia, I read this short line "Newton's method uses curvature information to take a more direct route." What does this intuitively mean?