0
votes

In neural networks, (batch) Gradient Descent computes the gradient over the entire training set, and the cost function decreases over iterations. If the cost function increases, it is usually due to an implementation error or an inappropriate learning rate.

Conversely, Stochastic Gradient Descent computes the gradient from a single training example at a time. I'm wondering whether the cost function may increase from one sample to the next, even though the implementation is correct and the parameters are well tuned. My feeling is that occasional increases of the cost function are fine, since the gradient step minimizes the loss of a single sample, and that direction may not match the direction of convergence of the overall system.

Are increases of the cost function expected in Stochastic Gradient Descent?
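
For reference, here is a minimal toy example of what I mean (my own sketch in plain NumPy, not from any particular framework): single-sample SGD on a small linear-regression problem, logging the loss over the whole training set after every update.

```python
# Minimal sketch: plain SGD on a toy linear regression, updating on one
# sample at a time and logging the loss over the *entire* training set
# after every single-sample step.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)    # noisy targets

w = np.zeros(3)
lr = 0.01

def full_loss(w):
    """Mean squared error over the whole training set."""
    return np.mean((X @ w - y) ** 2)

losses = []
for epoch in range(5):
    for i in rng.permutation(len(X)):
        # Gradient of the squared error of a single sample i.
        grad = 2 * (X[i] @ w - y[i]) * X[i]
        w -= lr * grad
        losses.append(full_loss(w))

# Even with a correct implementation and a reasonable learning rate,
# consecutive entries in `losses` are not monotonically decreasing.
increases = sum(b > a for a, b in zip(losses, losses[1:]))
print(f"{increases} of {len(losses) - 1} steps increased the full-dataset loss")
```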

3
I think many people nowadays call it Stochastic Gradient (as it's not a strict descent method). - sascha

3 Answers

1
votes

In theory we are taught that the cost decreases over time if the model is neither overfitting nor underfitting. In practice, however, that is not entirely true. In a real-world optimization problem you will notice that the cost function is actually very noisy: it has a lot of peaks, and the underlying decreasing trend is hard to see. To see the trend, compute a moving average of the loss; the signal becomes cleaner and you can tell whether the cost function is decreasing or increasing. Hope this helps.
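
For example, a small sketch (my own, assuming the raw per-step losses have already been collected in a list or array) of smoothing the curve with a simple moving average:

```python
# Smooth a noisy loss curve with a simple moving average to expose the trend.
import numpy as np

def moving_average(losses, window=100):
    """Simple moving average of the loss curve with the given window size."""
    losses = np.asarray(losses, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(losses, kernel, mode="valid")

# Example: a noisy but slowly decreasing loss curve.
rng = np.random.default_rng(0)
raw = np.exp(-np.linspace(0, 3, 2000)) + 0.2 * rng.normal(size=2000)
smooth = moving_average(raw, window=100)
print(raw[:5].round(3), smooth[:5].round(3))   # smoothed values reveal the downward trend
```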

0
votes
  • A noisy loss curve is often a consequence of Stochastic Gradient Descent.

  • Try Mini-batch Gradient Descent with a reasonably large batch size. The loss plot smooths out because the gradients averaged over the samples in a batch tend to point in a better direction in weight space (see the sketch after this list).
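
As a rough illustration of the second point, here is a hedged sketch (my own toy linear-regression example, not a real network) of mini-batch gradient descent where the gradient is averaged over each batch before the update:

```python
# Mini-batch gradient descent on a toy linear model: the gradient is averaged
# over a batch of samples before each update, which smooths the loss curve
# compared to single-sample SGD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.05, 64

for epoch in range(10):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # Average gradient over the mini-batch: individual noisy directions
        # tend to cancel, so the step points closer to the full-batch gradient.
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)
        w -= lr * grad
    print(f"epoch {epoch}: loss = {np.mean((X @ w - y) ** 2):.4f}")
```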

0
votes

Stochastic Gradient Descent iterates over batches of training data, calculating the error gradient at the output node(s) and back-propagating those errors through the network with a learning rate < 1. The gradient comes from a partial error function computed only over the batch, not the entire training set. The step in weight space will reduce the batch loss (and is guaranteed to do so if the learning rate is sufficiently small), but that doesn't mean it will reduce the loss over the entire training set. There is no guarantee that a single step in weight space improves the aggregate loss across the full training set - this is entirely data-dependent.

It is absolutely possible that a single step in weight space improves the batch loss at the expense of the total error (effectively over-fitting to a subset of the data), but when this is repeated over all of the batches, training tends to move in the right direction with regard to the aggregate error. This depends on the learning rate though: if the learning rate is too high, the network may keep "bouncing around" in the loss landscape without converging; if it is too low, convergence may be very slow.
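
As an illustration (my own toy example, a small linear-regression problem rather than a real network), the same single-sample SGD loop run with a large, a moderate, and a tiny learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X @ np.array([1.0, -2.0]) + 0.5 * rng.normal(size=500)

def sgd_losses(lr, epochs=10):
    """Full-dataset MSE recorded after each epoch of single-sample SGD."""
    w = np.zeros(2)
    history = []
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w -= lr * 2 * (X[i] @ w - y[i]) * X[i]
        history.append(np.mean((X @ w - y) ** 2))
    return np.round(history, 3)

# Too large: the loss keeps fluctuating around a high plateau; too small: it
# decreases very slowly; a moderate value settles near the noise floor (~0.25 here).
for lr in (0.2, 0.02, 1e-4):
    print(f"lr={lr:g}: {sgd_losses(lr)}")
```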

(It's recommended to use an adaptive optimizer, e.g. Adam, which adjusts per-parameter learning rates dynamically to manage this trade-off for you.)
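
A minimal sketch of that, assuming PyTorch is available (the toy model and data are made up for illustration; torch.optim.Adam is the stock optimizer):

```python
# Adam adapts a per-parameter step size from running estimates of the
# gradient's first and second moments, which reduces the need to hand-tune
# the learning rate.
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                      # toy model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.randn(256, 3)
y = X @ torch.tensor([[2.0], [-1.0], [0.5]]) + 0.1 * torch.randn(256, 1)

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()                         # Adam update with adaptive step sizes
print(f"final loss: {loss.item():.4f}")
```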