In neural networks, (batch) Gradient Descent computes the gradient over the entire training set, so the cost function should decrease with every iteration. If it increases instead, that usually points to an implementation error or an inappropriate learning rate.
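Here is a minimal sketch of what I mean, assuming a toy linear model `y ≈ w * x` trained with mean squared error (the data and variable names are just my own example):

```python
import numpy as np

# Toy data for a linear model y ≈ w * x (setup is mine, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + rng.normal(scale=0.1, size=100)

w, lr = 0.0, 0.1
for step in range(20):
    grad = np.mean(2 * (w * X - y) * X)      # gradient averaged over the WHOLE set
    w -= lr * grad
    print(step, np.mean((w * X - y) ** 2))   # cost goes down at every iteration
```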
Conversely, Stochastic Gradient Descent computes the gradient from a single training example at a time. I'm wondering whether the cost function can increase from one sample to the next even when the implementation is correct and the hyperparameters are well tuned. My feeling is that occasional increases are acceptable, because each update minimizes the loss of a single sample, and that direction may not be the same as the direction that reduces the overall cost.
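To make the question concrete, here is the stochastic variant of the same toy setup, where I track the full-dataset cost after every single-sample update (again, this is only my own sketch, not production code):

```python
import numpy as np

# Same toy setup as above, but updating the weight from one example at a time
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + rng.normal(scale=0.1, size=100)

w, lr = 0.0, 0.1
prev_cost = np.mean((w * X - y) ** 2)
increases = 0
for i in rng.permutation(len(X)):
    grad_i = 2 * (w * X[i] - y[i]) * X[i]    # gradient of a SINGLE example's loss
    w -= lr * grad_i
    cost = np.mean((w * X - y) ** 2)         # overall cost, tracked for monitoring
    increases += cost > prev_cost            # count updates that raised the overall cost
    prev_cost = cost
print("updates that increased the overall cost:", increases)
```

When I run something like this, the overall cost trends downward but goes up on some individual updates, which is exactly the behaviour I want to confirm is normal.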
Are occasional increases of the cost function expected in Stochastic Gradient Descent?