In neural nets, regularization (e.g., L2, dropout) is commonly used to reduce overfitting. For example, the plot below shows typical loss vs. epoch curves, with and without dropout. Solid lines = training loss, dashed = validation loss, blue = baseline (no dropout), orange = with dropout. Plot courtesy of the TensorFlow tutorials. Weight regularization behaves similarly.
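For concreteness, here is a minimal sketch of the kind of comparison that plot comes from, assuming the tf.keras API; the synthetic data, layer sizes, and dropout rate below are placeholders for illustration, not the tutorial's exact setup:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic binary-classification data purely for illustration;
# swap in your own dataset.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(2000, 20)).astype("float32")
y_train = (x_train[:, 0] + 0.5 * x_train[:, 1] > 0).astype("float32")

def build_model(dropout_rate=0.0):
    # Same architecture with and without dropout so the curves are comparable.
    model = tf.keras.Sequential([
        layers.Input(shape=(20,)),
        layers.Dense(512, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(512, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

baseline = build_model(0.0)       # "no dropout" curves (blue)
regularized = build_model(0.5)    # "with dropout" curves (orange)

hist_base = baseline.fit(x_train, y_train, epochs=50,
                         validation_split=0.2, verbose=0)
hist_reg = regularized.fit(x_train, y_train, epochs=50,
                           validation_split=0.2, verbose=0)
# hist_*.history["loss"] and ["val_loss"] give the train/validation curves to plot.
```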
Regularization delays the epoch at which the validation loss starts to increase, but it does not appear to decrease the minimum value of the validation loss (at least in my models and in the tutorial the plot above comes from).
If we use early stopping to halt training when the validation loss is at its minimum (to avoid overfitting), and if regularization only delays the point of minimum validation loss rather than lowering the minimum value itself, then it seems regularization does not produce a network that generalizes better; it just slows down training.
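By early stopping I mean something along these lines (a sketch assuming Keras's built-in callback and the hypothetical `build_model` helper from the snippet above; the patience value is arbitrary):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once validation loss stops improving and roll back to the best epoch.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

model = build_model(0.5)  # the regularized model from the sketch above
model.fit(x_train, y_train, epochs=200,
          validation_split=0.2, callbacks=[early_stop], verbose=0)
```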
How can regularization be used to reduce the minimum validation loss (and so improve generalization) rather than merely delay it? And if regularization only delays the minimum validation loss without reducing it, why use it at all?