I was reading the documentation for the decaying learning rate (tf.train.exponential_decay) and thought there might be a mistake in the docs, so I wanted to confirm. It says that the decay equation is:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
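Just to check that I am reading the formula correctly, here is the arithmetic in plain Python (this is only my own sketch of the equation above, not the actual TensorFlow implementation):

# My reading of the documented decay formula.
def decayed_lr(learning_rate, global_step, decay_steps, decay_rate):
    return learning_rate * decay_rate ** (global_step / decay_steps)

print(decayed_lr(0.1, 0, 100000, 0.96))       # 0.1    -> decay_rate ^ 0 = 1, so no decay at step 0
print(decayed_lr(0.1, 100000, 100000, 0.96))  # ~0.096 -> one full decay period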
However, if global_step = 0, then decay_rate ^ 0 = 1, so I'd guess there is never any decay, right? Yet that is exactly what the example in the docs does:
...
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    tf.train.GradientDescentOptimizer(learning_rate)
    .minimize(...my loss..., global_step=global_step)
)
The example sets global_step = tf.Variable(0, trainable=False), i.e. global_step is initialized to zero. Thus, no decay. Is this a correct deduction?
I thought there might be a caveat due to integer division when staircase is set to True, but even with integer division it still seems that there would be no decay. Or am I misunderstanding what staircase does?
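For reference, here is what I assume staircase does, namely that the only difference is integer versus float division in the exponent (again just my own sketch, not the TF source):

# My understanding of staircase=True: the exponent is floored via integer division,
# so the rate drops in discrete steps instead of decaying smoothly.
def decayed_lr(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    exponent = global_step // decay_steps if staircase else global_step / decay_steps
    return learning_rate * decay_rate ** exponent

print(decayed_lr(0.1, 50000, 100000, 0.96, staircase=False))  # ~0.098 (smooth decay)
print(decayed_lr(0.1, 50000, 100000, 0.96, staircase=True))   # 0.1 (no drop until step 100000)
print(decayed_lr(0.1, 0, 100000, 0.96, staircase=True))       # 0.1 (at step 0, no decay either way)

Either way, with global_step at 0 the result is just the starter learning rate.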