1 vote

I was reading about the decaying learning rate and thought there might be a mistake in the docs, and I wanted to confirm. The docs say that the decay equation is:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

However, if global_step = 0, I'd guess there is never any decay, right? But look at the example:

...
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    tf.train.GradientDescentOptimizer(learning_rate)
    .minimize(...my loss..., global_step=global_step)
)

It has global_step = tf.Variable(0, trainable=False), which sets it to zero. Thus, no decay. Is this a correct deduction?

I thought there might be a caveat due to integer division when the staircase option is set to True, but even with integer division there still seems to be no decay. Or am I misunderstanding what staircase does?
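To make my reasoning concrete, here is a quick plain-Python check that simply mirrors the documented formula, using the numbers from the example above (0.1 starter rate, 0.96 decay rate, 100000 decay steps). This is only my own sanity check, not TensorFlow's actual implementation:

def decayed_lr(starter_lr, decay_rate, global_step, decay_steps, staircase=False):
    # Mirrors: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
    exponent = global_step // decay_steps if staircase else global_step / decay_steps
    return starter_lr * decay_rate ** exponent

for step in (0, 50000, 100000, 200000):
    print(step,
          decayed_lr(0.1, 0.96, step, 100000, staircase=False),
          decayed_lr(0.1, 0.96, step, 100000, staircase=True))

# At step 0 both variants give 0.1, because decay_rate ** 0 == 1 (no decay yet).
# With staircase=True the rate stays at 0.1 until step 100000, then drops to 0.096.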

Is it just the number of steps that have passed, and it just starts at zero? :/ – Charlie Parker

2 Answers

3 votes

The variable global_step is passed to the minimize function and will be incremented each time the training operation learning_step is run.

It is even stated in the comment in your code:

# Passing global_step to minimize() will increment it at each step.
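As a minimal runnable sketch of this (TF 1.x style; on TF 2.x you would need tf.compat.v1, and the toy quadratic loss and small decay_steps are only there to make the decay visible quickly), you can watch global_step and the learning rate change as the training op runs:

import tensorflow as tf  # TF 1.x API

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.1, global_step,
                                           decay_steps=3, decay_rate=0.96,
                                           staircase=True)

w = tf.Variable(5.0)   # toy parameter
loss = tf.square(w)    # toy quadratic loss, just for illustration

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(6):
        sess.run(train_op)  # each run increments global_step by 1
        print(sess.run([global_step, learning_rate]))

# global_step counts 1, 2, 3, ... and the learning rate drops from 0.1
# to 0.096 once global_step reaches decay_steps.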

0 votes

In addition to Olivier's answer: the global step is also incremented in apply_gradients (which is one of the steps minimize performs internally).

If global_step was not None, that operation also increments global_step

So no matter how you run the optimization (calling minimize directly, or computing and modifying the gradients yourself and then calling apply_gradients), the global step is incremented.
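A small TF 1.x-style sketch of that second path (toy loss again, only for illustration):

import tensorflow as tf  # TF 1.x API

global_step = tf.Variable(0, trainable=False)
w = tf.Variable(5.0)
loss = tf.square(w)

optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(loss)
# ...modify the gradients here (e.g. clip them) if you want...
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        sess.run(train_op)
        print(sess.run(global_step))  # 1, 2, 3: incremented by apply_gradients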