1 vote

I was reading about the decaying learning rate and thought there might be a mistake in the docs, and I wanted to confirm. The docs say that the decay equation is:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

However, if global_step = 0, I'd guess there is never any decay, right? But look at the example:

...
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    tf.train.GradientDescentOptimizer(learning_rate)
    .minimize(...my loss..., global_step=global_step)
)

It has global_step = tf.Variable(0, trainable=False), which sets it to zero. Thus, no decay. Is this a correct deduction?

I thought there might be a caveat due to integer division when the staircase option is set to True, but even with integer division there still seems to be no decay. Or am I misunderstanding what staircase does?
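To make my reasoning concrete, here is a quick plain-Python check that simply mirrors the documented formula, using the numbers from the example above (0.1 starter rate, 0.96 decay rate, 100000 decay steps). This is only my own sanity check, not TensorFlow's actual implementation:

def decayed_lr(starter_lr, decay_rate, global_step, decay_steps, staircase=False):
    # Mirrors: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
    exponent = global_step // decay_steps if staircase else global_step / decay_steps
    return starter_lr * decay_rate ** exponent

for step in (0, 50000, 100000, 200000):
    print(step,
          decayed_lr(0.1, 0.96, step, 100000, staircase=False),
          decayed_lr(0.1, 0.96, step, 100000, staircase=True))

# At step 0 both variants give 0.1, because decay_rate ** 0 == 1 (no decay yet).
# With staircase=True the rate stays at 0.1 until step 100000, then drops to 0.096.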

Is it just the number of steps that have passed, and it just starts at zero? :/ – Charlie Parker

2 Answers

3 votes

The variable global_step is passed to the minimize function and will be incremented each time the training operation learning_step is run.

It is even stated in the comment in your code:

# Passing global_step to minimize() will increment it at each step.
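As a minimal runnable sketch of this (TF 1.x style; on TF 2.x you would need tf.compat.v1, and the toy quadratic loss and small decay_steps are only there to make the decay visible quickly), you can watch global_step and the learning rate change as the training op runs:

import tensorflow as tf  # TF 1.x API

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.1, global_step,
                                           decay_steps=3, decay_rate=0.96,
                                           staircase=True)

w = tf.Variable(5.0)   # toy parameter
loss = tf.square(w)    # toy quadratic loss, just for illustration

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(6):
        sess.run(train_op)  # each run increments global_step by 1
        print(sess.run([global_step, learning_rate]))

# global_step counts 1, 2, 3, ... and the learning rate drops from 0.1
# to 0.096 once global_step reaches decay_steps.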

0 votes

In addition to Olivier's answer: the global step is also incremented in apply_gradients (which is one of the steps minimize performs internally).

If global_step was not None, that operation also increments global_step

So no matter how you run the optimization (calling minimize directly, or computing and modifying the gradients yourself and then calling apply_gradients), the global step is incremented.
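A small TF 1.x-style sketch of that second path (toy loss again, only for illustration):

import tensorflow as tf  # TF 1.x API

global_step = tf.Variable(0, trainable=False)
w = tf.Variable(5.0)
loss = tf.square(w)

optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(loss)
# ...modify the gradients here (e.g. clip them) if you want...
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        sess.run(train_op)
        print(sess.run(global_step))  # 1, 2, 3: incremented by apply_gradients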