32
votes

I am confused about the difference between apply_gradients and minimize of optimizer in tensorflow. For example,

optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

and

optimizer = tf.train.AdamOptimizer(1e-3)
train_op = optimizer.minimize(cnn.loss, global_step=global_step)

Are they the same indeed?

If I want to decay the learning rate, can I use the following codes?

global_step = tf.Variable(0, name="global_step", trainable=False)
starter_learning_rate = 1e-3
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                       100, FLAGS.decay_rate, staircase=True)
# Passing global_step to minimize() will increment it at each step.
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

Thanks for your help!


2 Answers

24
votes

You can easily see from this link: https://www.tensorflow.org/get_started/get_started (the tf.train API part) that they do the same job. The difference is that if you use the separate functions (compute_gradients, apply_gradients), you can apply other mechanisms between them, such as gradient clipping.
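For instance, here is a minimal sketch of gradient clipping inserted between the two calls, reusing the question's cnn.loss and global_step (the clip norm of 5.0 is just an arbitrary illustration value):

optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
# Clip each gradient before applying it (example clip norm of 5.0).
clipped_grads_and_vars = [(tf.clip_by_norm(g, 5.0), v)
                          for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped_grads_and_vars, global_step=global_step)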

4
votes

Here it says that minimize uses tf.GradientTape and then apply_gradients:

Minimize loss by updating var_list.

This method simply computes gradient using tf.GradientTape and calls apply_gradients(). If you want to process the gradient before applying then call tf.GradientTape and apply_gradients() explicitly instead of using this function.

So minimize actually uses apply_gradients just like:

def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
    grads_and_vars = self._compute_gradients(loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
    return self.apply_gradients(grads_and_vars, name=name)

In your example you use compute_gradients and apply_gradients, which is indeed valid, but nowadays compute_gradients has been made private, so it is no longer good practice to use it. For this reason the function no longer appears in the documentation.
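If you want to process the gradients yourself in current TensorFlow 2 code, a minimal sketch with tf.GradientTape and apply_gradients looks like this (model and compute_loss are placeholders for your own model and loss; the clip value of 5.0 is just an example):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-3)

with tf.GradientTape() as tape:
    loss = compute_loss()  # placeholder: compute your model's loss here
grads = tape.gradient(loss, model.trainable_variables)
# Process the gradients before applying them (e.g. global-norm clipping).
grads, _ = tf.clip_by_global_norm(grads, 5.0)
optimizer.apply_gradients(zip(grads, model.trainable_variables))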