Difference between `apply_gradients` and `minimize` of optimizer in tensorflow

Question

I am confused about the difference between apply_gradients and minimize of optimizer in tensorflow. For example,

optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

and

optimizer = tf.train.AdamOptimizer(1e-3)
train_op = optimizer.minimize(cnn.loss, global_step=global_step)

Are they the same indeed?

If I want to decay the learning rate, can I use the following codes?

global_step = tf.Variable(0, name="global_step", trainable=False)
starter_learning_rate = 1e-3
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                       100, FLAGS.decay_rate, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    optimizer = tf.train.AdamOptimizer(learning_rate)
    grads_and_vars = optimizer.compute_gradients(cnn.loss)
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
)

Thanks for your help!

Yanghoon Yanghoon · Accepted Answer · 2017-08-03T04:20:01

You can easily know from the link : https://www.tensorflow.org/get_started/get_started (tf.train API part) that they actually do the same job. The difference it that: if you use the separated functions( tf.gradients, tf.apply_gradients), you can apply other mechanism between them, such as gradient clipping.

Difference between `apply_gradients` and `minimize` of optimizer in tensorflow

2 Answers