1 vote

Does anyone have experience with mixed-precision training using the TensorFlow Estimator API?

I tried casting my inputs to tf.float16 and casting the network's outputs back to tf.float32. For scaling the loss I used tf.contrib.mixed_precision.LossScaleOptimizer.

The error message I get is relatively uninformative: "Tried to convert 'x' to a tensor and failed. Error: None values not supported".
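For reference, the setup described above can be sketched roughly as follows (this is an illustrative TF 1.x-style model_fn, not the asker's actual code: the layer sizes, optimizer, and fixed loss scale of 128 are assumptions; the `tf.compat.v1` shim is used so the sketch also loads on newer installs, while `tf.contrib` itself only exists in TF 1.x):

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style API surface

def model_fn(features, labels, mode):
    """Illustrative Estimator model_fn for the mixed-precision setup above."""
    net = tf.cast(features, tf.float16)            # run the forward pass in fp16
    net = tf1.layers.dense(net, 128, activation=tf.nn.relu)
    logits = tf1.layers.dense(net, 10)
    logits = tf.cast(logits, tf.float32)           # cast outputs back to fp32 for the loss
    loss = tf1.losses.sparse_softmax_cross_entropy(labels, logits)

    opt = tf1.train.AdamOptimizer(1e-3)
    # Wrap the base optimizer so the loss is scaled before gradients are
    # computed, then unscaled afterwards (loss scale of 128 is an assumption).
    opt = tf.contrib.mixed_precision.LossScaleOptimizer(
        opt, tf.contrib.mixed_precision.FixedLossScaleManager(128.0))

    train_op = opt.minimize(loss, global_step=tf1.train.get_global_step())
    return tf1.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
```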

The error message seems very informative. x contains None values, which are not supported in the conversion to a tensor. Either x is erroneously getting Nones as values (in which case you need to locate the error producing this), or this is expected but you need to sanitise the data before converting it to a tensor. - iacob
It is clear to me what the specific error means, but not why it occurs or how to avoid it. When I remove the tf.contrib.mixed_precision.LossScaleOptimizer wrapper, everything works fine, except that after a few iterations I get an underflow due to the tf.float16 cast. After also removing the casts, training works as usual. - Thomas

1 Answer

0 votes

I found the issue: I used tf.get_variable to store the learning rate. This variable has no gradient. Normal optimizers do not care, but tf.contrib.mixed_precision.LossScaleOptimizer crashes because it tries to compute a gradient for every trainable variable. Therefore, make sure such variables are not added to tf.GraphKeys.TRAINABLE_VARIABLES, e.g. by creating them with trainable=False.
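The fix can be sketched like this (TF 1.x-style API via the `tf.compat.v1` shim; the variable name and initial value are illustrative): pass `trainable=False` to tf.get_variable so the learning rate is kept out of the TRAINABLE_VARIABLES collection that LossScaleOptimizer iterates over.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # graph mode, as with the TF 1.x Estimator API

# Create the learning-rate variable with trainable=False so it is NOT added
# to tf.GraphKeys.TRAINABLE_VARIABLES, where LossScaleOptimizer would try
# (and fail) to compute a gradient for it.
learning_rate = tf1.get_variable(
    "learning_rate",
    initializer=tf.constant(1e-3),  # initial value is illustrative
    trainable=False,
)

trainable_names = [
    v.name for v in tf1.get_collection(tf1.GraphKeys.TRAINABLE_VARIABLES)
]
# The learning rate must not appear among the trainable variables.
assert "learning_rate:0" not in trainable_names
```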