
In my current project I'm using a TF Hub image module along with an Estimator for a classification problem. As per the TF Hub guidelines, I set the tags to {"train"} in training mode and to None during eval/predict modes. Test loss/accuracy was very poor, while training loss kept decreasing. After debugging for days, I learned that somehow the hub module's trained weights were not being used (it seemed only the last dense layer outside the hub module was being reused).

To confirm where the problem was, I stopped passing the "train" tag even during training (with no other changes), and the problem was immediately resolved.

Grateful for all the help - many thanks!

import tensorflow as tf
import tensorflow_hub as hub

# inside model_fn
is_training = (mode == tf.estimator.ModeKeys.TRAIN)

# Per TF Hub guidelines, request the "train" graph variant only in training mode,
# and the default (inference) graph during eval/predict.
tags_val = {"train"} if is_training else None

tf_hub_model_spec = "https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1"

img_module = hub.Module(tf_hub_model_spec, trainable=is_training, tags=tags_val)

# Add final dense layer, etc.

1 Answer


For https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1, the difference between default tags (meaning the empty set) and tags={"train"} is that the latter operates batch norm in training mode (i.e., using batch statistics for normalization). If that leads to catastrophic quality loss, my first suspicion would be: are UPDATE_OPS being run with the train_op?
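
For reference, a minimal sketch of that check, assuming a TF 1.x Estimator model_fn where you build the train_op yourself (the optimizer choice, loss, and learning rate below are placeholder names, not from the question): batch-norm moving-average updates are collected in tf.GraphKeys.UPDATE_OPS and are not run automatically, so they must be attached to the train_op.

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)  # placeholder optimizer choice
# UPDATE_OPS holds the batch-norm moving-average update ops; run them on every training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())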

https://github.com/tensorflow/hub/issues/24 discusses this alongside other issues, with code pointers.