2 votes

All I want to do is download one of TensorFlow's built-in models (via Keras) and switch off the softmax at the output layer (i.e. replace it with the linear activation function), so that my output features are the activations of the output layer before softmax is applied.

So I grab VGG16 as a model and call it base_model:

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image  # used below for load_img / img_to_array
import numpy as np
import tensorflow as tf
base_model = VGG16()

I have a look at the final layer like this:

base_model.get_layer('predictions').get_config()

and get:

{'name': 'predictions',
 'trainable': True,
 'dtype': 'float32',
 'units': 1000,
 'activation': 'softmax',
 'use_bias': True,
 'kernel_initializer': {'class_name': 'GlorotUniform',
  'config': {'seed': None, 'dtype': 'float32'}},
 'bias_initializer': {'class_name': 'Zeros', 'config': {'dtype': 'float32'}},
 'kernel_regularizer': None,
 'bias_regularizer': None,
 'activity_regularizer': None,
 'kernel_constraint': None,
 'bias_constraint': None}

Then, I do this to switch activation functions:

base_model.get_layer('predictions').activation = tf.compat.v1.keras.activations.linear

and it looks like it has worked, as:

base_model.get_layer('predictions').get_config()

gives:

{'name': 'predictions',
 'trainable': True,
 'dtype': 'float32',
 'units': 1000,
 'activation': 'linear',
 'use_bias': True,
 'kernel_initializer': {'class_name': 'GlorotUniform',
  'config': {'seed': None, 'dtype': 'float32'}},
 'bias_initializer': {'class_name': 'Zeros', 'config': {'dtype': 'float32'}},
 'kernel_regularizer': None,
 'bias_regularizer': None,
 'activity_regularizer': None,
 'kernel_constraint': None,
 'bias_constraint': None}

But when I put in a picture, using:

filename = 'test_data/ILSVRC2012_val_00001218.JPEG'
img = image.load_img(filename, target_size=(224, 224)) # loads image
x = image.img_to_array(img) # converts to a numpy array
x = np.expand_dims(x, axis=0) # adds a batch dimension
x = preprocess_input(x) # prepare the image for the VGG model

and run predict on it to get my features:

features = base_model.predict(x)

the features still sum to 1, i.e. they look like they have been normalised by softmax, as

sum(features[0])

is 1.0000000321741935, exactly the same number I got when the softmax activation was still on that layer.

I also tried copying out the config dictionary with the activation set to 'linear' and using set_config on the output layer.

Turning off softmax seems to be bizarrely hard to do in TensorFlow: in Caffe you can switch activation functions for a pre-trained model by changing one line in the deploy file, so I really don't understand why this is so difficult here. I'm switching my code from Caffe to TensorFlow because I thought it would be easier to grab pre-trained models in TF, but this issue is making me reconsider.

I suppose I could try to rip off the prediction layer and replace it with a brand new one with all the same settings (and put the old weights in), but I'm sure there must be a way to just edit the prediction layer.

I'm using TensorFlow 1.14.0 at the moment and planning to upgrade to 2.0, but I don't think using TensorFlow 1 is the problem here.

Can anyone explain how to turn off the softmax, please? It should be a simple thing to do; I've spent hours on it and have even joined Stack Overflow just to get this single issue fixed.

Thanks in advance for any help.


2 Answers

6 votes

As already mentioned above, you can always reverse the softmax operation; that should be straightforward. But if you still want to change the activation, you will have to copy the weights to a new layer.

import tensorflow as tf

model = tf.keras.applications.ResNet50()
assert model.layers[-1].activation == tf.keras.activations.softmax

# Grab the final layer's config and weights
config = model.layers[-1].get_config()
weights = [x.numpy() for x in model.layers[-1].weights]

# Rebuild the layer with a linear activation (and a new name to avoid a clash)
config['activation'] = tf.keras.activations.linear
config['name'] = 'logits'

new_layer = tf.keras.layers.Dense(**config)(model.layers[-2].output)
new_model = tf.keras.Model(inputs=[model.input], outputs=[new_layer])
new_model.layers[-1].set_weights(weights)

assert new_model.layers[-1].activation == tf.keras.activations.linear
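
A quick sanity check, as a sketch (assuming TF 2.x eager execution; the random input is just a stand-in for a real preprocessed image): the new model's outputs should no longer sum to ~1, but re-applying softmax to them should.

import numpy as np

x = np.random.rand(1, 224, 224, 3).astype('float32') * 255.0   # stand-in input, illustration only
x = tf.keras.applications.resnet50.preprocess_input(x)
logits = new_model.predict(x)
print(logits.sum())                          # some arbitrary value, not ~1.0
print(tf.nn.softmax(logits).numpy().sum())   # ~1.0 again
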
-1 votes

Keras is unfortunately not designed around "complex" things like making specific modifications to existing nets. I believe that it is possible to get the output before the activation, but that involves traversing the op graph, and isn't exactly straightforward. I attempted to do that at one point, but found it to be too difficult, and solved my issue in a different way.

If you were making your own model, you could make the activation a separate layer and then pop it off at will. However, since you're using a premade model, you can't do this.
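
A minimal sketch of that idea, with made-up layer names and shapes just for illustration: keep the Dense layer linear and apply softmax as a separate Activation layer, so you can build a second model that stops at the logits.

import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(inputs)
logits = tf.keras.layers.Dense(1000, name='logits')(x)                 # no activation here
probs = tf.keras.layers.Activation('softmax', name='probs')(logits)    # softmax as its own layer
model = tf.keras.Model(inputs, probs)

# Pre-softmax features are then just the output of the 'logits' layer
logits_model = tf.keras.Model(inputs, model.get_layer('logits').output)
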

Depending on your exact situation, you have two options that I can see:

  1. If you want a quick, hacky solution that isn't perfect but might work well enough, you can calculate what the pre-softmax output would have been. Softmax is a well-defined equation, so you can invert it and apply the inverse to the softmax output (see the sketch after this list). This won't give you the exact output, but it should be close enough for many situations.
  2. If you want a stable, maintainable solution, just make a new layer without an activation and copy the weights. I agree that this feels weird to do, but it's really not that hard, and I can't think of any rational reason not to do it.
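
A minimal sketch of option 1, assuming the base_model and preprocessed x from the question: since softmax is shift-invariant, taking the log of its output recovers the logits only up to an additive constant, which is often good enough.

import numpy as np

probs = base_model.predict(x)            # softmax outputs from the unmodified model
approx_logits = np.log(probs + 1e-12)    # equals the true logits minus logsumexp(logits); epsilon avoids log(0)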