0 votes

Training with a softmax output layer for my generative neural network gives better results overall than with relu, but relu gives me the sparsity I need (zeros in pixels). Softmax also helps get a normalised output (i.e. the values sum to 1).

I want to do:

outputs = Dense(200, activation='softmax', activity_regularizer=l1(1e-5))(x)
outputs = Activation('relu')(outputs) # to get real zeros
outputs = Activation('softmax')(outputs) # still real zeros, normalized output

But by applying successive softmaxes I will get extreme outputs. Is there a layer I can use instead which just normalizes the output to sum to 1 (output_i / sum(output)) instead of a softmax?


2 Answers

2 votes

You don't need to add two softmaxes. Just the last one is fine:

outputs = Dense(200, activation='relu', activity_regularizer=l1(1e-5))(x)
outputs = Activation('softmax')(outputs) # normalized output

That said, if you have more intermediate layers and you want them to behave more moderately, you could use a tanh instead of a softmax.

Often the problem with relu models is not exactly that "they don't sum to 1", but simply that "their values are way too high, so gradients can't behave well".

# this caps the outputs at 1 (but doesn't care about the sum)
# while keeping the sparsity:
outputs = Dense(200, activation='tanh')(x)
outputs = Activation('relu')(outputs) # to get real zeros

outputs = Dense(200, activation='relu')(outputs)

# this should only be used at the final layer,
# and only if you really have a classification model with only one correct class
outputs = Activation('softmax')(outputs) # normalized output

Softmax tends to favor only one of the results. If you don't want to change how the results compare to one another, and yet you want them to sum to 1, you can go for @nuric's answer.
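
For intuition, here is a quick standalone comparison of the two options on a small vector (just an illustrative NumPy sketch, not part of the model):

import numpy as np

x = np.array([0.0, 0.0, 2.0, 1.0])  # a sparse, relu-style output

softmax = np.exp(x) / np.exp(x).sum()  # ~[0.08, 0.08, 0.61, 0.22]: mass concentrates on the largest value, zeros become non-zero
sum_norm = x / x.sum()                 # [0.0, 0.0, 0.667, 0.333]: relative proportions and zeros are preserved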

1 vote

You can write your own layer to convert the outputs to unit norm (i.e. normalise them in your case) without applying a softmax. You can achieve this by converting the output to a unit vector. Something along the lines of:

from keras import backend as K
from keras.layers import Lambda

def unitnorm(x):
    # normalize each sample's output vector to unit L2 norm
    return x / (K.epsilon() + K.sqrt(K.sum(K.square(x), axis=-1, keepdims=True)))

# wrap it in a Lambda layer
outputs = Lambda(unitnorm, name='unitnorm')(outputs)

The code is adapted from the unit norm constraint, which does the same for kernels and biases in layers. You can try it without the epsilon to be more precise, but it could be less stable when you have a lot of zeros.
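
Note this gives a unit L2 norm rather than output_i / sum(output). If you literally want the values to sum to 1, the same Lambda pattern works with the sum in the denominator (a minimal sketch, assuming the outputs are non-negative, e.g. after a relu; sum_to_one is just an illustrative name):

def sum_to_one(x):
    # divide each sample's outputs by their sum so they add up to 1
    return x / (K.epsilon() + K.sum(x, axis=-1, keepdims=True))

outputs = Lambda(sum_to_one, name='sum_to_one')(outputs)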