
I tried to apply ReLU and PReLU to a CNN layer to compare the results, using the following code:

with ReLU:

model.add(Conv1D(filters, kernel_size, activation='relu'))

with PReLU:

model.add(Conv1D(filters, kernel_size))
model.add(PReLU())

Does the Conv1D layer use PReLU as its activation function?

I have doubts because when I print the model summary, it shows the CNN and the PReLU as separate layers with different numbers of parameters, whereas with the ReLU function the activation is part of the CNN layer itself.

[Screenshot of the model summary showing Conv1D and PReLU as separate layers]

If I used the wrong code, how can I correct it?
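
For reference, here is a fuller, self-contained version of the two variants I compared (the filter count, kernel size, and input shape below are just placeholder values):

import tensorflow as tf
from tensorflow.keras.layers import Conv1D, PReLU

filters, kernel_size = 50, 3     # placeholder hyperparameters
input_shape = (100, 49)          # placeholder (timesteps, channels)

# Variant 1: ReLU baked into the Conv1D layer
model_relu = tf.keras.Sequential()
model_relu.add(Conv1D(filters, kernel_size, activation='relu', input_shape=input_shape))

# Variant 2: linear Conv1D followed by a separate PReLU layer
model_prelu = tf.keras.Sequential()
model_prelu.add(Conv1D(filters, kernel_size, input_shape=input_shape))
model_prelu.add(PReLU())

model_relu.summary()
model_prelu.summary()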


2 Answers

1 vote

Yes, the Conv1D layer will use the PReLU activation function. When you define a Conv2D layer like this,

x = tf.keras.layers.Conv2D(13, kernel_size=(3, 3), strides=1, activation='relu')(inputs)

The above statement is equivalent to,

x = tf.keras.layers.Conv2D(13, kernel_size=(3, 3), strides=1)(inputs)
x = tf.keras.layers.Activation('relu')(x)
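
A quick way to convince yourself of this equivalence is to build both forms and compare their outputs on the same input. This is just a sketch with arbitrary shapes; the fused layer's weights are copied into the split one so both compute the same convolution:

import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(8, 8, 3))

# Fused form: convolution and ReLU in a single layer.
conv_fused = tf.keras.layers.Conv2D(13, kernel_size=(3, 3), strides=1, activation='relu')
fused = tf.keras.Model(inputs, conv_fused(inputs))

# Split form: linear convolution, then a separate Activation layer.
conv_split = tf.keras.layers.Conv2D(13, kernel_size=(3, 3), strides=1)
split = tf.keras.Model(inputs, tf.keras.layers.Activation('relu')(conv_split(inputs)))
conv_split.set_weights(conv_fused.get_weights())  # share weights for a fair comparison

x = np.random.rand(1, 8, 8, 3).astype('float32')
print(np.allclose(fused(x).numpy(), split(x).numpy()))  # True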

The reason activation functions are also provided as separate layers is that sometimes we need to apply extra logic to the feature maps before they reach the activation function.

For instance, a BatchNormalization layer is added before passing the feature maps to the activation function,

x = tf.keras.layers.Conv2D(13, kernel_size=(3, 3), strides=1)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Activation('relu')(x)

Coming back to your question,

Special activation functions with their own hyperparameters or trainable weights, such as LeakyReLU and PReLU, are provided as separate layers, and we can't include them in the Conv1D layer using the activation= argument.

Regarding the trainable parameters: the conv1d_18 layer has 15050 parameters, which form the kernel of the 1D convolution. These parameters have nothing to do with the activation function.

The 4900 parameters of the PReLU layer are its slope (alpha) parameters, which are optimized with backpropagation. They are updated with every batch, along with the kernel weights, and are therefore counted as trainable parameters.
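
To illustrate where such a count comes from (the shapes below are made up, not taken from your model): by default PReLU learns one slope per element of its input, so its parameter count is the product of the dimensions of the Conv1D output, while the shared_axes argument lets you share a single slope per channel instead.

import tensorflow as tf

inp = tf.keras.Input(shape=(102, 50))  # hypothetical Conv1D output: (steps, filters)

# Default: one trainable slope per element -> 102 * 50 = 5100 parameters.
full = tf.keras.layers.PReLU()(inp)

# shared_axes=[1]: one slope per filter, shared across time steps -> 50 parameters.
shared = tf.keras.layers.PReLU(shared_axes=[1])(inp)

tf.keras.Model(inp, [full, shared]).summary()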

So the unactivated outputs of the Conv1D layer pass through the PReLU layer, which uses the learned slope parameters to compute the activated outputs.

1 vote

According to the Keras docs, this:

model.add(Dense(64))
model.add(Activation('tanh'))

is equivalent to this one:

model.add(Dense(64, activation='tanh'))

I do not know why the advanced activation functions must be used as separate layers, but PReLU can be used with CNNs without any problem.
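
As a quick smoke test (all shapes and sizes below are arbitrary), a Conv1D followed by PReLU compiles and trains without any issue:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv1D, PReLU, GlobalAveragePooling1D, Dense

model = tf.keras.Sequential([
    Conv1D(16, 3, input_shape=(64, 8)),   # made-up shapes
    PReLU(),
    GlobalAveragePooling1D(),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

x = np.random.rand(32, 64, 8).astype('float32')
y = np.random.randint(0, 2, size=(32, 1))
model.fit(x, y, epochs=1, verbose=0)
print('PReLU parameters:', model.layers[1].count_params())  # 62 * 16 = 992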