CNN: Softmax layer for pixel-wise classification

Question

I want to understand in more detail how a softmax layer can look in a CNN for semantic segmentation (pixel-wise classification) of an image. The CNN outputs an image of class labels, where each pixel of the original image gets a label.

After passing a test image through the network, the next-to-last layer outputs N channels at the resolution of the original image. My question is how the softmax layer transforms these N channels into the final image of labels.

Assume we have C classes (the number of possible labels). My suggestion is that, for each pixel, its N neurons in the previous layer are connected to C neurons in the softmax layer, where each of the C neurons represents one class. With the softmax activation function, the C outputs for this pixel sum to 1 (which facilitates training of the network). Finally, each pixel is assigned the class with the highest probability (given by the softmax values). This would mean that the softmax layer consists of C * #pixels neurons. Is my suggestion correct? I didn't find an explanation for this and hope that you can help me.

Thanks for helping!

Farshid Rayhan · Accepted Answer · 2018-06-10T18:44:35

The answer is that the softmax layer does not transform these N channels into the final image of labels.

Assuming you have an output with N channels, your question is how to convert it to a 3-channel image for the final output.

The answer is that you don't. Each of those N channels represents a class. The way to go is to create a dummy array with the same height and width as the output and 3 channels.

Now you first have to assign each class a color, e.g. streets as green, cars as red, etc.

Assume that at the pixel with height = 5 and width = 5, channel 7 has the maximum value. Now,

-> if channel 7 represents car, then you need to put a red pixel on the dummy array at height = 5 and width = 5.

-> if channel 7 represents street, then you need to put a green pixel on the dummy array at height = 5 and width = 5.

So you are looking for which of the N classes each pixel belongs to. Based on the class, you redraw the pixel in a unique color on the dummy array.

This dummy array is called the mask.
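The steps above can be sketched roughly as follows in Python with NumPy. This is only an illustration under assumptions: the function names, the (H, W, N) array layout, the 3-class palette and the random scores are all made up here, not taken from any particular framework. Note that softmax does not change which channel is largest, so taking the argmax of the raw scores would give the same labels.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def scores_to_mask(scores, palette):
    """scores: (H, W, N) per-pixel class scores; palette: (N, 3) uint8 colors."""
    probs = softmax(scores, axis=-1)   # the N values of each pixel now sum to 1
    labels = probs.argmax(axis=-1)     # (H, W) index of the winning channel
    mask = palette[labels]             # (H, W, 3) color looked up per pixel
    return labels, mask

# Hypothetical 3-class setup: 0 = background, 1 = street, 2 = car
palette = np.array([[0, 0, 0], [0, 255, 0], [255, 0, 0]], dtype=np.uint8)

rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 6, 3))    # stand-in for the network output
scores[5, 5, 2] = 10.0                 # force "car" to win at pixel (5, 5)

labels, mask = scores_to_mask(scores, palette)
```

Here `labels` is the image of class indices and `mask` is the colored dummy array; the pixel at (5, 5) ends up red because the "car" channel has the maximum value there.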

For example, assume this is an input:

We are trying to locate the tumor area of the brain using pixel-wise classification. Here the number of classes is 2: tumor present and tumor not present. So the softmax layer outputs a 2-channel array where channel 1 says tumor present and channel 2 says otherwise.

So whenever, for height = X and width = Y, channel 1 has the higher value, we set dummy[X][Y] to a white pixel. When channel 2 has the higher value, we set it to a black pixel.
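For the two-class case this comparison is a one-liner. A minimal sketch, assuming the output is a NumPy array of shape (H, W, 2); the name `tumor_mask` and the toy values are hypothetical, and because Python indexes from zero, "channel 1" and "channel 2" become indices 0 and 1:

```python
import numpy as np

def tumor_mask(output):
    """output: (H, W, 2); index 0 = tumor present, index 1 = not present."""
    tumor = output[..., 0] > output[..., 1]            # True where "tumor" wins
    return np.where(tumor, 255, 0).astype(np.uint8)    # white = tumor, black = rest

# Toy 4x4 output: mostly "not present", with a 2x2 "tumor" block in the middle
output = np.zeros((4, 4, 2))
output[..., 0] = 0.1
output[..., 1] = 0.9
output[1:3, 1:3] = [0.8, 0.2]

mask = tumor_mask(output)
```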

After that we get a mask like this,

Which doesn't make that much sense on its own. But when we overlay the two images, we get this

So basically you create the mask image (the 2nd one) from your N-channel output. Overlaying it on the input image gives you the final output.
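One common way to do the overlay is simple alpha blending. The sketch below is an assumption about how it could be done, not necessarily how the images in this answer were produced; the function name `overlay_mask`, the red highlight color and the blending weight `alpha` are all chosen here for illustration.

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.4):
    """Blend `color` into `image` wherever `mask` is non-zero.

    image: (H, W, 3) uint8, mask: (H, W) uint8 or bool.
    """
    out = image.astype(np.float32)
    selected = mask > 0
    tint = np.asarray(color, dtype=np.float32)
    out[selected] = (1 - alpha) * out[selected] + alpha * tint
    return out.round().astype(np.uint8)

# Toy data: a flat gray image and a mask marking a 2x2 block
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255

blended = overlay_mask(image, mask)
```

Pixels outside the mask are left unchanged, so the original image stays visible and only the predicted region is tinted.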
