5
votes

Which dimension should softmax be applied to?

This code:

%reset -f   # IPython magic: clears all variables (only works in IPython/Jupyter)

import torch.nn as nn
import numpy as np
import torch

my_softmax = nn.Softmax(dim=-1)

mu, sigma = 0, 0.1 # mean and standard deviation

# build a "dataset" containing a single random image of shape (3, 4, 2)
train_dataset = []
image_x = np.random.normal(mu, sigma, 24).reshape((3, 4, 2))
train_dataset.append(image_x)

# stack into a numpy array first, then convert to a float tensor of shape (1, 3, 4, 2)
x = torch.tensor(np.array(train_dataset)).float()

print(x)
print(my_softmax(x))             # softmax along the last dimension (dim=-1)
my_softmax = nn.Softmax(dim=1)
print(my_softmax(x))             # softmax along dimension 1

prints the following:

tensor([[[[-0.1500,  0.0243],
          [ 0.0226,  0.0772],
          [-0.0180, -0.0278],
          [ 0.0782, -0.0853]],

         [[-0.0134, -0.1139],
          [ 0.0385, -0.1367],
          [-0.0447,  0.1493],
          [-0.0633, -0.2964]],

         [[ 0.0123,  0.0061],
          [ 0.1086, -0.0049],
          [-0.0918, -0.1308],
          [-0.0100,  0.1730]]]])
tensor([[[[ 0.4565,  0.5435],
          [ 0.4864,  0.5136],
          [ 0.5025,  0.4975],
          [ 0.5408,  0.4592]],

         [[ 0.5251,  0.4749],
          [ 0.5437,  0.4563],
          [ 0.4517,  0.5483],
          [ 0.5580,  0.4420]],

         [[ 0.5016,  0.4984],
          [ 0.5284,  0.4716],
          [ 0.5098,  0.4902],
          [ 0.4544,  0.5456]]]])
tensor([[[[ 0.3010,  0.3505],
          [ 0.3220,  0.3665],
          [ 0.3445,  0.3230],
          [ 0.3592,  0.3221]],

         [[ 0.3450,  0.3053],
          [ 0.3271,  0.2959],
          [ 0.3355,  0.3856],
          [ 0.3118,  0.2608]],

         [[ 0.3540,  0.3442],
          [ 0.3509,  0.3376],
          [ 0.3200,  0.2914],
          [ 0.3289,  0.4171]]]])

So the first tensor is the input before softmax is applied, the second tensor is the result of softmax with dim=-1, and the third tensor is the result of softmax with dim=1.

In the result of the first softmax, the corresponding elements sum to 1; for example, [0.4565, 0.5435] -> 0.4565 + 0.5435 == 1.
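This can be checked directly (reusing x and the nn import from the code above):

s = nn.Softmax(dim=-1)(x)
print(s.sum(dim=-1))   # every entry is 1.0: dim=-1 normalizes along the last dimension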

What sums to 1 in the result of the second softmax?

Which dim value should I choose?

Update: The dimensions (3, 4, 2) correspond to the image dimensions, where 3 is the number of RGB channels, 4 is the number of horizontal pixels (width), and 2 is the number of vertical pixels (height). This is an image classification problem. I'm using the cross-entropy loss function, and I'm applying softmax in the final layer in order to back-propagate probabilities.

2
It's hard to tell without context. Imagine I show you three variables a, b, and c and ask which you should sum? There's no good answer to that without context. Softmax produces a probability distribution, i.e. each element e_i is in [0, 1] and the e_i sum to 1. You must have a good reason to apply it (are you somehow computing probabilities? Or a loss function?). Applying softmax to the dataset without any prior transformation (i.e. operations) does not really make sense to me. – pltrdy
@pltrdy please see the update; does this provide adequate context? – blue-sky

2 Answers

17
votes

You have a 1x3x4x2 tensor train_dataset. The softmax function's dim parameter determines across which dimension the softmax operation is performed. The first dimension is your batch dimension, the second is depth, the third is rows, and the last one is columns. Please look at the picture below (sorry for the horrible drawing) to understand how softmax is performed when you specify dim=1.

[hand-drawn diagram: softmax applied along dim=1, i.e. across the three 4x2 matrices]
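To see what dim=1 means in code, here is a minimal sketch (using a fresh random tensor of the same 1x3x4x2 shape):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 4, 2)   # same shape as the tensor in the question
# softmax along dim=1: exponentiate, then normalize across the 3 entries of dimension 1
manual = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
print(torch.allclose(manual, nn.Softmax(dim=1)(x)))   # True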

In short, for dim=1 the corresponding entries of your three 4x2 matrices sum to 1; in the output above, for example, 0.3010 + 0.3450 + 0.3540 == 1.
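You can confirm the normalization directly (continuing the sketch above):

print(nn.Softmax(dim=1)(x).sum(dim=1))   # a 1x4x2 tensor of ones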

Update: Which dimension the softmax should be applied to depends on what data your tensor stores and what your goal is.

Update: For an image classification task, please see the tutorial on the official PyTorch website. It covers the basics of image classification with PyTorch on a real dataset, and it's a very short tutorial. Although that tutorial does not perform a softmax operation, what you need to do is just use torch.nn.functional.log_softmax on the output of the last fully connected layer. See MNIST classifier with pytorch for a complete example. It does not matter whether your image is RGB or grayscale after flattening it for the fully connected layers (also keep in mind that the same code from the MNIST example might not work for you, depending on which PyTorch version you use).
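As a minimal sketch (the layer sizes here are made up for illustration): log_softmax pairs with NLLLoss, and nn.CrossEntropyLoss combines the two internally, so with it you feed in the raw logits directly:

import torch
import torch.nn as nn
import torch.nn.functional as F

batch, num_classes = 8, 10                        # hypothetical sizes
logits = torch.randn(batch, num_classes)          # raw output of the last fully connected layer
target = torch.randint(0, num_classes, (batch,))  # ground-truth class indices

loss_a = F.nll_loss(F.log_softmax(logits, dim=1), target)  # explicit log-softmax + NLL
loss_b = nn.CrossEntropyLoss()(logits, target)             # log-softmax applied internally
print(torch.allclose(loss_a, loss_b))                      # True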

-1
votes

For most deep learning problems we work with batches, so the final layer's output has shape (batch, num_classes) and dim will usually be 1. Don't be confused by this: it just tells the function to operate along the contents of each row of the batch (here each row is a vector, i.e. if you have 8 classes, there will be 8 elements in each row). You can also use dim=-1, which is equivalent for a 2-D tensor.
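For instance (a toy sketch with made-up sizes):

import torch
import torch.nn as nn

scores = torch.randn(8, 8)                 # hypothetical batch of 8 samples, 8 class scores each
probs = nn.Softmax(dim=1)(scores)
print(probs.sum(dim=1))                    # each row sums to 1
print(torch.allclose(probs, nn.Softmax(dim=-1)(scores)))   # True: dim=1 and dim=-1 coincide for 2-D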