I am not 100% sure what your question means, but I think the confusion comes from what the dim parameter means. So I will explain it and provide examples.
If we have:
m0 = nn.Softmax(dim=0)
what that means is that m0 normalizes elements along the zeroth coordinate of the tensor it receives. Formally, given a tensor b of size (d0,d1), the output m0(b) satisfies:
sum_{i0=0}^{d0-1} m0(b)[i0,i1] = 1, forall i1 \in {0,...,d1-1}
You can easily check this with a PyTorch example:
>>> import torch
>>> import torch.nn as nn
>>> b = torch.arange(0,4,1.0).view(-1,2)
>>> b
tensor([[0., 1.],
        [2., 3.]])
>>> m0 = nn.Softmax(dim=0)
>>> b0 = m0(b)
>>> b0
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
Now, since dim=0 means going through i0 \in {0,1} (i.e. going through the rows), if we choose any column i1 and sum its elements (one from each row), we should get 1. Check it:
>>> b0[:,0].sum()
tensor(1.0000)
>>> b0[:,1].sum()
tensor(1.0000)
as expected.
Note that every column sums to 1 when we "sum out" the rows with torch.sum(b0, dim=0). Check it out:
>>> torch.sum(b0,0)
tensor([1.0000, 1.0000])
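To make it concrete what m0 computes, here is a minimal sketch that rebuilds the softmax by hand and compares it with nn.Softmax(dim=0). The names manual, b1, col_sums_dim0 and row_sums_dim1 are mine, not from the example above, and the dim=1 case is only there for contrast.

```python
import torch
import torch.nn as nn

b = torch.arange(0, 4, 1.0).view(-1, 2)    # tensor([[0., 1.], [2., 3.]])

# Softmax along dim 0: exponentiate, then divide by the sum over the row index i0.
exp_b = torch.exp(b)
manual = exp_b / exp_b.sum(dim=0, keepdim=True)

b0 = nn.Softmax(dim=0)(b)
same = torch.allclose(manual, b0)           # hand-rolled version agrees with nn.Softmax

# For contrast, dim=1 normalizes along the column index i1, so each row sums to 1.
b1 = nn.Softmax(dim=1)(b)
col_sums_dim0 = b0.sum(dim=0)               # one sum per column
row_sums_dim1 = b1.sum(dim=1)               # one sum per row
```

With dim=0 it is the columns that sum to 1, and with dim=1 it is the rows. Summing b0 over dim=1 instead does not give 1 in general.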
We can create a more complicated example to make sure it's really clear.
>>> a = torch.arange(0,24,1.0).view(-1,3,4)
>>> a
tensor([[[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]],

        [[12., 13., 14., 15.],
         [16., 17., 18., 19.],
         [20., 21., 22., 23.]]])
>>> a0 = m0(a)
>>> a0[:,0,0].sum()
tensor(1.0000)
>>> a0[:,1,0].sum()
tensor(1.0000)
>>> a0[:,2,0].sum()
tensor(1.0000)
>>> a0[:,1,1].sum()
tensor(1.0000)
>>> a0[:,2,3].sum()
tensor(1.0000)
So, as expected, if we sum all the elements along coordinate 0, from its first value to its last, we get 1. Everything is normalized along dimension 0 (the zeroth coordinate i0).
>>> torch.sum(a0,0)
tensor([[1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000]])
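The same check works for any choice of dim, not just 0. Below is a small sketch (the results dictionary is my own bookkeeping, not part of the example above) that applies nn.Softmax with each dim in turn and confirms that summing over that same dimension gives all ones.

```python
import torch
import torch.nn as nn

a = torch.arange(0, 24, 1.0).view(-1, 3, 4)    # shape (2, 3, 4)

results = {}
for dim in range(a.dim()):
    out = nn.Softmax(dim=dim)(a)
    sums = out.sum(dim=dim)                     # collapses the normalized dimension
    all_ones = torch.allclose(sums, torch.ones_like(sums))
    results[dim] = (tuple(sums.shape), all_ones)

print(results)
```

The shape of each sum is the shape of a with the normalized dimension removed, which is a quick way to see which coordinate was "summed out".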
Also, "along dimension 0" means that you vary the coordinate in that dimension and consider each element. It is like a for loop going through the values the first coordinate can take, i.e.
for i0 in range(d0):   # d0 = a.shape[0]
    a[i0, b, c]        # b and c are held fixed; only i0 varies
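The loop above can be written out in full as a rough sketch. It fixes the other two coordinates, walks along dimension 0 with an explicit for loop, and adds up the softmax outputs. The variable names (i1, i2, total, all_one) and the 1e-5 tolerance are my own choices.

```python
import torch
import torch.nn as nn

a = torch.arange(0, 24, 1.0).view(-1, 3, 4)
a0 = nn.Softmax(dim=0)(a)
d0, d1, d2 = a0.shape

all_one = True
for i1 in range(d1):
    for i2 in range(d2):
        total = 0.0
        for i0 in range(d0):                    # go along dimension 0
            total += a0[i0, i1, i2].item()
        # each fixed (i1, i2) position should sum to 1 over i0
        all_one = all_one and abs(total - 1.0) < 1e-5
```

This is only for illustration. In practice torch.sum(a0, dim=0) does the same thing without Python loops.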
dim=0 means the following: consider a tensor t of size (s0,s1,s2,s3). Going along dimension 0 means that the index in that dimension ranges over every valid value, from the first to the last. In this case it means going through t[0,b,c,d], ... , t[i0,b,c,d], ... , t[s0-1,b,c,d], i.e. just going through all values of the zeroth coordinate. - Charlie Parker