
I have the following problem: I want to compute the softmax function in Python and get an unexpected result. The code is the following:

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

It works perfectly, but I don't know why. It works on matrices as follows: if I pass in a 2x2 matrix A, the output is another 2x2 matrix. Why is that? Shouldn't it return a differently sized array, since every element of the matrix, e.g. $x=A[0,0]$, yields two output values (namely $exp(x)/(exp(A[0,0])+exp(A[1,0]))$ and $exp(x)/(exp(A[0,1])+exp(A[1,1]))$) because of the axis=0 argument? That would lead to an 8-element output array, but the actual result has only 4 elements.

Also, how exactly does the axis=0 argument work? If I type A=np.array([2, 4]), then the logical result of np.sum(A, axis=0) should be array([2, 4]), since the columns are summed up. But the result is 6. And np.sum(A, axis=1) strangely raises "'axis' entry is out of bounds", although the result should be array([6]) since the rows are summed up. Maybe my two problems are linked.

Any help will be appreciated! Thanks, Leon

Questions asking for debugging help should include a minimal reproducible example. Please include the inputs + expected outputs vs. actual outputs. - MSeifert
Your intuition on np.sum works if you consider the np.array([[2], [4]]) and use axis=1. - MariusSiuram

1 Answer


I will jump straight to the "final" problem, the division:

matrix_22 / vector_2

Dividing a matrix by a vector is not a standard mathematical operation, so numpy applies a convention (called broadcasting), just as it does for:

matrix_22 * 5

That expression multiplies each element of the matrix by 5. In the same way, if we consider matrix_22 as a vector of vectors (its rows), then matrix_22 / vector_2 divides each row of the matrix element-wise by vector_2. The result therefore has the same shape as matrix_22, and each element is divided by exactly one entry of vector_2.

You can easily check that behaviour by executing the following:

np.array([[14, 28], [70, 56]]) / np.array([2, 7])
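Spelled out with named variables, that check might look like the sketch below. It assumes Python 3, where / is true division and the result is a float array; the variable name result is only for illustration.

```python
import numpy as np

matrix_22 = np.array([[14, 28], [70, 56]])
vector_2 = np.array([2, 7])

# Broadcasting lines vector_2 up with the last axis of matrix_22,
# so every row is divided element-wise by [2, 7]:
#   row 0: [14/2, 28/7]
#   row 1: [70/2, 56/7]
result = matrix_22 / vector_2

print(result.shape)  # (2, 2): same shape as matrix_22
print(result)        # values: [[7., 4.], [35., 8.]]
```

Column 0 is divided by 2 and column 1 by 7, so each element gets exactly one divisor.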

Notation: matrix_22 is some variable which contains a numpy array of shape (2, 2), i.e. a 2x2 matrix, and vector_2 is a numpy array with two elements.
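To connect this back to the softmax from the question, here is a minimal sketch that prints the shapes involved. The input A and the names col_sums, S and v are made up for illustration; softmax is the function from the question.

```python
import numpy as np

def softmax(x):
    """Softmax from the question: normalizes along axis 0."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # hypothetical 2x2 input

# Summing over axis 0 collapses the rows: one sum per column.
col_sums = np.sum(np.exp(A), axis=0)
print(col_sums.shape)  # (2,)

# (2, 2) / (2,) broadcasts: S[i, j] = exp(A[i, j]) / col_sums[j]
S = softmax(A)
print(S.shape)         # (2, 2)
print(S.sum(axis=0))   # each column sums to 1, up to rounding

# A 1-D array has only axis 0, so summing over it gives one number.
v = np.array([2, 4])
print(v.shape)             # (2,)
print(np.sum(v, axis=0))   # 6
```

Because v has a single axis, asking for axis=1 fails: that axis does not exist. With np.array([[2], [4]]), which has shape (2, 1), axis=1 is valid.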