Actually, softmax functions are already used deep inside neural networks in certain cases, namely when dealing with differentiable memory and with attention mechanisms!
Softmax layers are used within neural networks such as Neural Turing Machines (NTM) and their refinement, the Differentiable Neural Computer (DNC).
To summarize, those architectures are RNNs/LSTMs that have been modified to contain a differentiable (neural) memory matrix which can be written to and read from across time steps.
Quickly explained, the softmax here normalizes the weights used to fetch from the memory (and similar tricks for content-based addressing of the memory), which keeps the whole read/write process differentiable. About that, I really liked this article, which illustrates the operations in an NTM and other recent RNN architectures with interactive figures.
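To make this concrete, here is a tiny NumPy sketch of content-based addressing over a memory matrix. It's only an illustration of the idea (cosine similarity sharpened by a key strength, then softmax into a read weighting); the function names, shapes, and the random memory are my own assumptions, not taken from the NTM/DNC papers.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def content_addressing(memory, key, beta):
    # memory: (N, M) matrix of N slots of size M (toy example, slots chosen arbitrarily)
    # key:    (M,) key vector that a controller would emit
    # beta:   scalar key strength; larger beta sharpens the softmax
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    # Softmax turns similarities into a normalized, differentiable weighting over slots.
    return softmax(beta * sim)

memory = np.random.randn(8, 4)          # 8 memory slots of size 4 (made-up sizes)
key = np.random.randn(4)
w = content_addressing(memory, key, beta=5.0)
read_vector = w @ memory                 # "soft" read: weighted sum of all slots
print(w.sum())                           # ~1.0, thanks to the softmax
```

Because the read is a softmax-weighted sum rather than a hard lookup, gradients flow through the addressing, which is the whole point of making the memory "neural".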
Moreover, softmax is used in attention mechanisms for, say, machine translation, such as in this paper. There, the softmax normalizes the distribution of attention over the source positions so as to "softly" retain the most relevant place to attend to: that is, the model also pays a little bit of attention everywhere else, in a soft manner. However, this could be seen as a mini neural network that deals with attention, embedded inside the big one, as explained in the paper. Therefore, it can be debated whether or not softmax is used only at the end of neural networks.
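Here is a minimal sketch of that idea in NumPy. I'm simplifying the scoring to a plain dot product between the decoder state and each encoder state (the paper actually learns a small scoring network), so treat the names and shapes as assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(encoder_states, decoder_state):
    # encoder_states: (T, d) one vector per source position
    # decoder_state:  (d,) current decoder hidden state
    scores = encoder_states @ decoder_state   # one score per source position
    weights = softmax(scores)                 # softly "pick" where to look
    context = weights @ encoder_states        # weighted sum of source states
    return context, weights

T, d = 6, 4                                   # made-up sizes
enc = np.random.randn(T, d)
dec = np.random.randn(d)
context, weights = attend(enc, dec)
print(weights)    # sums to 1; the best-matching position gets most, not all, of the attention
```

The softmax here sits in the middle of the network, not at the output layer, which is exactly the point being made.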
Hope it helps!
Edit - More recently, it's even possible to see Neural Machine Translation (NMT) models where only attention (with softmax) is used, without any RNN or CNN: http://nlp.seas.harvard.edu/2018/04/03/attention.html
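For completeness, here's a bare-bones sketch of the single-head scaled dot-product self-attention those models are built on, again with softmax in the middle of the computation and no recurrence or convolution anywhere. The projection matrices and sizes below are placeholders I picked for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (T, d) input sequence; Wq/Wk/Wv: (d, d_k) learned projections (random here)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) pairwise compatibility scores
    A = softmax(scores)                       # each row: attention over all positions
    return A @ V                              # mix value vectors according to attention

T, d, dk = 5, 8, 8                            # made-up sizes
X = np.random.randn(T, d)
Wq, Wk, Wv = (np.random.randn(d, dk) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # (5, 8)
```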