In my hidden layer it does not make sense to me to use the softmax activation function too - is this correct?
It is correct indeed.
If so can I just use any other non-linear activation function such as sigmoid or tanh?
You can, but most modern approaches would call for a Rectified Linear Unit (ReLU), or one of its variants (Leaky ReLU, ELU, etc.), in the hidden layer, as in the sketch below.
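For instance, a minimal sketch of such a classifier in Keras (the layer sizes, 20 input features and 10 classes, are made up here purely for illustration) could look like:

```python
import tensorflow as tf

# Hypothetical sizes for illustration only: 20 input features, 10 classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),  # hidden layer: ReLU
    tf.keras.layers.Dense(10, activation='softmax')                   # output layer: softmax
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```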
Or could I even not use any activation function in the hidden layer and just keep the values of the hidden nodes as the linear combinations of the input nodes and input-to-hidden weights?
No. The non-linear activations are exactly what prevents a (possibly large) neural network from behaving just like a single linear unit; it can be shown (see Andrew Ng's relevant Coursera lecture, Why do you need non-linear activation functions?) that:
It turns out that if you use a linear activation function, or alternatively if you don't have an activation function, then no matter how many layers your neural network has, what it is always doing is just computing a linear activation function, so you might as well not have any hidden layers.

The take-home is that a linear hidden layer is more or less useless, because the composition of two linear functions is itself a linear function; so unless you throw a non-linearity in there, you're not computing more interesting functions even as you go deeper in the network.
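You can check this collapse numerically yourself; in the NumPy sketch below (the layer sizes are arbitrary and biases are omitted for brevity), two stacked linear layers give exactly the same output as a single layer with the merged weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=(5,))      # a random input vector
W1 = rng.normal(size=(4, 5))    # input-to-hidden weights, no activation in between
W2 = rng.normal(size=(3, 4))    # hidden-to-output weights

deep_but_linear = W2 @ (W1 @ x)     # "two-layer" network with a linear hidden layer
single_layer    = (W2 @ W1) @ x     # one linear layer with the merged weights

print(np.allclose(deep_but_linear, single_layer))   # True
```

Adding biases does not change the conclusion; the composition is then a single affine map instead of a single linear one.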
In practice, the only place where you could use a linear activation function is the output layer for regression problems (this is also explained in the lecture linked above).
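As a sketch (again in Keras, with made-up layer sizes), such a regression network would typically end with a single unit and no activation:

```python
import tensorflow as tf

reg_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),  # non-linear hidden layer
    tf.keras.layers.Dense(1)   # output layer: no activation (i.e. linear), as used for regression
])
reg_model.compile(optimizer='adam', loss='mse')
```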