I'm currently interested in using Cross Entropy Error with the Backpropagation algorithm for classification, where I use the Softmax Activation Function in my output layer.
From what I gather, with Cross Entropy and Softmax you can drop the derivative term, so the output-layer error looks like this:
Error = targetOutput[i] - layerOutput[i]
This differs from the Mean Squared Error case, where the error term is:
Error = Derivative(layerOutput[i]) * (targetOutput[i] - layerOutput[i])
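To make sure I'm describing it correctly, here is a minimal NumPy sketch of the two error terms as I currently understand them. The variable names (netInput, layerOutput, targetOutput) and the choice of a Sigmoid activation for the Mean Squared Error case are just my own illustration, not from any particular library:

    # Sketch of the two output-layer error terms described above,
    # using the (target - output) sign convention from this question.
    import numpy as np

    def softmax(netInput):
        # Shift by the max for numerical stability.
        exps = np.exp(netInput - np.max(netInput))
        return exps / np.sum(exps)

    netInput = np.array([0.5, -1.2, 2.0])     # pre-activation values of the output layer
    targetOutput = np.array([0.0, 0.0, 1.0])  # one-hot target for classification

    # Case 1: Softmax output + Cross Entropy -- derivative term dropped.
    layerOutput = softmax(netInput)
    errorCrossEntropy = targetOutput - layerOutput

    # Case 2: Sigmoid output + Mean Squared Error -- derivative term kept.
    # (Sigmoid chosen only as an example; its derivative can be written
    # in terms of its own output.)
    layerOutput = 1.0 / (1.0 + np.exp(-netInput))
    derivative = layerOutput * (1.0 - layerOutput)
    errorMeanSquared = derivative * (targetOutput - layerOutput)

    print(errorCrossEntropy)
    print(errorMeanSquared)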
So, can you only drop the derivative term when your output layer uses the Softmax Activation Function for classification with Cross Entropy? For instance, if I were to do regression using the Cross Entropy Error (with, say, the TANH activation function), I would still need to keep the derivative term, correct?
I haven't been able to find an explicit answer on this, and I haven't attempted to work out the math myself either (as I am rusty).