I am currently implementing a custom loss layer, and in the process I stumbled upon the implementation of mean squared error in objectives.py [1]. I know I'm missing something in my understanding of this loss calculation, because I always thought the average was taken separately across the samples in each mini-batch for each output (axis 0 of the tensor), but it appears the average is actually taken across the last axis, which for a single output vector means averaging across the outputs.

I found this by accident while working on my custom loss layer, because it requires discounting the loss of a few of the outputs if a training output in a specific place has a specific value.

Anyway, is my understanding of the mean squared error incorrect? Why would Keras reduce over the last axis, turning a 1×n output vector into a 1×1 value?
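For reference, the definition I'm pointing at in [1] reads roughly like this (quoting from memory, so the exact line on master may differ):

```python
def mean_squared_error(y_true, y_pred):
    # Averages the squared error over the last axis only,
    # i.e. over the outputs of each sample, not over the batch.
    return K.mean(K.square(y_pred - y_true), axis=-1)
```

And to make the discounting I mentioned concrete, here is a minimal sketch of the kind of thing I'm trying to do; the function name `masked_mse` and the sentinel value 0.0 are made up for illustration:

```python
from keras import backend as K

def masked_mse(y_true, y_pred):
    # Hypothetical rule: drop the loss contribution of any output
    # whose target equals the sentinel value 0.0.
    mask = K.cast(K.not_equal(y_true, 0.0), K.floatx())
    # Note this still reduces over the last axis, like the built-in MSE.
    return K.mean(mask * K.square(y_pred - y_true), axis=-1)
```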
Thanks.
[1] https://github.com/fchollet/keras/blob/master/keras/objectives.py#L7