This ended up being a different issue from the one in the question.
I have a very simple Keras model that accepts time series data. I want to use a recurrent layer to predict a new sequence of the same dimensions, with a softmax on the end to provide a normalised result at each time step.
This is how my model looks.
x = GRU(256, return_sequences=True)(x)
x = TimeDistributed(Dense(3, activation='softmax'))(x)
Imagine the input is something like:
[
[0.25, 0.25, 0.5],
[0.3, 0.3, 0.4],
[0.2, 0.7, 0.1],
[0.1, 0.1, 0.8]
]
I'd expect the output to be the same shape and normalised at each step, like:
[
[0.15, 0.35, 0.5],
[0.35, 0.35, 0.3],
[0.1, 0.6, 0.3],
[0.1, 0.2, 0.7]
]
But what I actually get is a result where the elements in each row sum to a quarter (or, more generally, 1 divided by the number of rows), not 1.
Put simply, I thought the idea of TimeDistributed was to apply the Dense layer, softmax activation included, to each time step independently. But I seem to be getting a result that looks like it is normalised across all elements in the output matrix of time steps.
Since I seem to have misunderstood, is there a way to get a Dense softmax result for each time step (normalised to 1 at each step) without having to predict each time step sequentially?
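To make the two behaviours concrete, here is a minimal sketch in plain Python (no Keras involved) contrasting a softmax applied independently at each time step with a single softmax over the whole matrix. The `softmax` helper and the `logits` values are made up for illustration and are not taken from my model; this only shows the difference in normalisation, not what Keras does internally.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-softmax scores: 4 time steps x 3 classes.
logits = [
    [0.1, 0.9, 1.3],
    [1.2, 1.2, 1.0],
    [0.0, 1.8, 1.1],
    [0.2, 0.9, 2.1],
]

# What I expected: softmax applied independently at each time step.
per_step = [softmax(row) for row in logits]
per_step_sums = [sum(row) for row in per_step]

# What my output resembles: one softmax over every element of the matrix,
# reshaped back into rows afterwards.
flat = softmax([v for row in logits for v in row])
n_classes = len(logits[0])
whole = [flat[i:i + n_classes] for i in range(0, len(flat), n_classes)]
whole_sums = [sum(row) for row in whole]

print(per_step_sums)    # every entry is 1, up to float rounding
print(sum(whole_sums))  # the row sums together add up to 1
```

In the first case every row sums to 1. In the second, the rows only sum to 1 in total, so with four time steps each row comes out at about a quarter on average, which is the pattern I am seeing.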