(This ended up being a different issue from the one described in the question.)

I have a very simple Keras model that accepts time series data. I want to use a recurrent layer to predict a new sequence of the same dimensions, with a softmax on the end to provide a normalised result at each time step.

This is how my model looks:

x = GRU(256, return_sequences=True)(x)
x = TimeDistributed(Dense(3, activation='softmax'))(x)
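For reference, a minimal self-contained version of this model might look as follows (the input shape, imports, and the Input/Model lines are my assumptions; only the two layer lines above come from my original code):

from tensorflow.keras.layers import Input, GRU, TimeDistributed, Dense
from tensorflow.keras.models import Model

# Hypothetical input: sequences of 4 time steps with 3 features each
inputs = Input(shape=(4, 3))
x = GRU(256, return_sequences=True)(inputs)
x = TimeDistributed(Dense(3, activation='softmax'))(x)
model = Model(inputs, x)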

Imagine the input is something like:

[
  [0.25, 0.25, 0.5],
  [0.3, 0.3, 0.4],
  [0.2, 0.7, 0.1],
  [0.1, 0.1, 0.8]
]

I'd expect the output to be the same shape and normalised at each step, like:

[
  [0.15, 0.35, 0.5],
  [0.35, 0.35, 0.3],
  [0.1, 0.6, 0.3],
  [0.1, 0.2, 0.7]
]

But what I actually get is output where the sum of the elements in each row is a quarter (or, more generally, 1 divided by the number of rows), not 1.

Put simply, I thought the idea of TimeDistributed was to apply the Dense layer independently at each time step, so the Dense with softmax activation would be computed separately for every row. But the result I'm getting looks as if it were normalised across all elements of the output matrix of time steps.
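As a sanity check, summing the raw model output over the last axis should reveal whether each step is really normalised. A minimal sketch, assuming the model built above and random input data:

import numpy as np

probs = model.predict(np.random.rand(1, 4, 3))  # batch of one sequence
print(probs.sum(axis=-1))  # [[1. 1. 1. 1.]] (up to float precision) if softmax is applied per step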

Since my understanding seems to be wrong, is there a way to get a Dense softmax result for each time step (normalised to 1 at each step) without having to predict each time step sequentially?

1 Answer

It turned out that the issue wasn't with how softmax is handled inside the TimeDistributed wrapper, but with an error in my prediction function, which was summing over the whole matrix rather than row by row.
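For illustration, here is a minimal numpy sketch of that kind of mistake, using the example matrix from the question (the exact code in my prediction function differed):

import numpy as np

# Each row is a per-time-step softmax output and already sums to 1
preds = np.array([
    [0.15, 0.35, 0.5],
    [0.35, 0.35, 0.3],
    [0.1,  0.6,  0.3],
    [0.1,  0.2,  0.7],
])

# Buggy version: normalising by the sum over the whole matrix makes each
# row appear to sum to 1 / number_of_rows (0.25 here), matching the symptom
wrong = preds / preds.sum()
print(wrong.sum(axis=-1))  # [0.25 0.25 0.25 0.25]

# Correct version: sum row by row (over the last axis)
print(preds.sum(axis=-1))  # [1. 1. 1. 1.]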