I am building an LSTM-style neural network in TensorFlow and am having some difficulty understanding exactly what input tf.nn.dynamic_rnn needs and how its output is transformed before it is passed to the sparse_softmax_cross_entropy_with_logits loss.
https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn
Understanding the input
The input function sends a feature tensor of shape
[batch_size, max_time]
However, the documentation states that the input tensor must have shape
[batch_size, max_time, ...]
I have therefore expanded the input by adding a trailing dimension of size 1, giving it the shape
[batch_size, max_time, 1]
At this point nothing breaks when the graph runs, but I don't understand exactly what this expansion has done, and I suspect it may be causing the problem when calculating the loss (see below).
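For concreteness, the expansion I mean is just adding a trailing dimension, along the lines of the sketch below (using tf.expand_dims is my assumption of how to do it; the placeholder shapes are only illustrative):

import tensorflow as tf

# the feature tensor as it comes from the input function: [batch_size, max_time]
features = tf.placeholder(tf.float64, [None, None])

# add a trailing dimension of size 1, so each time step becomes a 1-element feature vector
features = tf.expand_dims(features, -1)   # [batch_size, max_time, 1]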
Understanding the transformations
This expanded tensor is then the 'features' tensor used in the code below:
import tensorflow as tf
from tensorflow.contrib import rnn

LSTM_SIZE = 3

lstm_cell = rnn.BasicLSTMCell(LSTM_SIZE, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_cell, features, dtype=tf.float64)

# slice to keep only the last cell of the RNN
outputs = outputs[-1]

# softmax layer
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [LSTM_SIZE, n_classes], dtype=tf.float64)
    b = tf.get_variable('b', [n_classes],
                        initializer=tf.constant_initializer(0.0), dtype=tf.float64)
    logits = tf.matmul(outputs, W) + b

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels))
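In case it's useful, the static shapes can be checked while the graph is being built, simply by printing them (unknown dimensions show up as ?):

# static (graph-construction-time) shapes of the intermediate tensors
print('features:', features.get_shape())
print('outputs :', outputs.get_shape())
print('logits  :', logits.get_shape())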
This throws a ValueError at the loss calculation:
dimensions must be equal, but are [max_time, num_classes] and [batch_size]
From https://www.tensorflow.org/versions/r0.12/api_docs/python/nn/classification:
A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size]. But higher dimensions are supported.
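To check that I'm reading that correctly, a minimal sketch of that common use case would be something like the following (sizes and values made up):

import tensorflow as tf

# logits: [batch_size, n_classes], labels: [batch_size] of integer class ids
toy_logits = tf.zeros([4, 3], dtype=tf.float64)        # batch_size=4, n_classes=3
toy_labels = tf.constant([0, 2, 1, 1], dtype=tf.int64)
toy_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=toy_logits, labels=toy_labels))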
At some point in the process max_time and batch_size have been mixed up, and I'm uncertain whether it happens at the input stage or inside the LSTM. I'm grateful for any advice!