
I am building an LSTM-style neural network in TensorFlow and am having some difficulty understanding exactly what input tf.nn.dynamic_rnn needs and what transformations it applies before its output is passed to the sparse_softmax_cross_entropy_with_logits layer.

https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn

Understanding the Input

The input function is sending a feature tensor in the form

[batch_size, max_time]

However the manual states that input tensors must be in the form

[batch_size, max_time, ...]

I have therefore expanded the input with a 1d tensor to take the form

[batch_size, max_time, 1]

At this point the input no longer breaks when running, but I don't understand exactly what this expansion has done and suspect it may be causing the problems when calculating the loss (see below).
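For reference, the expansion is roughly the snippet below (a minimal sketch; tf.expand_dims is one way to do it, my exact code may differ slightly):

# add a trailing dimension of size 1 so each time step carries a single feature
# [batch_size, max_time] -> [batch_size, max_time, 1]
features = tf.expand_dims(features, axis=-1)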

Understanding the Transformations

This expanded tensor is then the 'features' tensor used in the code below

import tensorflow as tf
from tensorflow.contrib import rnn

LSTM_SIZE = 3
lstm_cell = rnn.BasicLSTMCell(LSTM_SIZE, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_cell, features, dtype=tf.float64)

#slice to keep only the last cell of the RNN
outputs = outputs[-1]

#softmax layer

with tf.variable_scope('softmax'):
   W = tf.get_variable('W', [LSTM_SIZE, n_classes], dtype=tf.float64)
   b = tf.get_variable('b', [n_classes], initializer=tf.constant_initializer(0.0), dtype=tf.float64)

logits = tf.matmul(outputs, W) + b

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels))

This throws a ValueError when computing the loss:

dimensions must be equal, but are [max_time, num_classes] and [batch_size]

From https://www.tensorflow.org/versions/r0.12/api_docs/python/nn/classification:

A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size]. But higher dimensions are supported.

At some point in the process, max_time and batch_size have been mixed up, and I'm uncertain whether it happens at the input stage or inside the LSTM. I'm grateful for any advice!
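For debugging, I have been printing the static shapes at each step, roughly like this (a sketch; labels and the placeholders are defined elsewhere in my code):

# inspect static shapes to see where batch_size and max_time get swapped
print(features.get_shape())   # expected: (batch_size, max_time, 1)
print(outputs.get_shape())    # what does dynamic_rnn / the slice actually return?
print(logits.get_shape())     # should be (batch_size, n_classes) for the sparse softmax loss
print(labels.get_shape())     # should be (batch_size,)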


1 Answer


That is because of the shape of the output of tf.nn.dynamic_rnn. From its documentation (https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn):

outputs: The RNN output Tensor.

If time_major == False (default), this will be a Tensor shaped: [batch_size, max_time, cell.output_size].

If time_major == True, this will be a Tensor shaped: [max_time, batch_size, cell.output_size].

You are in the default case, so your outputs has shape [batch_size, max_time, output_size]. When you write outputs[-1] you index along the first (batch) dimension, so you obtain a tensor of shape [max_time, output_size]. Slicing with outputs[:, -1] instead keeps the last time step for every example in the batch and should fix it.
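A minimal sketch of the corrected slice, reusing the names from the question (not tested against your exact setup):

# keep the output of the last time step for every sequence in the batch
# shape: [batch_size, max_time, LSTM_SIZE] -> [batch_size, LSTM_SIZE]
outputs = outputs[:, -1, :]

# logits now have shape [batch_size, n_classes], matching labels of shape [batch_size]
logits = tf.matmul(outputs, W) + b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels))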