0 votes

I'm working on a prediction project using an LSTM model in TensorFlow. The implementation ran, but the result was poor: accuracy on the test set was only 0.5. So I searched for tricks for training LSTM-based models and found "adding dropout".

However, after following a tutorial, I'm getting errors.

Here's the original version, which worked:

import tensorflow as tf
from tensorflow.contrib import rnn  # TF 1.x

def lstmModel(x, weights, biases):
    x = tf.unstack(x, time_step, 1)

    lstm_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, state_is_tuple=True, forget_bias=1)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    return tf.matmul(outputs[-1], weights['out']) + biases['out']

and after changing it to the version below, this error occurs:

ValueError: Shape (90, ?) must have rank at least 3

def lstmModel(x, weights, biases):
    x = tf.unstack(x, time_step, 1)

    lstm_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, state_is_tuple=True, forget_bias=1)
    lstm_dropout = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.5)
    lstm_layers = rnn.MultiRNNCell([lstm_dropout] * 3)
    outputs, states = tf.nn.dynamic_rnn(lstm_layers, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

I'm wondering whether the shape of my input data is wrong. Before entering this function, the input x has shape (batch_size, time_step, data_size):

batch_size = 30
time_step = 4   # read 4 words per sequence
data_size = 80  # each word is a vector of length 80

So each batch of input x has shape [30, 4, 80], and the word x[0, 0, :] is followed by the word x[0, 1, :]. Does this design make sense?
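For reference, a small NumPy sketch of this layout (dummy values, just checking the indexing):

```python
import numpy as np

batch_size, time_step, data_size = 30, 4, 80

# Dummy batch with the layout described above: (batch, time, features).
x = np.zeros((batch_size, time_step, data_size), dtype=np.float32)

assert x.shape == (30, 4, 80)
assert x[0, 0, :].shape == (80,)  # first word of the first sequence
assert x[0, 1, :].shape == (80,)  # the word that follows it
```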

The whole implementation is modified from another tutorial, and I also wonder: what does tf.unstack() actually do?

Several problems above... I have put the code on GitHub with the "worked version" and the "failed version" mentioned above. Only the function shown here differs! Please check, thanks!

1
Would you be able to provide a minimal, complete and verifiable example as described in stackoverflow.com/help/mcve? – pfm
@Nicolas sorry for that, I have put it on GitHub: github.com/billy0059/LstmTest/tree/… – BillyWang

1 Answer

1 vote

Removing tf.unstack from the second example should help.

tf.unstack breaks a tensor into a list of tensors. In your case, it breaks x, which has shape (batch_size, time_step, data_size), into a list of length time_step containing tensors of shape (batch_size, data_size).
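A NumPy sketch of the same shape transformation (splitting along axis 1 and dropping the singleton axis is what tf.unstack(x, time_step, 1) does):

```python
import numpy as np

batch_size, time_step, data_size = 30, 4, 80
x = np.zeros((batch_size, time_step, data_size), dtype=np.float32)

# Split along axis 1 (time), then drop the singleton axis -- the same
# shape transformation as tf.unstack(x, time_step, 1).
unstacked = [np.squeeze(t, axis=1) for t in np.split(x, time_step, axis=1)]

assert len(unstacked) == time_step                    # a list of 4 tensors...
assert unstacked[0].shape == (batch_size, data_size)  # ...each of shape (30, 80)
```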

This is needed for tf.nn.static_rnn, since it unfolds the RNN during graph creation and therefore requires a pre-specified number of steps, which is the length of the list coming from tf.unstack.

tf.nn.dynamic_rnn is unfolded at each run, so it can handle a variable number of steps. It therefore takes a single tensor in which dimension 0 is batch_size, dimension 1 is time_step, and dimension 2 is data_size (or the first two dimensions are swapped if time_major=True).
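One related pitfall once you pass the 3-D tensor to tf.nn.dynamic_rnn unmodified: its outputs are also 3-D, of shape (batch_size, time_step, n_hidden), so outputs[-1] (which selected the last time step when outputs was static_rnn's list) would instead select the last batch element. You would want outputs[:, -1, :] for the last step. A NumPy sketch of the shapes (the n_hidden value here is an arbitrary assumption):

```python
import numpy as np

batch_size, time_step, n_hidden = 30, 4, 128  # n_hidden value assumed

# dynamic_rnn returns outputs of shape (batch_size, time_step, n_hidden).
outputs = np.zeros((batch_size, time_step, n_hidden), dtype=np.float32)

# outputs[-1] indexes the *batch* axis -> (time_step, n_hidden): not the last step.
assert outputs[-1].shape == (time_step, n_hidden)

# The last time step for every sequence in the batch:
assert outputs[:, -1, :].shape == (batch_size, n_hidden)
```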

The error occurs because tf.nn.dynamic_rnn expects a 3-D tensor, but each element of the list produced by tf.unstack is only 2-D.

tl;dr Use tf.unstack with tf.nn.static_rnn, but never with tf.nn.dynamic_rnn.