9
votes

I am trying to build a CNN + RNN model and I am getting the following error. Any help will be appreciated.

fc2 has shape (?,4096)

cell = tf.contrib.rnn.BasicLSTMCell(self.rnn_hidden_units)
stack = tf.contrib.rnn.MultiRNNCell([cell]*self.rnn_layers)
initial_state = cell.zero_state(self.batch_size, tf.float32)
initial_state = tf.identity(initial_state, name='initial_state')
outputs, _ = tf.nn.dynamic_rnn(stack, fc2,dtype=tf.float32)

  File "rcnn.py", line 182, in model
    outputs, _ = tf.nn.dynamic_rnn(stack, [fc2],dtype=tf.float32)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 574, in dynamic_rnn
    dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 637, in _dynamic_rnn_loop
    for input_ in flat_input)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 637, in <genexpr>
    for input_ in flat_input)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 649, in with_rank_at_least
    raise ValueError("Shape %s must have rank at least %d" % (self, rank))
ValueError: Shape (4096, ?) must have rank at least 3

2
The error seems fairly clear: tf.nn.dynamic_rnn expects a 3-dimensional tensor as input (i.e. rank 3), but fc2 has only two dimensions. The shape of fc2 should be something like (<batch_size>, <max_time>, <num_features>) (or (<max_time>, <batch_size>, <num_features>) if you pass time_major=True). – jdehesa
@jdehesa I am taking an image (for OCR), passing it through a CNN and then connecting it to a fully connected layer, hence the shape of fc2 is (?, 4096). Is there some other way to do this, then? – lordzuko
That's okay, but what would be the "time" dimension then? If you only have one image (that is, a batch of examples, each of which contains one image, I assume), what is the dimension that you want the RNN to iterate over? The pixels, kind of like PixelRNN? – jdehesa
@jdehesa I am following this paper: arxiv.org/pdf/1603.03101.pdf. They describe an architecture that performs character-level language modelling on the features extracted from the image. If you could suggest how this can be implemented, it would be really helpful. – lordzuko
I'm sorry, I don't know much about those models... For the character-level modelling, I think it could be something like stacking N copies of the result of the convolution and feeding that to the RNN; each K-vector output would then be the probability of each letter, but I'm not really sure... – jdehesa

2 Answers

9
votes

Copying the answer of @jdehesa from his comment for better visibility:

The error seems fairly clear: tf.nn.dynamic_rnn expects a 3-dimensional tensor as input (i.e. rank 3), but fc2 has only two dimensions. The shape of fc2 should be something like (<batch_size>, <max_time>, <num_features>) (or (<max_time>, <batch_size>, <num_features>) if you pass time_major=True).
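To make the rank requirement concrete, here is a minimal sketch in plain Python. The helper describe_rnn_input is hypothetical (it is not part of TensorFlow and only imitates the rank check from the traceback); shapes are written as tuples, with None standing in for the unknown batch size that TensorFlow prints as ?.

```python
def describe_rnn_input(shape, time_major=False):
    """Hypothetical helper: label the axes of a dynamic_rnn-style input shape.

    `shape` is a tuple; None stands for an unknown size (printed as ? by TensorFlow).
    This only imitates the rank requirement; it is not TensorFlow's own code.
    """
    if len(shape) < 3:
        raise ValueError("Shape %s must have rank at least 3" % (shape,))
    if time_major:
        # Time-major layout: (max_time, batch_size, num_features)
        max_time, batch_size = shape[0], shape[1]
    else:
        # Default layout: (batch_size, max_time, num_features)
        batch_size, max_time = shape[0], shape[1]
    return {"batch_size": batch_size, "max_time": max_time, "num_features": shape[2]}


# A rank-3 shape is accepted: one time step per image, carrying all 4096 features.
print(describe_rnn_input((None, 1, 4096)))

# The rank-2 shape from the question is rejected.
try:
    describe_rnn_input((None, 4096))
except ValueError as err:
    print(err)
```

Whichever layout you pick, you have to decide which axis of your data plays the role of max_time before the RNN can iterate over anything.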

1
votes

By default, tf.nn.dynamic_rnn expects a rank-3 input of shape (batch_size, sequence_length, num_features). If you treat each of the 4096 values in fc2 as one time step with a single feature (so num_features is 1), you can expand fc2 with

fc2 = tf.expand_dims(fc2, axis=2)

and then use the code you have above.
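
To see what the choice of axis does to the shape, here is a small sketch in plain Python. The helper expand_dims_shape is hypothetical and only computes the resulting shape of inserting a size-1 axis (the same shape arithmetic as tf.expand_dims, as far as I know); None stands for the unknown batch size shown as ?.

```python
def expand_dims_shape(shape, axis):
    """Hypothetical helper: shape after inserting a size-1 axis at `axis`."""
    rank = len(shape)
    if not -rank - 1 <= axis <= rank:
        raise ValueError("axis %d is out of range for rank %d" % (axis, rank))
    if axis < 0:
        # Negative axes count from the end of the *expanded* shape.
        axis += rank + 1
    return tuple(shape[:axis]) + (1,) + tuple(shape[axis:])


fc2_shape = (None, 4096)

# axis=2: 4096 time steps with one feature each.
as_long_sequence = expand_dims_shape(fc2_shape, 2)
# axis=1: a single time step with 4096 features.
as_single_step = expand_dims_shape(fc2_shape, 1)

print(as_long_sequence)  # (None, 4096, 1)
print(as_single_step)    # (None, 1, 4096)
```

Both results satisfy the rank-3 check, but they mean different things: with axis=2 the RNN unrolls over 4096 steps, while with axis=1 it sees a single step and has nothing to iterate over. Which one (if either) fits depends on what the time dimension is supposed to be in your model.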