I have used the embedding_attention_seq2seq module for a machine translation task, as described in the tutorial at:
https://www.tensorflow.org/versions/master/tutorials/seq2seq/index.html
In seq2seq_model.py, the file that defines the model used in the tutorial, I noticed that a GRUCell is used by default when use_lstm is set to False, in these lines:
# Create the internal multi-layer cell for our RNN.
single_cell = tf.nn.rnn_cell.GRUCell(size)
if use_lstm:
  single_cell = tf.nn.rnn_cell.BasicLSTMCell(size)
cell = single_cell
if num_layers > 1:
  cell = tf.nn.rnn_cell.MultiRNNCell([single_cell] * num_layers)
Now, the attention mechanism described in the paper that the tutorial cites as the implemented model makes much more sense semantically if the encoder is bidirectional, so that the context vector at each source position is computed from both the forward and the backward hidden states (roughly along the lines of the sketch below). However, seq2seq_model.py doesn't mention any bidirectional component.
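To make it concrete, here is a rough sketch of what I mean by a bidirectional encoder feeding attention, written against the same TF 0.x-era API the tutorial uses. This is entirely my own illustration, not code from seq2seq_model.py, and the sizes and the tf.nn.bidirectional_rnn usage are assumptions on my part:

import tensorflow as tf

size = 256        # hidden units per direction (illustrative value)
seq_len = 10      # fixed number of unrolled encoder steps (illustrative)
batch_size = 32

# Static-RNN style inputs: a Python list with one [batch, input] tensor per
# time step; here I pretend the inputs are already embedded vectors.
encoder_inputs = [tf.placeholder(tf.float32, [batch_size, size])
                  for _ in range(seq_len)]

fw_cell = tf.nn.rnn_cell.GRUCell(size)
bw_cell = tf.nn.rnn_cell.GRUCell(size)

# tf.nn.bidirectional_rnn (as I understand it) runs a forward and a backward
# pass and returns per-step outputs that are the concatenation of the two
# directions, so each step's output summarizes the whole sentence.
outputs, state_fw, state_bw = tf.nn.bidirectional_rnn(
    fw_cell, bw_cell, encoder_inputs, dtype=tf.float32)

# Reshape into the [batch, length, depth] layout attention expects; this is
# what I would want the attention_states to be built from.
attention_states = tf.concat(1, [tf.reshape(o, [-1, 1, 2 * size])
                                 for o in outputs])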
So my question is: does embedding_attention_seq2seq implement a bidirectional RNN encoder by default?
If not, does it simply take the hidden-state outputs at each time step of an ordinary unidirectional LSTM/GRU encoder as the attention states, thereby limiting the context at each position to only the words in the sentence that have occurred before it?
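For comparison, this is roughly how I picture the unidirectional case I am asking about; again, this is only my own sketch of the idea (same assumed sizes as above), not the library's actual internals:

import tensorflow as tf

size = 256
seq_len = 10
batch_size = 32

encoder_inputs = [tf.placeholder(tf.float32, [batch_size, size])
                  for _ in range(seq_len)]

cell = tf.nn.rnn_cell.GRUCell(size)

# A plain unidirectional pass: encoder_outputs[t] can only depend on
# inputs 0..t, i.e. on the words at and before position t.
encoder_outputs, encoder_state = tf.nn.rnn(cell, encoder_inputs,
                                           dtype=tf.float32)

# If these are used as the attention states, attention during decoding only
# ever sees left-to-right context for each source position.
attention_states = tf.concat(1, [tf.reshape(o, [-1, 1, size])
                                 for o in encoder_outputs])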