
I have used the embedding_attention_seq2seq module for a machine translation task, as described in the tutorial at:

https://www.tensorflow.org/versions/master/tutorials/seq2seq/index.html

In seq2seq_model.py, which defines the model used in the tutorial, I noticed that a GRUCell is used by default when use_lstm is set to False, in these lines:

# Create the internal multi-layer cell for our RNN.
single_cell = tf.nn.rnn_cell.GRUCell(size)
if use_lstm:
  single_cell = tf.nn.rnn_cell.BasicLSTMCell(size)
cell = single_cell
if num_layers > 1:
  cell = tf.nn.rnn_cell.MultiRNNCell([single_cell] * num_layers)

Now, the attention mechanism described in the paper that the tutorial cites as the implemented model makes much more semantic sense if the encoder is bidirectional, so that the context vector takes the hidden states from both directions into account. However, seq2seq_model.py makes no mention of a bidirectional component.

So my question is, does the embedding_attention_seq2seq implement a bidirectional RNN Encoder by default?

If not, does it simply take the hidden layer outputs of each time step of an ordinary LSTM Encoder, thereby limiting the context to only the words in the sentence that have occurred before it?


1 Answer


So my question is, does the embedding_attention_seq2seq implement a bidirectional RNN Encoder by default?

No, it does not implement a bidirectional RNN Encoder. The output of the encoder (which is used to build the attention states) is constructed inside the first few lines of embedding_attention_seq2seq:

# Encoder.
encoder_cell = rnn_cell.EmbeddingWrapper(
    cell, embedding_classes=num_encoder_symbols,
    embedding_size=embedding_size)
encoder_outputs, encoder_state = rnn.rnn(
    encoder_cell, encoder_inputs, dtype=dtype)

The first statement wraps the cell with an embedding lookup. The second runs encoder_cell forward, and only forward, over encoder_inputs (lines 210-228 of tf/python/ops/rnn.py).
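
To make the forward-only behavior concrete, here is a rough sketch of what that rnn.rnn call boils down to. This is not the actual library code (it ignores variable scoping, sequence lengths, etc.), and batch_size and dtype are stand-ins:

# Simplified sketch of rnn.rnn: the cell is applied once per time step,
# strictly left to right, threading the state forward.
state = encoder_cell.zero_state(batch_size, dtype)
encoder_outputs = []
for inp in encoder_inputs:            # left-to-right only
  output, state = encoder_cell(inp, state)
  encoder_outputs.append(output)
# The attention states are built from these forward-direction outputs only.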

If not, does it simply take the hidden layer outputs of each time step of an ordinary LSTM Encoder, thereby limiting the context to only the words in the sentence that have occurred before it?

That's correct: the attention states are built only from the forward pass, so the encoder output at each time step carries information only about the words up to and including that position.
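
If you want a bidirectional encoder, you have to modify embedding_attention_seq2seq yourself; there is no flag for it. Below is a minimal sketch of what that change could look like. It assumes rnn.bidirectional_rnn from the same rnn module (its return value has changed across TF releases, so check your version), and note that each output is then twice the cell's output size, so the attention states and the decoder side need to be sized accordingly:

# Hypothetical modification (not part of the tutorial code): a bidirectional
# encoder built from two embedding-wrapped cells.
encoder_cell_fw = rnn_cell.EmbeddingWrapper(
    cell, embedding_classes=num_encoder_symbols,
    embedding_size=embedding_size)
encoder_cell_bw = rnn_cell.EmbeddingWrapper(
    cell, embedding_classes=num_encoder_symbols,
    embedding_size=embedding_size)
# Newer versions return (outputs, state_fw, state_bw); older versions return
# only the outputs.  Each output is the concatenation of the forward and
# backward outputs, i.e. 2 * the cell's output size.
encoder_outputs, encoder_state_fw, encoder_state_bw = rnn.bidirectional_rnn(
    encoder_cell_fw, encoder_cell_bw, encoder_inputs, dtype=dtype)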