9 votes
  • How specifically does TensorFlow apply dropout when calling tf.nn.rnn_cell.DropoutWrapper()?

Everything I have read about applying dropout to RNNs references this paper by Zaremba et al., which says not to apply dropout to the recurrent connections. Neurons should be dropped out randomly before or after the LSTM layers, but not on the recurrent connections within an LSTM layer. OK.
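
For context, here is a minimal sketch of that scheme using the TF 1.x-style API (the layer sizes, shapes, and keep probability below are made up for illustration): DropoutWrapper attaches dropout to each layer's output, i.e. the non-recurrent connection feeding the next layer, while the step-to-step recurrent state is left alone.

    import tensorflow as tf  # assumes the TF 1.x graph-mode API

    keep_prob = tf.placeholder_with_default(1.0, shape=[])  # feed e.g. 0.5 while training

    def lstm_layer(num_units):
        cell = tf.nn.rnn_cell.LSTMCell(num_units)
        # Dropout on the layer's output (the non-recurrent connection into the
        # next layer); the recurrent state carried between timesteps is untouched.
        return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)

    stacked = tf.nn.rnn_cell.MultiRNNCell([lstm_layer(128) for _ in range(2)])

    x = tf.placeholder(tf.float32, [None, None, 64])  # [batch, time, features]
    outputs, state = tf.nn.dynamic_rnn(stacked, x, dtype=tf.float32)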

  • The question I have is: how are the neurons turned off with respect to time?

In the paper that everyone cites, it seems that a random dropout mask is sampled and applied at each timestep, rather than generating one dropout mask, reusing it across all the timesteps of the layer being dropped out, and then generating a new mask for the next batch.
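
To make the distinction concrete, here is a toy NumPy sketch (sizes and keep probability are arbitrary) contrasting the two options: sampling a fresh mask at every timestep versus sampling one mask per sequence and reusing it at every step.

    import numpy as np

    rng = np.random.RandomState(0)
    T, H = 5, 8            # timesteps, hidden units (toy sizes)
    keep_prob = 0.5
    h = np.ones((T, H))    # stand-in for per-timestep activations

    # Option A: a fresh dropout mask is sampled at every timestep.
    per_step = np.stack([h[t] * (rng.rand(H) < keep_prob) / keep_prob
                         for t in range(T)])

    # Option B: one mask is sampled per sequence and reused at every timestep,
    # so the same units are dropped for the whole sequence.
    mask = (rng.rand(H) < keep_prob) / keep_prob
    per_sequence = h * mask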

Further, and probably what matters more at the moment, how does TensorFlow do it? I've checked the TensorFlow API docs and searched around for a detailed explanation, but have yet to find one.

  • Is there a way to dig into the actual TensorFlow source code?
All the source code is available on GitHub. – user2717954

1 Answer

4 votes

You can check the implementation here.

It applies the dropout op to the input going into the wrapped RNNCell, and then to its output, with the keep probabilities you specify.
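
In other words, the per-step behaviour amounts to something like the following paraphrase (a sketch, not the actual source):

    import tensorflow as tf

    def dropout_wrapped_step(cell, inputs, state,
                             input_keep_prob=1.0, output_keep_prob=1.0):
        # Dropout on the input going into the wrapped cell ...
        if input_keep_prob < 1.0:
            inputs = tf.nn.dropout(inputs, keep_prob=input_keep_prob)
        # ... the wrapped cell itself runs unchanged (its recurrent state is untouched) ...
        output, new_state = cell(inputs, state)
        # ... then dropout on the cell's output.
        if output_keep_prob < 1.0:
            output = tf.nn.dropout(output, keep_prob=output_keep_prob)
        return output, new_state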

It seems like each sequence you feed in gets a new mask for the input and another for the output, with no changes inside the sequence.
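
For completeness, a small usage example (my own, with arbitrary sizes) showing how the wrapper is typically wired into tf.nn.dynamic_rnn, with dropout active during training and disabled at evaluation via the default keep probability:

    import numpy as np
    import tensorflow as tf  # TF 1.x-style API

    keep_prob = tf.placeholder_with_default(1.0, shape=[])

    cell = tf.nn.rnn_cell.DropoutWrapper(
        tf.nn.rnn_cell.LSTMCell(32),
        input_keep_prob=keep_prob,    # dropout on the input to the cell
        output_keep_prob=keep_prob)   # dropout on the cell's output

    x = tf.placeholder(tf.float32, [None, None, 16])  # [batch, time, features]
    outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch = np.random.rand(4, 10, 16).astype(np.float32)
        train_out = sess.run(outputs, {x: batch, keep_prob: 0.5})  # dropout on
        eval_out = sess.run(outputs, {x: batch})                   # dropout off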