I am trying to port some TensorFlow 1 code to TensorFlow 2. The old code used the now-deprecated MultiRNNCell to create a GRU layer with multiple hidden layers. In TensorFlow 2 I want to use the built-in GRU layer, but that class doesn't seem to have an option for multiple hidden layers. The PyTorch equivalent exposes such an option as an initialization parameter, num_layers.
My workaround has been to use the TensorFlow RNN layer and pass a GRU cell for each hidden layer I want - this is the way recommended in the docs:
import tensorflow as tf

dim = 1024
num_layers = 4
# One GRUCell per hidden layer; RNN stacks a list of cells internally.
cells = [tf.keras.layers.GRUCell(dim) for _ in range(num_layers)]
gru_layer = tf.keras.layers.RNN(
    cells,
    return_sequences=True,
    stateful=True,
)
However, the built-in GRU layer supports CuDNN, which the plain RNN layer seems to lack. To quote the docs:
Mathematically, RNN(LSTMCell(10)) produces the same result as LSTM(10). In fact, the implementation of this layer in TF v1.x was just creating the corresponding RNN cell and wrapping it in a RNN layer. However using the built-in GRU and LSTM layers enables the use of CuDNN and you may see better performance.
So how can I achieve this? How do I get a GRU layer that supports multiple hidden layers and also uses CuDNN? Given that the built-in GRU layer in TensorFlow lacks such an option, is it in fact necessary? Or is stacking multiple GRU layers in sequence the only way to get a deep GRU network?
EDIT: It seems, according to this answer to a similar question, that there is indeed no in-built way to create a GRU Layer with multiple hidden layers, and that they have to be stacked manually.
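For reference, manual stacking might look like the sketch below. It is only an illustration under some assumptions: the helper name build_stacked_gru and the small sizes are made up for the example, stateful=True is left out for brevity, and each GRU layer keeps its default arguments, which as far as I understand the docs is what the CuDNN kernel requires when a GPU is available.

```python
import tensorflow as tf


def build_stacked_gru(units, num_layers, feature_dim):
    """Hypothetical helper: chain `num_layers` built-in GRU layers."""
    inputs = tf.keras.Input(shape=(None, feature_dim))
    x = inputs
    for _ in range(num_layers):
        # return_sequences=True so the next GRU receives the full sequence.
        # Defaults are left untouched (tanh activation, sigmoid recurrent
        # activation, no recurrent dropout), which is assumed here to keep
        # the layer eligible for the CuDNN implementation.
        x = tf.keras.layers.GRU(units, return_sequences=True)(x)
    return tf.keras.Model(inputs, x)


# Small sizes so the example runs quickly; the question uses 1024 units and 4 layers.
model = build_stacked_gru(units=8, num_layers=3, feature_dim=5)
outputs = model(tf.zeros([2, 7, 5]))  # (batch, time, features)
print(outputs.shape)
```

Each layer here is a separate tf.keras.layers.GRU, so whether CuDNN is used is decided per layer rather than for the stack as a whole.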