4 votes

I have a basic knowledge of parallel computing (including some CUDA), feedforward neural networks, and recurrent neural networks (and how they use BPTT).

When using, for example, TensorFlow, you can apply GPU acceleration to the training phase of a network. But recurrent neural networks are sequential in nature: each timestep depends on the previous one, the next timestep depends on the current one, and so on.

How can GPU acceleration work if this is the case? Is everything that can be computed in parallel computed that way, while the timestep-dependent parts are serialized?


2 Answers

4 votes

RNNs train using backpropagation through time (BPTT). The recurrent network structure is unrolled into a directed acyclic graph of finite length, which looks just like a normal feedforward net. It is then trained with stochastic gradient descent, subject to the constraint that the weights used at every timestep must be equal: each unrolled copy shares the same weight matrices, and their gradient contributions are summed into a single update.
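
As a minimal sketch of this (plain NumPy with hypothetical sizes, not how TensorFlow implements it internally), here is an RNN unrolled over a few timesteps, with a backward pass that sums every timestep's gradient contribution into the same shared weight matrices; that summation is the equality constraint in action:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_hid = 4, 3, 5                 # hypothetical sizes
W_xh = rng.standard_normal((n_in, n_hid)) * 0.1
W_hh = rng.standard_normal((n_hid, n_hid)) * 0.1
xs = rng.standard_normal((T, n_in))      # one input sequence

# Forward: unroll into a finite DAG, caching activations for the backward pass.
hs = [np.zeros(n_hid)]
for t in range(T):
    hs.append(np.tanh(xs[t] @ W_xh + hs[-1] @ W_hh))

# Backward (BPTT), with a toy loss: the sum of the final hidden state.
dW_xh = np.zeros_like(W_xh)
dW_hh = np.zeros_like(W_hh)
dh = np.ones(n_hid)                      # dL/dh_T for this toy loss
for t in reversed(range(T)):
    dz = dh * (1.0 - hs[t + 1] ** 2)     # tanh'(z) = 1 - tanh(z)^2
    dW_xh += np.outer(xs[t], dz)         # every timestep adds to the SAME dW:
    dW_hh += np.outer(hs[t], dz)         # this is the weight-equality constraint
    dh = dz @ W_hh.T                     # pass gradient to the previous timestep
```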

If you understand that it trains like this, i.e. that it is just constrained backpropagation on sequences of a given length, you see that there is nothing about the sequential nature that stops this process from being parallelizable. The loop over timesteps does run serially, but all the work inside one timestep (the matrix multiplications over every unit and every sequence in the batch) is parallel, and that is exactly the part the GPU accelerates.
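
A sketch of where that parallelism lives, again with hypothetical sizes: the time loop below is serial, but each iteration is one dense matmul over the whole batch, which is the kind of work a GPU executes in parallel:

```python
import numpy as np

rng = np.random.default_rng(1)
B, T, n_in, n_hid = 32, 10, 64, 128          # batch, timesteps, input, hidden
W_xh = rng.standard_normal((n_in, n_hid)) * 0.1
W_hh = rng.standard_normal((n_hid, n_hid)) * 0.1

X = rng.standard_normal((T, B, n_in))        # time-major batch of sequences
H = np.zeros((B, n_hid))
for t in range(T):                           # serial: step t needs step t-1's H
    # One fused matmul updates all B sequences at once; on a GPU the
    # B * n_hid outputs of this step are computed in parallel.
    H = np.tanh(X[t] @ W_xh + H @ W_hh)
```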

3 votes

The way you get performance from GPU training of recurrent neural networks is to use a batch size large enough that computing the forward/backward pass for a single cell gives the GPU enough work to keep it busy.
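
A back-of-the-envelope sketch of why, with made-up but plausible numbers (the throughput and per-step overhead below are assumptions, not measurements): the matmul work per timestep grows linearly with the batch size B, while the per-step launch overhead is fixed, so small batches leave the GPU mostly idle:

```python
# Per timestep an RNN cell does roughly 2 * B * n_hid * (n_in + n_hid)
# FLOPs of matmul work; that grows linearly with batch size B.
n_in, n_hid = 512, 512
gpu_flops = 10e12                 # assumed ~10 TFLOP/s usable throughput
launch_overhead_s = 10e-6         # assumed ~10 us per-step kernel overhead

for B in (1, 32, 1024):
    flops = 2 * B * n_hid * (n_in + n_hid)
    compute_s = flops / gpu_flops
    busy = compute_s / (compute_s + launch_overhead_s)
    print(f"B={B:5d}: {flops/1e6:8.1f} MFLOPs/step, "
          f"~{busy:.0%} of the step spent computing")
```

With these assumed numbers, a batch of 1 keeps the GPU busy for only about 1% of each step, while a batch of 1024 pushes it above 90%; the exact figures depend on the hardware, but the linear scaling with B is the point.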