I have a basic knowledge of parallel computing (including some CUDA), feedforward neural networks, and recurrent neural networks (and how they use BPTT).
When using, for example, TensorFlow, you can apply GPU acceleration during the training phase of a network. But recurrent neural networks are sequential by nature: each timestep depends on the previous one, the next depends on the current one, and so on.
How come GPU acceleration works if this is the case? Is everything that can be computed in parallel computed that way, while the timestep-dependent parts are serialized?
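To make my question concrete, here is how I picture the computation: a minimal NumPy sketch of a vanilla RNN forward pass (the sizes and variable names are my own assumptions, just for illustration). The loop over timesteps looks inherently serial, yet each individual step is a large matrix multiplication, which I understand is exactly what GPUs are good at.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
batch, seq_len, n_in, n_hid = 32, 10, 64, 128

rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, batch, n_in))
W_x = rng.standard_normal((n_in, n_hid)) * 0.01   # input-to-hidden weights
W_h = rng.standard_normal((n_hid, n_hid)) * 0.01  # hidden-to-hidden weights
b = np.zeros(n_hid)

h = np.zeros((batch, n_hid))
for t in range(seq_len):
    # Serial part: h at step t needs h from step t-1.
    # Parallel part: each step is one big matmul over the whole
    # batch and all hidden units at once.
    h = np.tanh(x[t] @ W_x + h @ W_h + b)

print(h.shape)  # (32, 128)
```

So my guess is that the GPU parallelizes within a timestep (across the batch and matrix dimensions) while the loop over `t` stays sequential. Is that what frameworks like TensorFlow actually do?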