8
votes

This is the API I am looking at, https://pytorch.org/docs/stable/nn.html#gru

It outputs:

  1. output of shape (seq_len, batch, num_directions * hidden_size)
  2. h_n of shape (num_layers * num_directions, batch, hidden_size)

For GRU with more than one layers, I wonder how to fetch the hidden state of the last layer, should it be h_n[0] or h_n[-1]?

What if it's bidirectional, how to do the slicing to obtain the last hidden layer states of GRUs in both directions?

1
I think it's h_n[-1]. Just confirmed myself. – zyxue

1 Answer

2
votes

The documentation for nn.GRU is clear about this. Here is an example to make it more explicit:

For the unidirectional GRU/LSTM (with more than one hidden layer):

output - contains the output features (from the last layer) for all timesteps t
h_n - contains the hidden state at the last timestep of every layer.

To get the hidden state of a particular layer at the last timestep, index the first dimension of h_n:

first_hidden_layer_last_timestep = h_n[0]
last_hidden_layer_last_timestep = h_n[-1]

Note that the "n" in h_n refers to the last timestep, not the sequence length; the first dimension of h_n runs over layers (times directions for a bidirectional GRU), with layer 0 first and the last layer last.
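A minimal sketch to confirm this for the unidirectional case (the sizes below are arbitrary, chosen just for illustration): h_n[-1] is the last layer's hidden state at the final timestep, so it should match output[-1].

```python
import torch
import torch.nn as nn

# Arbitrary sizes for illustration
seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

gru = nn.GRU(input_size, hidden_size, num_layers=num_layers)
x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

print(output.shape)  # torch.Size([5, 3, 20]) -> (seq_len, batch, hidden_size)
print(h_n.shape)     # torch.Size([2, 3, 20]) -> (num_layers, batch, hidden_size)

# h_n[-1] is the last layer's state at the last timestep, which is
# exactly the last timestep of `output` for a unidirectional GRU:
print(torch.allclose(output[-1], h_n[-1]))  # True
```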


This is because the description of num_layers says:

num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results.

So, it is natural and intuitive to also return the results (i.e. hidden states) accordingly in the same order.