I'm getting started with PyTorch and used a few transformations to build the following model, using one of the tutorials as a reference:

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

I want to use an LSTM network, so I tried to do the following:

model = torch.nn.Sequential(
    torch.nn.LSTM(D_in, H),
    torch.nn.Linear(H, D_out),
)

which gives me this error:

RuntimeError: input must have 3 dimensions, got 2

Why am I seeing this error? I suspect there's something fundamentally wrong in my understanding of how transformations (layers?) can be chained in PyTorch...

EDIT

After following @esBee's suggestion, I found that the following runs correctly. This is because an LSTM expects the input to be of the following dimension:

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence

local_x = local_x.unsqueeze(0)      # add a seq_len dimension: (batch, D_in) -> (1, batch, D_in)
y_pred, (hn, cn) = layerA(local_x)  # the LSTM returns output and the (h_n, c_n) state tuple
y_pred = y_pred.squeeze(0)          # drop the seq_len dimension again
y_pred = layerB(y_pred)             # feed the features into the linear layer

However, the fact that my original training/test dataset only has a sequence length of 1 makes me feel like I'm doing something incorrectly. What is the purpose of the sequence length in the context of neural networks?
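For illustration (this example and its dimensions are my own, not from the tutorial), the sequence dimension is where an LSTM sees time: it consumes a tensor of shape (seq_len, batch, input_size) and produces one hidden state per time step, so seq_len is only 1 when each sample is a single time step:

```python
import torch

D_in, H = 4, 3
lstm = torch.nn.LSTM(D_in, H)

# A batch of 2 samples, each a sequence of 5 time steps with 4 features per step.
x = torch.randn(5, 2, D_in)  # (seq_len, batch, input_size)

output, (hn, cn) = lstm(x)
print(output.shape)  # hidden state for every time step: (5, 2, 3)
print(hn.shape)      # hidden state of the final time step only: (1, 2, 3)
```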

2 Answers


The error message is telling you that the input tensor needs three dimensions.

Looking at the PyTorch documentation, the example it provides is this:

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3

The tensor you pass into the LSTM has only two dimensions, while an LSTM expects a 3-D input of shape (seq_len, batch, input_size).
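As a quick sketch (my own example, not from the docs), a 2-D batch can be given the missing sequence dimension with unsqueeze before it is passed to the LSTM:

```python
import torch

lstm = torch.nn.LSTM(3, 3)  # input dim is 3, hidden dim is 3

x = torch.randn(8, 3)  # (batch, features) -- only 2 dimensions, would raise the error
x = x.unsqueeze(0)     # -> (1, 8, 3), i.e. (seq_len, batch, input_size)

output, (hn, cn) = lstm(x)
print(output.shape)  # (1, 8, 3)
```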


The thing you have to pay attention to here is that, as opposed to linear layers such as torch.nn.Linear, recurrent layers such as torch.nn.LSTM return more than one output.

While torch.nn.Linear simply returns the y in y = Ax + b, torch.nn.LSTM returns output, (h_n, c_n) (explained in more detail in the docs) to let you choose which output you want to process. So what happens in your example is that you are feeding this whole tuple of outputs into the layer after your LSTM layer (leading to the error you are seeing). You should instead choose a specific part of the output of your LSTM and feed only that into the next layer.

Sadly I don't know how to choose the output of an LSTM within a Sequential (suggestions welcome), but you can rewrite

model = torch.nn.Sequential(
    torch.nn.LSTM(D_in, H),
    torch.nn.Linear(H, D_out),
)

model(x)

as

layerA = torch.nn.LSTM(D_in, H)
layerB = torch.nn.Linear(H, D_out)

x = layerA(x)
x = layerB(x)

and then correct it by choosing the output features (h_t) from the last layer of your LSTM by writing

layerA = torch.nn.LSTM(D_in, H)
layerB = torch.nn.Linear(H, D_out)

x = layerA(x)[0]
x = layerB(x)