How to fix this loss is NaN problem in PyTorch of this RNN with GRU?

Question

I'm completely new to PyTorch and tried out some models. I wanted to make an easy prediction rnn of stock market prices and found the following code:

I load the data set with pandas then split it into training and test data and load it into a pytorch DataLoader for later usage in training process. The model is defined in the GRU class. But the actual problem seems to be the optimisation. I think the problem could be gradient explosion. I thought about adding gradient clipping but the GRU design should actually prevent gradient explosion or am I wrong? What could cause the loss to be instantly NaN (already in the first epoch)

from sklearn.preprocessing import MinMaxScaler

import time
import pandas as pd
import numpy as np

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

batch_size = 200
input_dim = 1
hidden_dim = 32
num_layers = 2
output_dim = 1
num_epochs = 10

nvda = pd.read_csv('dataset/stocks/NVDA.csv')
price = nvda[['Close']]
scaler = MinMaxScaler(feature_range=(-1, 1))
price['Close'] = scaler.fit_transform(price['Close'].values.reshape(-1, 1))

def split_data(stock, lookback):
    data_raw = stock.to_numpy()  # convert to numpy array
    data = []

    # create all possible sequences of length seq_len
    for index in range(len(data_raw) - lookback):
        data.append(data_raw[index: index + lookback])

    data = np.array(data)
    test_set_size = int(np.round(0.2 * data.shape[0]))
    train_set_size = data.shape[0] - (test_set_size)

    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]

    x_test = data[train_set_size:, :-1]
    y_test = data[train_set_size:, -1, :]

    return [x_train, y_train, x_test, y_test]


lookback = 20  # choose sequence length
x_train, y_train, x_test, y_test = split_data(price, lookback)

train_data = TensorDataset(torch.from_numpy(x_train).float(), torch.from_numpy(y_train).float())
train_data = DataLoader(train_data, shuffle=True, batch_size=batch_size, drop_last=True)

test_data = TensorDataset(torch.from_numpy(x_test).float(), torch.from_numpy(y_test).float())
test_data = DataLoader(test_data, shuffle=True, batch_size=batch_size, drop_last=True)


class GRU(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(GRU, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers

        self.gru = nn.GRU(input_dim, hidden_dim, num_layers, batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x, h):

        out, h = self.gru(x, h)
        out = self.fc(self.relu(out[:, -1]))
        return out, h

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = weight.new(self.num_layers, batch_size, self.hidden_dim).zero_()
        return hidden


model = GRU(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim, num_layers=num_layers)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0000000001)
model.train()

start_time = time.time()

h = model.init_hidden(batch_size)
for epoch in range(1, num_epochs+1):
    for x, y in train_data:
        h = h.data
        model.zero_grad()
        y_train_pred, h = model(x, h)
        loss = criterion(y_train_pred, y)
        print("Epoch ", epoch, "MSE: ", loss.item())
        loss.backward()
        optimizer.step()


training_time = time.time() - start_time
print("Training time: {}".format(training_time))

This is the dataset which I used.

Hey! GRU doesn't prevent exploding gradient; it prevents vanishing gradient instead. So you can apply gradient clipping. You must train your scaler only with the training data and use it to transform test data. You don't have to apply relu activation to the layer that gives you the output (I think this is why you got nan loss); so remove it. Finally, may be your learning rate is too small; you should start with default param ie 1e-3. — Dimitri K. Sifoua
I used gard clipping and removed the relu layer but it still isn't working. The grad clip value I used was 1. — Hard_Veur

Maciek Woźniak Maciek Woźniak · Accepted Answer · 2020-09-21T00:25:49

Not sure if it is the case, but did you preprocess and cleaned the data? I do not know it but maybe there are some values missing or it's something strange about it. I checked it here https://ca.finance.yahoo.com/quote/NVDA/history?p=NVDA and it seems that every couple of rows there is some inconsistency. Like I said, I do not know if it's the case but it may be.

How to fix this loss is NaN problem in PyTorch of this RNN with GRU?

1 Answers