2 votes

I am trying to use the GPU to train my model, but it seems that torch fails to allocate GPU memory.

My model is an RNN built with PyTorch:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

rnn = RNN(n_letters, n_hidden, n_categories_train)
rnn.to(device)
criterion = nn.NLLLoss()
criterion.to(device)
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate, weight_decay=.9)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)

        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        input = input.cuda()
        hidden = hidden.cuda()

        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)

        output = output.cuda()
        hidden = hidden.cuda()

        return output, hidden

    def init_hidden(self):
        return Variable(torch.zeros(1, self.hidden_size).cuda())

Training function:

def train(category_tensor, line_tensor, rnn, optimizer, criterion):
    rnn.zero_grad()
    hidden = rnn.init_hidden()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    optimizer.step()

    return output, loss.item()

The function to get category_tensor and line_tensor:

def random_training_pair(category_lines, n_letters, all_letters):
    category = random.choice(all_categories_train)
    line = random.choice(category_lines[category])
    category_tensor = Variable(torch.LongTensor([all_categories_train.index(category)]).cuda())
    line_tensor = Variable(process_data.line_to_tensor(line, n_letters, all_letters)).cuda()

    return category, line, category_tensor, line_tensor

I ran the following code:

print(torch.cuda.get_device_name(0))
print('Memory Usage:')
print('Allocated:', round(torch.cuda.memory_allocated(0) / 1024 ** 3, 1), 'GB')
print('Cached:   ', round(torch.cuda.memory_cached(0) / 1024 ** 3, 1), 'GB')

and I got:

GeForce GTX 1060
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB

I did not get any errors, but GPU usage is just 1% while CPU usage is around 31%.

I am using Windows 10 and Anaconda, where my PyTorch is installed. CUDA and cuDNN are installed from the .exe installers downloaded from the Nvidia website.

Comments:

I don't see where in the code above you do anything other than print how much memory is allocated. It would seem obvious that there won't be a need to allocate memory for not doing anything!? – Michael Kenzel

I printed this out during the training process and no memory is allocated still. – IDoNot1xist

Is your model on GPU? Can you show model/training code? – Sergii Dymchenko

I have included my model in the post. It should be on GPU. – IDoNot1xist

1 Answer

2 votes

Your problem is that to() is not an in-place operation for tensors: calling some_tensor.to(device) returns a new tensor located on the desired device but does not move the original anywhere, so you have to assign the result. (For an nn.Module such as rnn, to() does move the parameters in place and also returns the module, so the assignment form works there as well and is the unambiguous style to use.)
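
A minimal sketch of the tensor case (assuming a CUDA device is available; t is just an illustrative name):

import torch

device = torch.device('cuda:0')

t = torch.zeros(3)   # created on the CPU
t.to(device)         # returns a new tensor on the GPU; t itself is unchanged
print(t.device)      # prints: cpu

t = t.to(device)     # assign the result to actually work with the GPU copy
print(t.device)      # prints: cuda:0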

So, change:

rnn = RNN(n_letters, n_hidden, n_categories_train)
rnn.to(device)

to:

rnn = RNN(n_letters, n_hidden, n_categories_train).to(device)

You have to make the same change everywhere else you used to() this way (see the sketch below). That should do the trick!
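
For instance, applied to the other snippets from the question (keeping the original names from the post; this is a sketch, not something tested against the full code):

criterion = nn.NLLLoss().to(device)

# and analogously for the tensors built in random_training_pair:
category_tensor = torch.LongTensor([all_categories_train.index(category)]).to(device)
line_tensor = process_data.line_to_tensor(line, n_letters, all_letters).to(device)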

Note: all tensors and parameters you perform operations on together have to be on the same device. If your model is on the GPU but your input tensor is on the CPU, you will get an error message.
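
A minimal illustration of that failure mode and the fix (assuming a CUDA device is available; layer and x are just placeholder names):

x = torch.randn(1, 4)                      # input tensor on the CPU
layer = torch.nn.Linear(4, 2).to(device)   # module with parameters on the GPU

# layer(x) would raise a RuntimeError complaining that the tensors
# are not all on the same device
y = layer(x.to(device))                    # move the input first, then it works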