I am trying to train an LSTM layer in PyTorch on 4 GPUs. When initializing the hidden state, I call .cuda() to move it to the GPU. But when I run the code with multiple GPUs I get this runtime error:
RuntimeError: Input and hidden tensors are not at the same device
I have tried to work around the problem by calling .cuda() on the hidden state inside the forward function, like below:
self.hidden = (self.hidden[0].type(torch.FloatTensor).cuda(), self.hidden[1].type(torch.FloatTensor).cuda())
This line seems to make the error go away, but it raises the concern of whether the updated hidden state is actually shared correctly across the different GPUs. Should I move the tensors back to the CPU at the end of the forward function for each batch, or is there another way to solve the problem?
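One alternative I am considering is not storing the hidden state on the module at all, and instead creating it inside forward on the same device as the input, so that each replica builds its own tensors on its own GPU. A minimal sketch of that idea (the class name, sizes, and DataParallel wrapping are placeholders, not my actual code):

    import torch
    import torch.nn as nn

    class LSTMModel(nn.Module):
        def __init__(self, input_size=10, hidden_size=20, num_layers=1):
            super().__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

        def forward(self, x):
            # x: (batch, seq_len, input_size); build (h0, c0) on the same device as x,
            # so each DataParallel replica gets a hidden state on its own GPU
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            out, _ = self.lstm(x, (h0, c0))
            return out

    # hypothetical usage with DataParallel across the 4 GPUs
    model = nn.DataParallel(LSTMModel().cuda())
    x = torch.randn(8, 5, 10).cuda()
    y = model(x)

Would this be the right approach, or does it break anything when the hidden state needs to persist between batches?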
DataParallel? – Jatentaki