I was training a neural network on a GPU, but I get the above-mentioned error when I use torch.save() to save checkpoints. My question is: even though I have a CUDA device, why am I getting this error? My model was running fine on the GPU; please see below the output of the nvidia-smi command.
$ nvidia-smi
Sat Aug 15 09:51:58 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   55C    P3    33W /  N/A |   4774MiB /  5934MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7080      C   python3                                     4763MiB |
+-----------------------------------------------------------------------------+

$ python --version
Python 3.8.2
$ python -c "import torch; print(torch.__version__)"
1.5.1
$ python -c "import torchvision as torch; print(torch.__version__)"
0.6.1
I have even tried the following:
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

torch.save({
    'epoch': epoch + 1,
    'metrics': metrics,
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}, name)
But nothing worked. I am new to deep learning and still learning PyTorch. Please do pardon my ignorance.
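One variant I could still try is copying every tensor in the checkpoint to the CPU before calling torch.save(), so that the saved file does not reference any CUDA device. This is only a sketch under that assumption, not a confirmed fix for the error. The names to_cpu and save_checkpoint are hypothetical helpers introduced here for illustration; they are not part of PyTorch or of my training script.

```python
def to_cpu(obj):
    """Recursively copy every tensor-like object in a nested checkpoint to the CPU."""
    if hasattr(obj, "detach") and hasattr(obj, "cpu"):
        # Duck-typed torch.Tensor: detach from the autograd graph, then copy to host memory.
        return obj.detach().cpu()
    if isinstance(obj, dict):
        # Note: returns a plain dict, even if the input was an OrderedDict.
        return {key: to_cpu(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_cpu(item) for item in obj)
    return obj  # ints, floats, strings, None, ... are returned unchanged


def save_checkpoint(path, epoch, metrics, model, optimizer):
    """Hypothetical wrapper around the torch.save() call shown in the question."""
    import torch  # imported lazily so to_cpu() stays usable without PyTorch installed

    checkpoint = {
        "epoch": epoch + 1,
        "metrics": metrics,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    # Everything written to disk now lives in host memory.
    torch.save(to_cpu(checkpoint), path)
```

In the training loop this would replace the torch.save() call above, e.g. save_checkpoint(name, epoch, metrics, model, optimizer). If metrics holds GPU tensors, they get copied to the CPU along with the state dicts.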