
I created a Google Cloud VM instance from this public image:

c1-deeplearning-common-cu100-20191226

Description

Google, Deep Learning Image: Base, m39 (with CUDA 10.0). A Debian-based image with CUDA 10.0.

I then installed Anaconda on this VM and installed PyTorch with the command line recommended by the PyTorch website:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

(this corresponds to Linux, Python 3.7, CUDA 10.1)

From Python, I ran this code to check GPU detection:

>>> import torch
>>> torch.cuda.is_available()
False
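
A quick way to surface a version mismatch is to compare the CUDA version PyTorch itself was built against with what nvidia-smi reports (a diagnostic sketch; the outputs shown are what this cudatoolkit=10.1 build would presumably print):

>>> torch.version.cuda         # CUDA version this PyTorch build was compiled against
'10.1'
>>> torch.cuda.device_count()  # number of GPUs PyTorch can actually see
0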

This is the output of the nvidia-smi tool, even while the main training code is running:

(base) redexces.bf@tensorflow-1x-2x:~$ nvidia-smi
Thu Jan  2 01:33:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Clearly, no processes are running and no GPU memory is allocated. (Note the header as well: driver 410.104 reports CUDA Version 10.0.)

This problem appears to be specific to PyTorch: the same VM also has tensorflow-gpu installed in a separate conda environment, and it recognizes and utilizes the GPU as I would expect.

Am I missing any pieces? Again, the same CUDA driver and image work fine for TensorFlow.

Comment (Szymon Maszke): Try cudatoolkit=10.0 during installation.
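
Presumably, that suggestion means matching the toolkit to the CUDA 10.0 already on the image, i.e.:

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch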

1 Answer


I was able to resolve the issue. Not being a computer science guy, I figured it could be an NVIDIA driver compatibility issue: the PyTorch build I installed was compiled against CUDA 10.1, while the deep learning image had CUDA 10.0 installed. I created another VM instance, but this time, instead of using the public image noted earlier, I used the gcloud command line to specify a deep learning image with the CUDA 10.1 driver. This made it all work as expected.
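
For reference, the gcloud invocation would look something like this (a sketch only; the instance name, zone, and image family here are assumptions, and the available families can be listed with gcloud compute images list --project deeplearning-platform-release):

gcloud compute instances create pytorch-vm \
    --zone=us-west1-b \
    --image-family=common-cu101 \
    --image-project=deeplearning-platform-release \
    --accelerator="type=nvidia-tesla-p4,count=1" \
    --maintenance-policy=TERMINATE \
    --metadata="install-nvidia-driver=True"

Once the image's CUDA version matches the cudatoolkit the PyTorch build was compiled against, torch.cuda.is_available() should return True.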