Google Cloud DL Container deployed to GCE with GPU can't find CUDA Device

Question

I use the PyTorch GPU image gcr.io/deeplearning-platform-release/pytorch-gpu.1-2:latest and deploy it to GCE instances with K80 and V100 GPUs. Inside the container, PyTorch reports no CUDA devices:

import torch
torch.cuda.device_count()
# returns 0

CUDA is installed. When I SSH into the Docker container and run the following command in a terminal, I can see its version:

cat /usr/local/cuda/version.txt
CUDA Version 10.0.130

FYI, the nvidia-smi command does not work from the terminal. What am I doing wrong? Or is there a problem with the Docker images?

Ahmad P · Accepted Answer · 2019-11-15T18:28:03

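It seems that the NVIDIA driver has not been installed correctly, which would also explain why nvidia-smi does not work. Finding /usr/local/cuda/version.txt only shows that the CUDA toolkit is present in the image; the GPU driver is installed separately. Please note that “each version of CUDA requires a minimum GPU driver version or a later version.” To check the minimum driver required for your version of CUDA, see the “Toolkit and Compatible Driver Versions” table in NVIDIA's CUDA documentation.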
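The check described above can be sketched in Python. This is a minimal sketch, not an official tool: the helper names (parse_version, driver_meets_minimum, query_driver_version) are hypothetical, and the 410.48 minimum for CUDA 10.0.130 on Linux is taken from NVIDIA's compatibility table as I remember it, so confirm it against the current table before relying on it.

```python
import subprocess

# Assumption: 410.48 is the minimum Linux driver listed for CUDA 10.0.130
# in NVIDIA's compatibility table. Verify against the current table.
MIN_DRIVER_FOR_CUDA_10_0 = "410.48"


def parse_version(version):
    """Turn a dotted version string such as '410.48' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(driver_version, minimum=MIN_DRIVER_FOR_CUDA_10_0):
    """Return True if driver_version is at least the required minimum."""
    return parse_version(driver_version) >= parse_version(minimum)


def query_driver_version():
    """Ask nvidia-smi for the driver version.

    Returns None when nvidia-smi is missing, fails, or times out, which is
    the situation described in the question.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.strip().splitlines()
    # With several GPUs, nvidia-smi prints one line per GPU; use the first.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi is unavailable: the driver is likely missing or not visible here.")
    elif driver_meets_minimum(version):
        print(f"Driver {version} meets the assumed minimum {MIN_DRIVER_FOR_CUDA_10_0}.")
    else:
        print(f"Driver {version} is older than the assumed minimum {MIN_DRIVER_FOR_CUDA_10_0}.")
```

If the script reports that nvidia-smi is unavailable, the driver itself needs attention first; the version comparison only matters once a driver is actually visible.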