0 votes

Following an answer on SO, I ran:

# confirm TensorFlow sees the GPU
from tensorflow.python.client import device_lib
assert 'GPU' in str(device_lib.list_local_devices())

# confirm Keras sees the GPU
from keras import backend
assert len(backend.tensorflow_backend._get_available_gpus()) > 0

# confirm PyTorch sees the GPU
from torch import cuda
assert cuda.is_available()
assert cuda.device_count() > 0
print(cuda.get_device_name(cuda.current_device()))

The first test passes, but the Keras and PyTorch checks both fail.
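For completeness, here is a minimal sketch of a further check worth running: compare the CUDA version each framework was built against with the toolkit reported by nvcc (torch.version.cuda and tf.test.is_gpu_available() are standard PyTorch / TF 1.x APIs):

# Compare the CUDA version each framework was compiled against
# with the toolkit reported by nvcc (9.0 here).
import torch
import tensorflow as tf

print("PyTorch built with CUDA:", torch.version.cuda)
print("TensorFlow version:", tf.__version__)
print("TF sees a GPU:", tf.test.is_gpu_available())  # TF 1.x API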

Running nvcc --version gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

nvidia-smi also works.

list_local_devices() provides:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 459307207819325532
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9054555249843627113
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 5902450771458744885
physical_device_desc: "device: XLA_CPU device"
]

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) returns:

Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
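For reference, a minimal TF 1.x sketch that pins a small op to the XLA_GPU device from the listing above and lets log_device_placement show where it actually runs (if the GPU is unusable, this should fail or fall back to CPU, which is itself informative):

import tensorflow as tf

# Pin a trivial op to the XLA GPU device reported by list_local_devices().
with tf.device('/device:XLA_GPU:0'):
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    c = a + b

# log_device_placement prints the device each op is assigned to.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))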

Why are Keras and PyTorch unable to run on my GPU? (RTX 2070)

What keras version is this? – Paritosh Singh
Actually, it does not work with TF either; tf.test.is_gpu_available() returns False. – guhur
@ParitoshSingh keras is 2.2.4. – guhur
Oh OK, if it doesn't work with TensorFlow either, then you need to install TensorFlow for GPU. It involves more steps than just a pip install. – Paritosh Singh
What do you mean? I installed tensorflow-gpu with pip. – guhur

2 Answers

0 votes

I had a hard time finding the issue. Running the CUDA samples gave me the crucial insight:

CUDA error at ../../common/inc/helper_cuda.h:1162 code=30(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)"

While with sudo it printed:

MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
GPU Device 0: "GeForce RTX 2070" with compute capability 7.5

So the issue was that my CUDA libraries were not readable by everyone.

My bug was fixed with:

sudo chmod -R a+r /usr/local/cuda*
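You can verify the fix without rerunning the samples (a minimal sketch, assuming the default /usr/local/cuda install path):

import glob
import os

# Check that the CUDA runtime libraries are readable by the current
# (non-root) user; adjust the glob if CUDA lives somewhere else.
for lib in glob.glob('/usr/local/cuda/lib64/libcudart*'):
    ok = os.access(lib, os.R_OK)
    print(lib, '-> readable' if ok else '-> NOT readable')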

0 votes

I ran into this problem recently. It turned out that the pip-installed requisite packages (such as keras) weren't built with the XLA-related flags. When I switched to a complete miniconda or anaconda install of the requisite packages, I was able to run my code. In my case I was running the Facebook AI code.

An early indicator that there is a problem is running:

nvidia-smi

and seeing that your deep net isn't using gigabytes of GPU memory, but only a trivial amount. Even without the warnings (which can sometimes be hard to find in the logs), you then know that the problem lies in how the requisite software was compiled: the GPU never gets a match on device type, so execution defaults to the CPU and the code is offloaded onto it.
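The same check can be scripted (a sketch assuming the pynvml package, which wraps the NVML API that nvidia-smi itself uses):

import pynvml

# Query GPU memory usage through NVML; a near-idle number while a
# "GPU" training job runs means the work landed on the CPU.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print("used:  %d MiB" % (info.used // 2**20))
print("total: %d MiB" % (info.total // 2**20))
pynvml.nvmlShutdown()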

In my case, I installed tensorflow-gpu, ipython, imutils, imgaug and a few other packages using miniconda. If you find that a requisite package is missing from conda, use:

conda install -c conda-forge <package-name>

to pick up missing items such as imutils and imgaug.