
I'm trying to run Docker with TensorFlow using NVIDIA GPUs; however, when I run my container I get the following error:

pgp_1  | Traceback (most recent call last):
pgp_1  |   File "/opt/app-root/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
pgp_1  |     from tensorflow.python.pywrap_tensorflow_internal import *
pgp_1  |   File "/opt/app-root/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
pgp_1  |     _pywrap_tensorflow_internal = swig_import_helper()
pgp_1  |   File "/opt/app-root/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
pgp_1  |     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
pgp_1  |   File "/opt/app-root/lib64/python3.6/imp.py", line 243, in load_module
pgp_1  |     return load_dynamic(name, filename, file)
pgp_1  |   File "/opt/app-root/lib64/python3.6/imp.py", line 343, in load_dynamic
pgp_1  |     return _load(spec)
pgp_1  | ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
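The missing library can be reproduced outside TensorFlow with a small diagnostic (a sketch; the library name comes from the traceback above). If the dynamic linker cannot load it either, the problem is the container's library path or image, not TensorFlow itself:

```python
import ctypes

# Try to load the exact library TensorFlow's traceback complains about.
# If this raises OSError, the dynamic linker cannot find it either,
# which points at LD_LIBRARY_PATH / the container image rather than TensorFlow.
try:
    ctypes.CDLL("libcublas.so.9.0")
    print("libcublas.so.9.0 loaded OK")
except OSError as exc:
    print("dynamic linker cannot find it:", exc)
```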

Docker-compose

My docker compose file looks like:

version: '3'
services:
  pgp:
    devices:
    - /dev/nvidia0
    - /dev/nvidia1
    - /dev/nvidia2
    - /dev/nvidia3
    - /dev/nvidia4
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    image: "myimg/pgp"
    ports:
     - "5000:5000"
    environment:
     - LD_LIBRARY_PATH=/opt/local/cuda/lib64/
     - GPU_DEVICE=4
     - NVIDIA_VISIBLE_DEVICES=all
     - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
     - ./train_package:/opt/app-root/src/train_package
     - /usr/local/cuda/lib64/:/opt/local/cuda/lib64/
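With nvidia-docker 2 installed, an alternative to mounting host CUDA libraries and devices by hand is to let the NVIDIA runtime inject them. A sketch (not the original file; the service and image names are copied from above, and note the `runtime` key requires compose file format 2.3/2.4, not version 3):

```yaml
version: '2.3'   # the `runtime` key is supported in compose 2.3/2.4, not v3
services:
  pgp:
    image: "myimg/pgp"
    runtime: nvidia          # injects driver libraries and /dev/nvidia* devices
    ports:
     - "5000:5000"
    environment:
     - NVIDIA_VISIBLE_DEVICES=all
     - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
     - ./train_package:/opt/app-root/src/train_package
```

With this, the `devices:` list and the host CUDA volume mount become unnecessary.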

As you can see, I tried mapping the host's CUDA into the container with a volume, but this didn't help.

I am able to successfully run nvidia-docker run --rm nvidia/cuda nvidia-smi

Versions

Cuda

cat /usr/local/cuda/version.txt shows CUDA Version 9.0.176

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

nvidia-docker version

NVIDIA Docker: 2.0.3

Client:
 Version:      17.12.1-ce
 API version:  1.35
 Go version:   go1.9.4
 Git commit:   7390fc6
 Built:        Tue Feb 27 22:17:40 2018
 OS/Arch:      linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:16:13 2018
  OS/Arch:      linux/amd64
  Experimental: false

Tensorflow

1.5 with GPU support, installed via pip

ldconfig -p | grep cuda
libnvrtc.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvrtc.so.9.0
libnvrtc.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvrtc.so
libnvrtc-builtins.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvrtc-builtins.so.9.0
libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvrtc-builtins.so
libnvgraph.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvgraph.so.9.0
libnvgraph.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvgraph.so
libnvblas.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvblas.so.9.0
libnvblas.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvblas.so
libnvToolsExt.so.1 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvToolsExt.so.1
libnvToolsExt.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvToolsExt.so
libnpps.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnpps.so.9.0
libnpps.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnpps.so
libnppitc.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppitc.so.9.0
libnppitc.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppitc.so
libnppisu.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppisu.so.9.0
libnppisu.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppisu.so
libnppist.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppist.so.9.0
libnppist.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppist.so
libnppim.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppim.so.9.0
libnppim.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppim.so
libnppig.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppig.so.9.0
libnppig.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppig.so
libnppif.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppif.so.9.0
libnppif.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppif.so
libnppidei.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppidei.so.9.0
libnppidei.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppidei.so
libnppicom.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppicom.so.9.0
libnppicom.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppicom.so
libnppicc.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppicc.so.9.0
libnppicc.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppicc.so
libnppial.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppial.so.9.0
libnppial.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppial.so
libnppc.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppc.so.9.0
libnppc.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnppc.so
libicudata.so.55 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libicudata.so.55
libcusparse.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcusparse.so.9.0
libcusparse.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcusparse.so
libcusolver.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcusolver.so.9.0
libcusolver.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcusolver.so
libcurand.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcurand.so.9.0
libcurand.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcurand.so
libcuinj64.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcuinj64.so.9.0
libcuinj64.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcuinj64.so
libcufftw.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcufftw.so.9.0
libcufftw.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcufftw.so
libcufft.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcufft.so.9.0
libcufft.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcufft.so
libcudart.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so.9.0
libcudart.so.7.5 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudart.so.7.5
libcudart.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so
libcudart.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudart.so
libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so
libcublas.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcublas.so.9.0
libcublas.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcublas.so
libaccinj64.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libaccinj64.so.9.0
libaccinj64.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libaccinj64.so
libOpenCL.so.1 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libOpenCL.so.1
libOpenCL.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libOpenCL.so

Tests with TensorFlow in Docker vs. on the host

The following works, when running on the host:

python3 -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

v1.5.0-0-g37aa430d84 1.5.0

Run container

nvidia-docker run -d --name testtfgpu -p 8888:8888 -p 6006:6006 gcr.io/tensorflow/tensorflow:latest-gpu

Log in

nvidia-docker exec -it testtfgpu bash

Test Tensorflow version

pip show tensorflow-gpu shows:

Name: tensorflow-gpu
Version: 1.6.0
Summary: TensorFlow helps the tensors flow
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: astor, protobuf, gast, tensorboard, six, wheel, absl-py, backports.weakref, termcolor, enum34, numpy, grpcio, mock

Python 2

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Results in:

Illegal instruction (core dumped)

Python 3

python3 -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Results in:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'tensorflow'
Comment: Did you try to install it this way: pip3 install --upgrade tensorflow-gpu? – Dinusha Dilinka

2 Answers

1 vote

The problem is caused by your cuDNN version. TensorFlow-GPU 1.5 requires cuDNN 7.0.x. You can download it from here. Make sure your CUDA version is 9.0.x and your cuDNN version is 7.0.x. Please refer to the link here for more details.
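To confirm which cuDNN version is actually installed, the version macros in the header can be inspected (a sketch; the header path assumes a default CUDA install location and may differ on your system):

```shell
# Print the cuDNN version macros (MAJOR, MINOR, PATCHLEVEL).
# The path is an assumption based on a default CUDA install; adjust as needed.
grep -m1 -A2 '#define CUDNN_MAJOR' /usr/local/cuda/include/cudnn.h
```

For cuDNN 7.0.x this should print `#define CUDNN_MAJOR 7` followed by `#define CUDNN_MINOR 0`.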

0 votes

It looks like a conflict between your CUDA version and the one TensorFlow was built against.

First, check your CUDA version with nvcc --version or cat /usr/local/cuda/version.txt

If that's 8.x, you need to either reinstall CUDA or, more simply, downgrade TensorFlow to 1.4. If your CUDA is 9.x, you need TensorFlow 1.5 or newer.
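As a rough guide, the pairing for the TF 1.x releases discussed here can be expressed as a small lookup (an illustrative sketch, not an official compatibility matrix; versions outside this table aren't covered, and exact requirements should be checked against the TensorFlow release notes):

```python
# Approximate CUDA requirements for early TensorFlow GPU releases.
# Illustrative table only: covers the versions mentioned in this thread.
TF_TO_CUDA = {
    "1.4": "8.0",
    "1.5": "9.0",
    "1.6": "9.0",
}

def required_cuda(tf_version: str) -> str:
    """Return the CUDA major.minor that a given TF 1.x release expects."""
    key = ".".join(tf_version.split(".")[:2])  # "1.5.0" -> "1.5"
    return TF_TO_CUDA[key]

print(required_cuda("1.5.0"))  # -> 9.0
```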

Hope that helps.