I'm trying to run some Tensorflow code, and I get what seems to be a common problem:
$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 python -c "import tensorflow; tensorflow.Session()"
2019-02-06 20:36:15.903204: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-06 20:36:15.908809: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-02-06 20:36:15.908858: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: tigris
2019-02-06 20:36:15.908868: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: tigris
2019-02-06 20:36:15.908942: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 390.77.0
2019-02-06 20:36:15.908985: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 390.30.0
2019-02-06 20:36:15.909006: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:308] kernel version 390.30.0 does not match DSO version 390.77.0 -- cannot find working devices in this configuration
$
The key pieces of that error message seem to be:
[...] libcuda reported version is: 390.77.0
[...] kernel reported version is: 390.30.0
[...] kernel version 390.30.0 does not match DSO version 390.77.0 -- cannot find working devices in this configuration
How can I install compatible versions? Where is that libcuda version coming from?
Background
A few months ago, I tried installing Tensorflow with GPU support, but the versions either broke my display or wouldn't work with Tensorflow. Finally, I got it working by following a tutorial on how to install multiple versions of the CUDA libraries on the same machine. That worked at the time, but when I came back to the project after a few months, it has stopped working. I assume that some driver got upgraded during that time.
Investigation
The first thing I tried was to see what versions I have of the nvidia drivers and libcuda package.
$ dpkg --list|grep libcuda
ii libcuda1-390 390.30-0ubuntu1 amd64 NVIDIA CUDA runtime library
Looks like it's 390.30. Why does the error message say that libcuda reported 390.77?
$ dpkg --list|grep nvidia
ii libnvidia-container-tools 1.0.1-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.0.1-1 amd64 NVIDIA container runtime library
rc nvidia-384 384.130-0ubuntu0.16.04.1 amd64 NVIDIA binary driver - version 384.130
ii nvidia-390 390.30-0ubuntu1 amd64 NVIDIA binary driver - version 390.30
ii nvidia-390-dev 390.30-0ubuntu1 amd64 NVIDIA binary Xorg driver development files
rc nvidia-396 396.44-0ubuntu1 amd64 NVIDIA binary driver - version 396.44
ii nvidia-container-runtime 2.0.0+docker18.09.1-1 amd64 NVIDIA container runtime
ii nvidia-container-runtime-hook 1.4.0-1 amd64 NVIDIA container runtime hook
ii nvidia-docker2 2.0.3+docker18.09.1-1 all nvidia-docker CLI wrapper
ii nvidia-modprobe 390.30-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
rc nvidia-opencl-icd-384 384.130-0ubuntu0.16.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-opencl-icd-390 390.30-0ubuntu1 amd64 NVIDIA OpenCL ICD
rc nvidia-opencl-icd-396 396.44-0ubuntu1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.8.2 all Tools to enable NVIDIA's Prime
ii nvidia-settings 396.44-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
Again, everything looks like it's 390.30. There were some packages that had version 390.77, but they were in the rc status. I guess I installed that version and later removed it, so the configuration files were left behind. I purged the configuration files with commands like this:
sudo apt-get remove --purge nvidia-kernel-common-390
Now, there are no packages at all with version 390.77.
$ dpkg --list|grep 390.77
$
I tried reinstalling CUDA, to see if it had been compiled with the wrong version.
$ sudo sh cuda_9.0.176_384.81_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-9.0 --override
That didn't make any difference.
Finally, I tried running nvidia-smi.
$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
$
All of this is running on Ubuntu 18.04 with Python 3.6.7, and my graphics card is NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2).