369
votes

When I run nvidia-smi I get the following message:

Failed to initialize NVML: Driver/library version mismatch

An hour ago I received the same message, uninstalled my CUDA library, and was then able to run nvidia-smi, getting the following result:

[image: nvidia-smi result]

After this I downloaded cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from the official NVIDIA page and then simply:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

Now I have cuda installed, but I get the mentioned mismatch error.


Some potentially useful information:

Running cat /proc/driver/nvidia/version I get:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  378.13  Tue Feb  7 20:10:06 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

I'm running Ubuntu 16.04.2 LTS.

Kernel release is: 4.4.0-66-generic.

Thanks!

23
You have probably mixed a previous runfile install with your (current) package manager install (apt-get). Follow the instructions in the CUDA Linux install guide to remove all previous NVIDIA driver and CUDA files, and then reinstall after you have cleaned that up. Before starting your reinstall, you may want to read the entire Linux install guide doc I linked. The conflict almost certainly arises out of your attempt to install the CUDA 8 GA2 package on top of your existing 378.13 driver install. – Robert Crovella
@talonmies Where would be a good place to ask GPU-related questions, if not on Stack Overflow? – bug_spray
I am using Ubuntu, and I think the error occurs after the NVIDIA driver is updated on Linux. Maybe an autoremove and a reboot are required after updating the NVIDIA driver. – lechat
This is an important question that is useful to anyone who develops deep learning models. It shows up as the first result on Google and must be allowed to have newer/better answers float to the top. It should remain open. – Justas
@Justas then join me in voting to reopen the question. I agree this is an essential question. – Charlie Parker

23 Answers

542
votes

Surprise surprise, rebooting solved the issue (I thought I had already tried that).

The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it's pretty similar to what I did to solve the issue the first time I had it.

348
votes

As @etal said, rebooting can solve this problem, but here is a procedure that works without rebooting.

For Chinese readers, see my blog -> 中文版 (Chinese version)

The error message

NVML: Driver/library version mismatch

tells us that the NVIDIA driver kernel module (kmod) has the wrong version, so we should unload this driver and then load the correct version of the kmod.

How to do that?

First, we should know which drivers are loaded.

lsmod | grep nvidia

you may get

nvidia_uvm            634880  8
nvidia_drm             53248  0
nvidia_modeset        790528  1 nvidia_drm
nvidia              12312576  86 nvidia_modeset,nvidia_uvm

Our final goal is to unload the nvidia module, so we should first unload the modules that depend on nvidia:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm

then, unload nvidia

sudo rmmod nvidia

Troubleshooting

If you get an error like rmmod: ERROR: Module nvidia is in use, which indicates that the kernel module is still in use, you should kill the processes that are using the kmod:

sudo lsof /dev/nvidia*

and then kill those processes, then continue to unload the kmods.
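
A minimal sketch of that kill step (lsof -t prints only the PIDs; review the list before killing anything):

sudo lsof -t /dev/nvidia*                        # list the PIDs using the devices
sudo lsof -t /dev/nvidia* | xargs -r sudo kill   # kill them (xargs -r skips the kill if the list is empty)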

Test

Confirm that you successfully unloaded those kmods:

lsmod | grep nvidia

You should get no output. Then confirm you can load the correct driver:

nvidia-smi

you should get the correct output
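
If nvidia-smi does not load the module by itself, a minimal sketch for reloading it manually (this assumes the correct nvidia module is installed for the running kernel):

sudo modprobe nvidia   # load the freshly installed kernel module
nvidia-smi             # should now print the GPU table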

26
votes

So I was having this problem, and none of the other remedies worked. The error message was opaque, but checking dmesg was key:

[   10.118255] NVRM: API mismatch: the client has the version 410.79, but
           NVRM: this kernel module has the version 384.130.  Please
           NVRM: make sure that this kernel module and all NVIDIA driver
           NVRM: components have the same version.

However, I had completely removed the 384 version and removed any remaining kernel driver files (nvidia-384*). But even after a reboot, I was still getting this. Seeing this meant that the initramfs still referenced the 384 module while only 410 was installed, so I regenerated the initramfs for my kernel:

# uname -a # find the kernel it's using
Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# update-initramfs -c -k 4.13.0-43-generic  # regenerate the initramfs for it
# reboot

And then it worked.

After removing 384, I still had 384 files in:

/var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
/lib/modules/4.13.0-43-generic/kernel/drivers

I recommend using the locate command (not installed by default) rather than searching the filesystem every time.
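
A hedged sketch of that approach on Ubuntu (the mlocate package and the 384 pattern are assumptions; adjust the pattern to whatever version you removed):

sudo apt-get install mlocate   # provides the locate command
sudo updatedb                  # build the file index
locate nvidia | grep 384       # list leftover files from the removed driver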

19
votes

The top two answers didn't solve my problem. I found a solution on the official NVIDIA forum that did. The error below can be caused by installing two different versions of the driver through different approaches, for example installing the NVIDIA driver with both apt and the official installer.

Failed to initialize NVML: Driver/library version mismatch

To solve this problem, you only need to execute one of the following two commands.

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall

12
votes

For those who really want to know why the version mismatch happens and how to prevent it from happening again: it happens because the versions of nvidia-* are different in these locations:

  1. dpkg -l | grep nvidia (look at the nvidia-utils-xxx package version), and
  2. cat /proc/driver/nvidia/version (look at the version of the kernel module, e.g. 460.56)

A restart should work, but you may want to prevent the automatic update of this package, either by modifying the files under /etc/apt/sources.list.d/, or (I just found an easier way to hold the package) by running apt-mark hold nvidia-utils-version_number.
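
For example, a minimal sketch of holding the packages (nvidia-utils-460 and nvidia-driver-460 are placeholders; use whatever dpkg -l | grep nvidia reports on your system):

sudo apt-mark hold nvidia-utils-460 nvidia-driver-460   # pin the driver packages
apt-mark showhold                                       # verify the hold is in place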

Cheers.

P.S.: Some of this content was inspired by this post (the original instructions were in Chinese, so I referenced the translated version instead).

11
votes

I got the error Failed to initialize NVML: Driver/library version mismatch from my nvidia-gpu-temperature-indicator, and nvidia-smi failed to print any info. I checked whether other versions of the NVIDIA driver were installed on my Ubuntu machine, but I only found nvidia-driver-390. In the end, a reboot solved the problem.

11
votes
sudo reboot

Rebooting solved it for me on Ubuntu 18.04 with two NVIDIA GeForce GTX 1080 Ti.

8
votes

I had the issue too (I'm running Ubuntu 18.04).

What I did:

dpkg -l | grep -i nvidia

Then sudo apt-get remove --purge nvidia-381 (and every duplicate version; in my case I had 381, 384, and 387)

Then sudo ubuntu-drivers devices to list what's available

And I chose sudo apt install nvidia-driver-430

After that, nvidia-smi gave the correct output (no need to reboot). But I suppose you can reboot when in doubt.
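
Putting the steps together, a rough sketch (the version numbers are the examples from the steps above; pick yours from the ubuntu-drivers devices output):

dpkg -l | grep -i nvidia                          # list installed NVIDIA packages
sudo apt-get remove --purge nvidia-381 nvidia-384 nvidia-387
sudo ubuntu-drivers devices                       # list available drivers
sudo apt install nvidia-driver-430
nvidia-smi                                        # verify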

I also followed this installation guide to reinstall CUDA + cuDNN.

7
votes

Reboot. If the problem still exists:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
nvidia-smi

For CentOS/RHEL:

cd /boot
mv initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut -vf initramfs-$(uname -r).img $(uname -r)

then

reboot

For Debian/Ubuntu:

update-initramfs -u

If the problem persists:

apt install -y dkms && dkms install -m nvidia -v 440.82

Change 440.82 to your actual version.

Tip: to get the NVIDIA driver version:

ls /usr/src

You will find the NVIDIA driver directory, such as nvidia-440.82.
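
A hedged one-liner to pull just the version number out of that directory name (assumes a single nvidia-<version> directory under /usr/src):

ls /usr/src | grep '^nvidia-' | sed 's/^nvidia-//'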


You can also remove all NVIDIA packages and reinstall the driver again:

apt purge "nvidia*"
apt purge "*cuda*"

# check
apt list --installed | grep nvidia
apt list --installed | grep cuda

6
votes

If you've recently updated, a reboot might solve this problem.

5
votes

This also happened to me on Ubuntu 16.04 using the nvidia-348 package (latest nvidia version on Ubuntu 16.04).

However I could resolve the problem by installing nvidia-390 through the Proprietary GPU Drivers PPA.

So a solution to the described problem on Ubuntu 16.04 is doing this:

  • sudo add-apt-repository ppa:graphics-drivers/ppa
  • sudo apt-get update
  • sudo apt-get install nvidia-390

Note: This guide assumes a clean Ubuntu install. If you have previous drivers installed, a reboot might be needed to reload all the kernel modules.

3
votes

In most cases, a reboot fixes the issue on Ubuntu 18.04.

The "Failed to initialize NVML: Driver/library version mismatch" error generally means the CUDA driver is still running an older release that is incompatible with the CUDA toolkit version currently in use. Rebooting the compute nodes will generally resolve this issue.

2
votes

These answers did not work for me:

https://stackoverflow.com/a/43023000/1179925

https://stackoverflow.com/a/45319156/1179925

https://stackoverflow.com/a/54349675/1179925

dmesg showed:

NVRM: API mismatch: the client has the version 418.67, but
NVRM: this kernel module has the version 430.26.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.

Uninstall the old driver 418.67 and install the new driver 430.26 (download NVIDIA-Linux-x86_64-430.26.run):

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
chmod +x NVIDIA-Linux-x86_64-430.26.run
sudo ./NVIDIA-Linux-x86_64-430.26.run
[ignore abort]

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.26  Tue Jun  4 17:40:52 CDT 2019
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)

1
votes

I experienced this problem after a normal kernel update on a CentOS machine. Since all CUDA and NVIDIA drivers and libraries had been installed via YUM repositories, I managed to solve the issue using the following steps:

sudo yum remove nvidia-driver-*
sudo reboot
sudo yum install nvidia-driver-cuda nvidia-modprobe
sudo modprobe nvidia # or just reboot

This made sure my kernel and my NVIDIA driver were consistent. I reckon that just rebooting may result in the wrong version of the kernel module being loaded.
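
A hedged way to check that consistency yourself before trusting a plain reboot (modinfo reads the module installed for the running kernel; the package name follows the yum command above):

modinfo nvidia | grep '^version'   # version of the nvidia kernel module on disk
rpm -q nvidia-driver-cuda          # version of the installed driver package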

0
votes

I reinstalled the NVIDIA driver. Run these commands as root:

  1. systemctl isolate multi-user.target

  2. modprobe -r nvidia-drm

  3. Reinstall the NVIDIA driver: chmod +x NVIDIA-Linux-x86_64-410.57.run && ./NVIDIA-Linux-x86_64-410.57.run

  4. systemctl start graphical.target

and finally check nvidia-smi

Thanks to: How To Install Nvidia Drivers and CUDA-10.0 for RTX 2080 Ti GPU on Ubuntu-16.04/18.04

How to unload kernel module 'nvidia-drm'?

0
votes

I committed the container into a Docker image. Then I recreated another container from this Docker image and the problem was gone.
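
A rough sketch of that workaround (the container and image names are placeholders, and the --gpus flag assumes Docker 19.03+ with the NVIDIA container toolkit):

docker commit my_container my_image:snapshot        # freeze the current container as an image
docker run --gpus all -it my_image:snapshot bash    # start a fresh container from that image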

0
votes

I had to restart my kernels and remove all the packages that I had installed previously (during the first installation). Please make sure to delete all the packages: even after removing packages with the command below,

sudo apt-get --purge remove "*nvidia*"

packages like libtinfo6:i386 don't get removed.

I'm using Ubuntu 20.04 and nvidia-driver-440; for that, you have to remove all the packages shown in the image below.

List of all the packages that need to be removed:

[image: list of NVIDIA packages to remove]

As shown in the image, make sure that the package you're installing is the correct size, i.e. 207 MB for nvidia-driver-440; if it's less, it means you haven't removed all the packages.
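
A hedged check that nothing from that list is still installed before reinstalling the driver (this should come back empty once everything has been removed):

dpkg -l | grep -Ei 'nvidia|libtinfo6:i386'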

0
votes

For completeness, I ran into this issue as well. In my case it turned out that because I had set Clang as my default compiler (using update-alternatives), nvidia-driver-440 failed to compile (check /var/crash/) even though apt didn't post any warnings. For me, the solution was to apt purge nvidia-*, set cc back to use gcc, reboot, and reinstall nvidia-driver-440.
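
A minimal sketch of switching the compiler back, assuming cc is managed through update-alternatives as described above:

sudo update-alternatives --config cc   # interactively pick the gcc entry
cc --version                           # should now report gcc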

0
votes

First I installed the Nvidia driver.

Next I installed cuda.

After that I got the "Driver/library version mismatch" error, but I could still see the CUDA version, so I purged the NVIDIA driver and reinstalled it.

Then it worked correctly.
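
A minimal sketch of that purge-and-reinstall step, assuming an Ubuntu system with the apt-packaged driver (the driver version is a placeholder; pick the one that matches your CUDA toolkit):

sudo apt-get --purge remove "*nvidia*"
sudo apt-get install nvidia-driver-450
sudo reboot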

0
votes

There is an easier solution that worked for me. On Fedora 33, try the following:

rpm -qa | grep -i nvidia | grep f32

You should see two OpenGL packages listed from the previous version of Fedora. Remove those and reboot.
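
For example, a hedged one-liner that removes exactly what the query above found (review the package list before confirming):

sudo dnf remove $(rpm -qa | grep -i nvidia | grep f32)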

Deleting and reinstalling the entire nvidia package set is overkill.

0
votes

Rebooting or unloading the driver didn't work for me. I solved the problem by updating my NVIDIA driver from 440.33.01 to 450.80.2.

sudo apt-get install nvidia-driver-450

sudo reboot

I'm running Ubuntu 20.04 LTS on a remote server.

-2
votes

I was facing the same problem and I'm posting my solution here.

In my case the NVRM version was 440.100 and the driver version was 460.32.03. My driver had been updated by sudo apt install caffe-cuda; I didn't notice at the time, but I checked it in /var/log/apt/history.log. Following my NVRM version, I just used sudo apt install nvidia-driver-440, but it installed 450.102 (I don't know why it installed a different version), and nvidia-smi is showing 450.102.04.

Anyhow, after rebooting my PC everything is working fine now. Even after reinstalling the driver, my CUDA still works fine.

I didn't remove or purge anything related to the NVIDIA driver. Version 460.32.03 was uninstalled automatically by running sudo apt install nvidia-driver-440.