How can I flush GPU memory using CUDA (physical reset is unavailable)

Question

My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.

I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.

Placing cudaDeviceReset() in the beginning of the program is only affecting the current context created by the process and doesn't flush the memory allocated before it.

I'm accessing a Fedora server with that GPU remotely, so physical reset is quite complicated.

So, the question is - Is there any way to flush the device memory in this situation?

Although nvidia-smi --gpu-reset is not available, I can still get some information with nvidia-smi -q. In most fields it gives 'N/A', but some information is useful. Here is the relevant output: Memory Usage Total : 1535 MB Used : 1227 MB Free : 307 MB — timdim
If you have root access, you can unload and reload the nvidia driver. — tera
If you do ps -ef |grep 'whoami' and the results show any processes that appear to be related to your crashed session, kill those. (the single quote ' should be replaced with backtick ` ) — Robert Crovella
nvidia-smi -caa worked great for me to release memory on all GPUs at once. — David Arenburg

talonmies talonmies · Accepted Answer · 2014-05-18T12:18:14

Although it should be unecessary to do this in anything other than exceptional circumstances, the recommended way to do this on linux hosts is to unload the nvidia driver by doing

$ rmmod nvidia

with suitable root privileges and then reloading it with

$ modprobe nvidia

If the machine is running X11, you will need to stop this manually beforehand, and restart it afterwards. The driver intialisation processes should eliminate any prior state on the device.

This answer has been assembled from comments and posted as a community wiki to get this question off the unanswered list for the CUDA tag

How can I flush GPU memory using CUDA (physical reset is unavailable)

7 Answers