Simple console program will not exit if cudaMalloc is called

Question

The following simple program never exits if the cudaMalloc call is executed. Commenting out just the cudaMalloc causes it to exit normally.

#include <iostream>
using std::cout;
using std::cin;

#include "cuda.h"
#include "cutil_inline.h"

void PrintCudaVersion(int version, const char *name)
{
    int versionMaj = version / 1000;
    int versionMin = (version - (versionMaj * 1000)) / 10;
    cout << "CUDA " << name << " version: " << versionMaj << "." << versionMin << "\n";
}

void ReportCudaVersions()
{
    int version = 0;
    cudaDriverGetVersion(&version);
    PrintCudaVersion(version, "Driver");

    cudaRuntimeGetVersion(&version);
    PrintCudaVersion(version, "Runtime");
}

int main(int argc, char **argv)
{
    //CUresult r = cuInit(0);                 << These two lines were in original post
    //cout << "Init result: " << r << "\n";   << but have no effect on the problem

    ReportCudaVersions();

    void *ptr = NULL;
    cudaError_t err = cudaSuccess;
    err = cudaMalloc(&ptr, 1024*1024);
    cout << "cudaMalloc returned: " << err << "  ptr: " << ptr << "\n";
    err = cudaFree(ptr);
    cout << "cudaFree returned: " << err << "\n";

    return 0;
}

This is running on Windows 7 with the CUDA 4.1 driver and the CUDA 3.2 runtime. I've traced the return from main through the CRT to ExitProcess(), which never returns (as expected), but the process never ends either. From VS2008 I can stop debugging normally. From the command line, I have to kill the console window.

Program output:

Init result: 0
CUDA Driver version: 4.1
CUDA Runtime version: 3.2
cudaMalloc returned: 0  ptr: 00210000
cudaFree returned: 0
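
For reference, the two version lines in the output come from decoding the integer that cudaDriverGetVersion and cudaRuntimeGetVersion return. The sketch below isolates that arithmetic so it can be checked without a GPU. It assumes the encoding is 1000 * major + 10 * minor (so 4010 means 4.1), and the names CudaVersion, DecodeCudaVersion and FormatCudaVersion are hypothetical helpers, not part of CUDA or of the program above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type (not part of the CUDA API).
struct CudaVersion {
    int versionMaj;
    int versionMin;
};

// Splits a version integer, assumed to be encoded as
// 1000 * major + 10 * minor, into its two components.
CudaVersion DecodeCudaVersion(int version)
{
    CudaVersion v;
    v.versionMaj = version / 1000;
    v.versionMin = (version % 1000) / 10;
    return v;
}

// Formats the decoded version as "major.minor".
std::string FormatCudaVersion(int version)
{
    CudaVersion v = DecodeCudaVersion(version);
    std::ostringstream out;
    out << v.versionMaj << "." << v.versionMin;
    return out.str();
}
```

Under that assumption, FormatCudaVersion(4010) gives "4.1" and FormatCudaVersion(3020) gives "3.2", which matches the driver and runtime lines printed above.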

I tried making the allocation amount so large that cudaMalloc would fail. It did and reported an error, but the program still would not exit. So it apparently has to do with merely calling cudaMalloc, not the existence of allocated memory.

Any ideas as to what is going on here?

EDIT: I was wrong in the second sentence. I have to remove both the cudaMalloc and the cudaFree calls to get the program to exit. Leaving either one in causes the hang.

EDIT: Although there are many references to the fact that CUDA driver versions are backward compatible, this problem went away when I reverted the driver to V3.2.
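
Since the driver/runtime combination turned out to matter here, a startup check that reports how the two versions relate might help spot this configuration early. This is only a sketch: it does not diagnose or fix the hang, and VersionRelation, CompareCudaVersions and DescribeRelation are hypothetical names, not CUDA functions. The two integers are assumed to be the raw values filled in by cudaDriverGetVersion and cudaRuntimeGetVersion, as in ReportCudaVersions.

```cpp
#include <string>

// Hypothetical classification of how the installed driver relates
// to the runtime the program was built against.
enum VersionRelation { DriverOlder, SameVersion, DriverNewer };

// Compares the raw version integers (e.g. 4010 for 4.1, 3020 for 3.2).
VersionRelation CompareCudaVersions(int driverVersion, int runtimeVersion)
{
    if (driverVersion < runtimeVersion) return DriverOlder;
    if (driverVersion > runtimeVersion) return DriverNewer;
    return SameVersion;
}

// Turns the relation into a message suitable for a startup log line.
std::string DescribeRelation(VersionRelation relation)
{
    switch (relation) {
    case DriverOlder:
        return "driver is older than the runtime";
    case DriverNewer:
        return "driver is newer than the runtime";
    default:
        return "driver and runtime versions match";
    }
}
```

With the values from this question, CompareCudaVersions(4010, 3020) returns DriverNewer, which is the combination that showed the problem. After reverting the driver to 3.2, the same call with (3020, 3020) would return SameVersion.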

You are supposed to match the driver and runtime version as far as I know. — jmsu
AFAIK, the driver is generally backward compatible. We have a significant amount of more complex code that works fine. — Steve Fallows
That's why I didn't put it as an answer... I am not sure about it. I would say the problem was that if the cudaMalloc fails you can't use cudaFree but you state that leaving either one in causes the hang up. I would remove the cudaFree either way or execute it conditionally. Maybe try a cudaDeviceReset() before return? — jmsu

Vlad Vlad · Accepted Answer · 2011-12-15T22:25:50

It seems like you're mixing the driver API (cuInit) with the runtime API (cudaMalloc).

I don't know if anything funny happens (or should happen) behind the scenes, but one thing you could try is to remove the cuInit and see what happens.
