CUDA 3.0 version compatibility with compiler option -arch=sm_12

Question

I have a very simple CUDA program. When compiled with the -arch=sm_11 option, it works as expected. However, when compiled with -arch=sm_12, the results are unexpected. Here is the kernel code:

    __global__ void dev_test(int *test) {
        *test = 100;
    }

I invoke the kernel as follows:

    int *dev_int, val;
    val = 0;
    // Allocate one int on the device and set it to 0.
    cudaMalloc((void **)&dev_int, sizeof(int));
    cudaMemset((void *)dev_int, 0, sizeof(int));
    cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);
    // The kernel should overwrite the device value with 100.
    dev_test<<<1, 1>>>(dev_int);
    // Copy the result back and print it.
    int *host_int = (int *)malloc(sizeof(int));
    cudaMemcpy(host_int, dev_int, sizeof(int), cudaMemcpyDeviceToHost);
    printf("copied back from device %d\n", *host_int);

When compiled with -arch=sm_11, the print statement correctly prints 100. However, when compiled with -arch=sm_12, it prints 0, i.e. the write inside the kernel does not take effect. I am guessing this is due to some incompatibility between my CUDA version and the NVIDIA driver.
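The snippet above never checks a return code, so a kernel that fails to launch is silent and the device value simply stays 0. One plausible cause (not confirmed in this thread) is that the sm_12 build contained no kernel image the GPU or driver could run. A minimal sketch of the same host code with error checking follows; `run_dev_test` is a hypothetical helper name, and the checks use only standard runtime calls (`cudaGetLastError`, `cudaMemcpy`, `cudaGetErrorString`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dev_test(int *test) {
    *test = 100;
}

// Hypothetical helper: runs the kernel from the question and returns the
// first CUDA error encountered (cudaSuccess if everything worked).
cudaError_t run_dev_test(int *result) {
    int *dev_int = NULL;
    int val = 0;
    cudaError_t err = cudaMalloc((void **)&dev_int, sizeof(int));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dev_test<<<1, 1>>>(dev_int);
        // A launch returns no status, so query it explicitly. A binary with
        // no kernel image the device can run is reported here.
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // Waits for the kernel to finish and also reports errors raised
        // while it was running.
        err = cudaMemcpy(result, dev_int, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_int);
    return err;
}

int main() {
    int host_int = 0;
    cudaError_t err = run_dev_test(&host_int);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("copied back from device %d\n", host_int);
    return 0;
}
```

If the sm_12 build fails this way, the printed error string says why the kernel never ran, instead of silently printing 0.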

CUDA version: 3.0
NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010
GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)

Any help is highly appreciated.

dev_int is not a dangling pointer; I allocated memory for it with cudaMalloc. I simplified the code to illustrate the problem. I want to use sm_12 so that I can use atomicCAS on a shared variable. – vinodhrajagopal
My problem is that when I compile for sm_12, writes made inside the kernel are not visible on the host. – vinodhrajagopal
It's not unusual to pass pointers to kernel functions. The only unusual thing here is that his pointer points to a 1-element array. I have tried this code on a CUDA 4.1 system with a Tesla M2090, and it works correctly no matter which -arch I specify. vinodh, can you upgrade to CUDA 4.1? – harrism
@Paul I don't quite follow you. I initialize the pointer here: cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice); I have passed pointers without any issues in all my earlier programs. The fact that it works with -arch=sm_11 indicates that the problem is not the pointer but something to do with the device's compute capability. – vinodhrajagopal
@harrism Thanks for the suggestion. I did install 4.1, but when I tried to compile, I got errors saying that one of the shared libraries was not found (I think it was libcudart4.so; I am away from my machine right now, so I don't know the exact name of the library). – vinodhrajagopal
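The reason for needing sm_12 here is that atomic functions on 32-bit words in shared memory, such as atomicCAS on a `__shared__` int, require compute capability 1.2 or higher. A minimal sketch of that pattern, assuming one flag per block that exactly one thread should claim; `elect_one` and `count_winners` are hypothetical names chosen for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block elects exactly one thread with atomicCAS on a __shared__ flag.
// Shared-memory atomics need compute capability >= 1.2 (hence sm_12).
__global__ void elect_one(int *won) {
    __shared__ int flag;
    if (threadIdx.x == 0) flag = 0;
    __syncthreads();  // every thread now sees the initialized flag

    // atomicCAS returns the old value, so only the first thread to swap
    // sees 0; every later thread sees 1.
    int old = atomicCAS(&flag, 0, 1);
    won[blockIdx.x * blockDim.x + threadIdx.x] = (old == 0) ? 1 : 0;
}

// Hypothetical host helper: total number of elected threads over all
// blocks (should equal `blocks`), or -1 on a CUDA error.
int count_winners(int blocks, int threads) {
    const int n = blocks * threads;
    int *d_won = NULL;
    if (cudaMalloc((void **)&d_won, n * sizeof(int)) != cudaSuccess) return -1;

    elect_one<<<blocks, threads>>>(d_won);
    std::vector<int> h_won(n, 0);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_won[0], d_won, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_won);
    if (err != cudaSuccess) return -1;

    int total = 0;
    for (int i = 0; i < n; ++i) total += h_won[i];
    return total;
}

int main() {
    // Expect one winner per block.
    printf("winners: %d\n", count_winners(4, 128));
    return 0;
}
```

With toolkits of that era this would be built with `nvcc -arch=sm_12`; recent toolkits no longer accept sm_1x targets, but every architecture they do accept also supports shared-memory atomics.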

Accepted Answer by vinodhrajagopal · 2012-03-29T21:01:23

My problem was finally resolved. I upgraded to CUDA 4.1 and upgraded my NVIDIA driver, and the combination of the two solved the problem. I am not sure which of the two changes actually fixed it.
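Since the thread does not settle whether the toolkit or the driver was at fault, a small program that prints both versions and the device's compute capability can help narrow down a similar problem. A minimal sketch, assuming a single GPU at device index 0; `version_major`, `version_minor` and `supports_shared_atomics` are hypothetical helpers, while `cudaDriverGetVersion`, `cudaRuntimeGetVersion` and `cudaGetDeviceProperties` are standard runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA reports versions as 1000 * major + 10 * minor (4010 means 4.1).
int version_major(int v) { return v / 1000; }
int version_minor(int v) { return (v % 1000) / 10; }

// Shared-memory atomics such as atomicCAS need compute capability >= 1.2.
bool supports_shared_atomics(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 2);
}

int main() {
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // newest CUDA version the driver supports
    cudaRuntimeGetVersion(&runtime);  // version of the linked runtime library
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           version_major(driver), version_minor(driver),
           version_major(runtime), version_minor(runtime));
    if (driver < runtime)  // runtime calls then fail with cudaErrorInsufficientDriver
        printf("the driver is older than the runtime; upgrade the driver\n");

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 1;
    }
    printf("%s: compute capability %d.%d, shared-memory atomics: %s\n",
           prop.name, prop.major, prop.minor,
           supports_shared_atomics(prop.major, prop.minor) ? "yes" : "no");
    return 0;
}
```

If the reported compute capability is below 1.2, code built only for sm_12 cannot run on that GPU, regardless of the toolkit version.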
