I have a very simple CUDA program. When compiled with -arch=sm_11, it works as expected. When compiled with -arch=sm_12, however, the results are wrong. Here is the kernel code:

    __global__ void dev_test(int *test) {
        *test = 100;
    }

I invoke the kernel as follows:

    int *dev_int, val;
    val = 0;
    cudaMalloc((void **)&dev_int, sizeof(int));
    cudaMemset((void *)dev_int, 0, sizeof(int));
    cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);
    dev_test<<<1, 1>>>(dev_int);
    int *host_int = (int *)malloc(sizeof(int));
    cudaMemcpy(host_int, dev_int, sizeof(int), cudaMemcpyDeviceToHost);
    printf("copied back from device %d\n", *host_int);

When compiled with -arch=sm_11, the print statement correctly prints 100. When compiled with -arch=sm_12, however, it prints 0, i.e. the write inside the kernel does not take effect. I am guessing this is due to some incompatibility between my CUDA version and the NVIDIA driver.

    CUDA version: 3.0
    NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010
    GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)

Any help is highly appreciated.

dev_int is not a dangling pointer. I have allocated memory for it using cudaMalloc. I have simplified the code to illustrate my problem. I basically want to use sm_12 in order to use atomicCAS on a shared variable. - vinodhrajagopal

My problem is that when using sm_12, writes that happen inside the kernel are not visible on the host. - vinodhrajagopal

It's not unusual to pass pointers to kernel functions. The only unusual thing here is that his pointer points to a 1-element array. I have tried this code on a CUDA 4.1 system with a Tesla M2090 and it works correctly no matter what -arch I specify. vinodh, can you upgrade to CUDA 4.1? - harrism

@Paul I don't quite follow you. I have initialized the memory the pointer refers to here: cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice); I have passed pointers without any issues in all my earlier programs. The fact that it works with -arch=sm_11 indicates that it is not an issue with the pointer but something to do with the device's compute capability. - vinodhrajagopal

@harrism Thanks for the suggestion. I did install 4.1, but when I tried to compile, I got errors saying one of the shared libraries was not found (I think it was libcudart4.so; I am away from my machine right now and hence don't know the exact name of the library). - vinodhrajagopal
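
Since the comments above turn on the device's compute capability and on the installed driver and toolkit versions, it may help to print all three from the program itself. The following is only a sketch, not something from the original thread: it assumes device index 0 is the GPU in question and uses the runtime API calls cudaDriverGetVersion, cudaRuntimeGetVersion, and cudaGetDeviceProperties. As far as I know, code built with -arch=sm_12 needs a device of compute capability 1.2 or higher, and the driver has to be at least as new as the runtime the program was built against.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Versions are encoded as 1000 * major + 10 * minor (e.g. 4010 for 4.1).
    int driver_version = 0;
    int runtime_version = 0;
    cudaDriverGetVersion(&driver_version);
    cudaRuntimeGetVersion(&runtime_version);
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driver_version / 1000, (driver_version % 1000) / 10,
           runtime_version / 1000, (runtime_version % 1000) / 10);

    // Assumption: device 0 is the GPU the kernel is meant to run on.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```

If the reported driver version is lower than the runtime version, or the compute capability is below the -arch target, a kernel launch can fail without the program printing anything unless errors are checked explicitly.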

1 Answer

My problem finally got resolved. I'm not sure which change actually fixed it: I upgraded to CUDA 4.1 and also upgraded my NVIDIA driver, and the combination of the two solved the problem.
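
Whatever the root cause was, the code in the question never checks a CUDA return value, so a failed kernel launch would leave the zero-initialized value in place and print 0 without any message. Below is a minimal sketch of the same program with error checking added. CUDA_CHECK is a hypothetical helper macro introduced here for illustration, not part of the CUDA API, and cudaDeviceSynchronize assumes CUDA 4.0 or later (on CUDA 3.0 the equivalent call was cudaThreadSynchronize).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): print a readable
// message and exit if a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void dev_test(int *test) {
    *test = 100;
}

int main() {
    int *dev_int = NULL;
    int host_int = 0;

    CUDA_CHECK(cudaMalloc((void **)&dev_int, sizeof(int)));
    CUDA_CHECK(cudaMemset(dev_int, 0, sizeof(int)));

    dev_test<<<1, 1>>>(dev_int);
    // Launch failures (for example, no usable kernel image for this
    // device or driver) are reported by cudaGetLastError.
    CUDA_CHECK(cudaGetLastError());
    // Failures that occur while the kernel runs are reported once the
    // host synchronizes with the device.
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(&host_int, dev_int, sizeof(int),
                          cudaMemcpyDeviceToHost));
    printf("copied back from device %d\n", host_int);

    CUDA_CHECK(cudaFree(dev_int));
    return 0;
}
```

With checks like these in place, a toolkit and driver mismatch should show up as an explicit error string at the launch or synchronization step rather than as a silently unchanged value.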