I have a very simple CUDA program. It works as expected when compiled with the -arch=sm_11 option. However, when compiled with -arch=sm_12, the results are unexpected. Here is the kernel code:
__global__ void dev_test(int *test) {
    *test = 100;  // a single thread writes a constant to device memory
}
I launch the kernel as follows:
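int *dev_int, val;
val = 0;
cudaMalloc((void **)&dev_int, sizeof(int));
cudaMemset((void *)dev_int, 0, sizeof(int));                     // zero the device int (redundant with the copy below)
cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);  // copy val (0) to the device
dev_test<<<1, 1>>>(dev_int);                                     // one block, one thread; return codes are not checked
int *host_int = (int*)malloc(sizeof(int));
cudaMemcpy(host_int, dev_int, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel to finish
printf("copied back from device %d\n", *host_int);
free(host_int);
cudaFree(dev_int);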
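To test the driver-incompatibility guess, it helps to know whether the launch itself failed: the code above ignores every return value, so a kernel that never ran simply leaves behind the 0 that was copied in. Here is a minimal sketch of the same program with error checking, assuming only the CUDA runtime API (cudaGetLastError and cudaGetErrorString, both available in CUDA 3.0). `check` is a hypothetical helper name, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The kernel from the question: writes a constant through the pointer.
__global__ void dev_test(int *test) {
    *test = 100;
}

// Hypothetical helper: prints a readable message for a failed CUDA call.
// Returns true on success so the caller can bail out early.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    int *dev_int = NULL;
    int val = 0;
    int host_val = -1;

    if (!check(cudaMalloc((void **)&dev_int, sizeof(int)), "cudaMalloc")) return 1;
    if (!check(cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice),
               "cudaMemcpy H2D")) return 1;

    dev_test<<<1, 1>>>(dev_int);
    // A failed launch (e.g. no code in the executable matches the GPU) does not
    // abort the program; the error has to be queried explicitly.
    if (!check(cudaGetLastError(), "kernel launch")) return 1;

    // This blocking cudaMemcpy waits for the kernel, so errors raised while
    // the kernel ran are reported here.
    if (!check(cudaMemcpy(&host_val, dev_int, sizeof(int), cudaMemcpyDeviceToHost),
               "cudaMemcpy D2H")) return 1;

    std::printf("copied back from device %d\n", host_val);
    cudaFree(dev_int);
    return 0;
}
```

If the sm_12 build reports a launch error such as "invalid device function" (the exact message depends on the toolkit version), that would suggest the executable contains no code the GPU can run, rather than a driver problem.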
When compiled with -arch=sm_11, the print statement correctly prints 100. However, when compiled with -arch=sm_12, it prints 0, i.e., the write inside the kernel does not take effect. I suspect this is due to some incompatibility between my CUDA version and the NVIDIA driver.
CUDA version: 3.0
NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010
GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
Any help is highly appreciated.
-arch I specify. vinodh, can you upgrade to CUDA 4.1? – harrism
cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);
I have passed pointers without any issues in all my earlier programs. The fact that it works with -arch=sm_11 indicates that the problem is not the pointer but something to do with the device's compute capability. – vinodhrajagopal
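The compute-capability explanation can be checked directly by asking the runtime what the GPU reports. This sketch uses only cudaGetDeviceCount and cudaGetDeviceProperties from the runtime API and prints each device's capability. `capability_at_least` is a hypothetical helper. Code built with -arch=sm_12 needs a device of compute capability 1.2 or higher, so a report of 1.1 would be consistent with the sm_12 build failing to launch while the sm_11 build works.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if capability major.minor is at least
// reqMajor.reqMinor. This is a necessary (not sufficient) condition for a
// device to run code built for sm_<reqMajor><reqMinor>.
static bool capability_at_least(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }
        // Report the capability and whether sm_12 code could possibly run.
        std::printf("device %d: %s, compute capability %d.%d, %s sm_12 code\n",
                    dev, prop.name, prop.major, prop.minor,
                    capability_at_least(prop.major, prop.minor, 1, 2) ? "may run" : "cannot run");
    }
    return 0;
}
```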