I have a CUDA card with compute capability 3.5. If I make a kernel launch such as <<<2000,512>>>, how many iterations (threads) occur within the kernel? I thought it was 2000*512, but my testing isn't confirming this. I also want to confirm that the way I'm calculating the variable is correct.
The situation is that, within the kernel, I am incrementing a passed global memory number based on the thread index:
int thr = blockDim.x * blockIdx.x + threadIdx.x;
worknumber = globalnumber + thr;
So, when control returns to the CPU, I want to know exactly how many increments there were, so that I don't repeat or skip numbers when I call the kernel again to process my next set of numbers.
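Here is a minimal, stripped-down sketch of what I am trying to do (the kernel name work and the loop are illustrative, not my real code), assuming a launch of <<<2000,512>>> really does create 2000*512 threads and each thread consumes exactly one number per launch:

#include <cstdint>
#include <cstdio>

__global__ void work(uint64_t globalnumber)
{
    // Each thread derives one distinct number from the starting value.
    int thr = blockDim.x * blockIdx.x + threadIdx.x;
    uint64_t worknumber = globalnumber + thr;
    (void)worknumber;   // ... process worknumber ...
}

int main()
{
    const int numBlocks = 2000;
    const int threadsPerBlock = 512;
    uint64_t globalnumber = 0;

    for (int launch = 0; launch < 3; ++launch) {
        work<<<numBlocks, threadsPerBlock>>>(globalnumber);
        cudaDeviceSynchronize();
        // If the launch creates numBlocks * threadsPerBlock threads
        // (2000 * 512 = 1,024,000), advance the start by exactly that amount.
        globalnumber += (uint64_t)numBlocks * threadsPerBlock;
    }
    printf("next starting number: %llu\n", (unsigned long long)globalnumber);
    return 0;
}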
Edit:
__global__ void allin(uint64_t *lkey, const unsigned char *d_patfile)
{
    uint64_t kkey;
    int tmp;
    int thr = blockDim.x * blockIdx.x + threadIdx.x;
    kkey = *lkey + thr;
    if (thr > tmp) {
        tmp = thr;
        printf("%u \n", thr);
    }
}
You haven't set tmp to any value before testing it in the if statement. I would think the compiler would be throwing a warning about that. The number of threads or "iterations" created by <<<2000,512>>> is indeed 2000*512. printf from a CUDA kernel has various limitations, so using it to validate that a large number of threads were launched probably won't work. – Robert Crovella
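One relevant limitation is that device-side printf output goes through a fixed-size buffer that is only flushed to the host at synchronization points, so output from a very large number of threads can be silently dropped. A minimal sketch of enlarging that buffer (the 64 MB size here is an arbitrary choice, not something taken from the comment):

#include <cstdio>

__global__ void shout()
{
    // Every one of the ~1,024,000 threads prints a line; most of that would
    // overflow the default in-kernel printf buffer.
    printf("thread %u\n", blockDim.x * blockIdx.x + threadIdx.x);
}

int main()
{
    // Enlarge the in-kernel printf FIFO; the default is relatively small (about 1 MB).
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, (size_t)64 * 1024 * 1024);

    shout<<<2000, 512>>>();
    // Device-side printf output is flushed at synchronization points such as this one.
    cudaDeviceSynchronize();
    return 0;
}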
Create a __device__ global variable, initialize it to zero, then have each thread do atomicAdd(&var, 1); After that, copy the variable back to host code and print it out. – Robert Crovella
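A minimal sketch of that suggestion, using a simplified version of the kernel; the names threadCount and allin_count are illustrative, not part of the original code:

#include <cstdint>
#include <cstdio>

// Device-side counter, initialized to zero before the launch.
__device__ unsigned long long threadCount = 0;

__global__ void allin_count(uint64_t *lkey)
{
    uint64_t kkey = *lkey + (blockDim.x * blockIdx.x + threadIdx.x);
    (void)kkey;                      // ... real work would go here ...
    atomicAdd(&threadCount, 1ULL);   // every thread that runs adds exactly one
}

int main()
{
    uint64_t h_key = 0, *d_key;
    cudaMalloc(&d_key, sizeof(uint64_t));
    cudaMemcpy(d_key, &h_key, sizeof(uint64_t), cudaMemcpyHostToDevice);

    allin_count<<<2000, 512>>>(d_key);
    cudaDeviceSynchronize();

    // Copy the device counter back to the host and print it.
    unsigned long long h_count = 0;
    cudaMemcpyFromSymbol(&h_count, threadCount, sizeof(h_count));
    printf("threads executed: %llu\n", h_count);

    cudaFree(d_key);
    return 0;
}

The counter reports how many threads actually executed, which should match the 2000*512 = 1,024,000 already implied by the launch configuration.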