
Is it possible to see the global memory addresses accessed by a thread at run time?

I know it will create a lot of overhead, but I would like to see what elements are being accessed. I think it will help me understand how the coalescing mechanism is implemented.

Thank you.

1 Answer

CUDA device code largely follows C and C++ syntax, so you can print the numerical value of a pointer from kernel code with printf:

printf("pval = %p\n", my_pointer);

If you wanted to do this across threads in a CUDA kernel, you could do:

__global__ void my_kernel(int *data){
  // global thread index; assumes the launch covers exactly the elements of data
  int idx = threadIdx.x + blockDim.x * blockIdx.x;
  printf("thread: %d, pointer: %p, value: %d\n", idx, (void *)&(data[idx]), data[idx]);
}

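or something similar. This will produce a large amount of output if you launch many threads. Also be aware that in-kernel printf writes to a fixed-size circular buffer: if a kernel produces more output than fits, older output is overwritten. The buffer size can be raised from the host with cudaDeviceSetLimit(cudaLimitPrintfFifoSize, size) before the kernel launch, and the output only appears once the host reaches a synchronization point such as cudaDeviceSynchronize().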
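For completeness, here is a minimal sketch of a whole program built around that idea. The kernel name print_addresses, the array size, the launch configuration, and the 8 MB buffer size are all arbitrary choices for illustration, not anything from your code. It adds a bounds check and only lets the first block print, which keeps the output small enough to read:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each reporting thread prints the global-memory address it reads.
// Only block 0 prints, to keep the amount of output manageable.
__global__ void print_addresses(const int *data, int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n && blockIdx.x == 0) {
        printf("thread: %d, pointer: %p, value: %d\n",
               idx, (const void *)&data[idx], data[idx]);
    }
}

int main()
{
    const int n = 64;                      // illustrative size
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i * 10;

    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // Optional: enlarge the device-side printf buffer before the launch.
    // 8 MB is an arbitrary value chosen for this sketch.
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 * 1024 * 1024);

    print_addresses<<<2, 32>>>(dev, n);    // 2 blocks of 32 threads

    // Device printf output is flushed at synchronization points.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}
```

Compiled with nvcc, this should print one line per thread of block 0. With an int array indexed by idx, consecutive threads should show addresses 4 bytes apart, which is the kind of pattern you would want to look at when reasoning about coalescing. The lines are not guaranteed to come out in thread order.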