
I'm looking at the code in the question "How do I choose grid and block dimensions for CUDA kernels?", which is a follow-up to "CUDA how to get grid, block, thread size and parallalize non square matrix calculation":

const int n = 128 * 1024;
int blocksize = 512; // value usually chosen by tuning and hardware constraints
int nblocks = n / nthreads; // value determined by block size and total work
mAdd<<<nblocks, blocksize>>>(A, B, C, n);

What is the difference between blocksize and nthreads? I think they are one and the same. Is this just a typo, or am I missing something?


1 Answer


The number of blocks is the number of elements to process divided by the size of each block. However, that division may not come out to a whole number, and integer division truncates, which would leave the last few elements unprocessed. So you have to round up so that every element gets a thread, at the expense of a few idle threads in the last block.

So what you really want to do is apply this little integer arithmetic trick:

int nblocks = (n+blocksize-1)/blocksize;
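
To see what the rounding does, here is a small host-side sketch. The helper name blocks_for is made up for illustration and is not part of the original code; the second check uses n = 1000, an arbitrary size that is not a multiple of 512.

```cpp
// Hypothetical helper (not from the original post): number of blocks
// needed to cover n elements with blocksize threads per block.
constexpr int blocks_for(int n, int blocksize) {
    return (n + blocksize - 1) / blocksize;  // integer ceiling of n / blocksize
}

// Exact multiple: 128 * 1024 elements, 512 threads per block.
static_assert(blocks_for(128 * 1024, 512) == 256, "no rounding needed");

// Not a multiple: plain 1000 / 512 would give 1 block and skip 488 elements.
static_assert(blocks_for(1000, 512) == 2, "rounds up to cover the remainder");
```

When the count is rounded up like this, the last block contains threads whose index is past the end of the data, so the kernel normally needs a guard along the lines of if (i < n), with i = blockIdx.x * blockDim.x + threadIdx.x, before it touches memory.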