
As in the title: in CUDA programs, where do kernel parameters reside after a kernel launch — in the local memory or the global memory of the GPU?

For example, in the LLVM IR of a CUDA program, a kernel `__global__ void kernel(int param1)` is emitted roughly as:

define void @kernel(i32 %param1) {
  %0 = alloca i32
  store i32 %param1, i32* %0
  ...
}

So, in this case, does %0 point into local memory or global memory?

Also, I have sometimes seen kernel parameters kept and used directly in registers instead of being stored to any memory. How is this decision made?

On all devices supported by CUDA 7 or 7.5, it resides in constant memory. If you dump the generated machine code (SASS) you'll be able to observe this. - Robert Crovella
@RobertCrovella Thanks! What about CUDA 6? How can I see in the SASS which memory it goes to? Is there a particular qualifier in the SASS that marks this? - cache

1 Answer


As Robert Crovella pointed out in his comment: kernel parameters are stored in the constant memory of the GPU.

However, doing an alloca and then storing param1 into the allocated space copies the parameter from constant memory into local memory. alloca instructions are lowered to stack allocations in PTX, and in clang this is the canonical way to handle function parameters during code generation. On GPUs, though, this can lead to a performance penalty — only "can", since the PTX is further optimized while being lowered to SASS — because local memory goes through all cache levels down to global memory and is much slower than constant memory.
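To illustrate, here is a hand-written sketch of what the unoptimized lowering can look like in PTX (the names `kernel_param_0` and `__local_depot0` are illustrative; the exact output depends on the toolchain version): the alloca becomes a .local stack slot, and the parameter is first read from the parameter space and then copied into it.

```
.visible .entry kernel(
    .param .u32 kernel_param_0
)
{
    .local .align 4 .b8 __local_depot0[4];  // stack slot from the alloca
    .reg .b32 %r<3>;

    ld.param.u32 %r1, [kernel_param_0];     // read parameter
    st.local.u32 [__local_depot0], %r1;     // copy into local memory
    // later uses reload it from local memory:
    ld.local.u32 %r2, [__local_depot0];
    ...
}
```

Every later use pays for a ld.local instead of reading the parameter directly, which is where the potential slowdown comes from.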

In LLVM you have the mem2reg optimizer pass. This pass promotes stack allocations to registers, and in the case of kernel parameters you most likely want this optimization: the alloca and store instructions disappear from your IR, and the parameter is used directly instead of through an unnecessary copy.
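A sketch of the effect on the example kernel from the question (you can reproduce it with something like `opt -passes=mem2reg in.ll -S`; exact IR varies by clang/LLVM version):

```
; before mem2reg
define void @kernel(i32 %param1) {
  %0 = alloca i32
  store i32 %param1, i32* %0
  %v = load i32, i32* %0
  ; ... uses %v ...
  ret void
}

; after mem2reg: the alloca, store, and load are gone
define void @kernel(i32 %param1) {
  ; ... uses %param1 directly ...
  ret void
}
```

With the stack slot promoted away, no .local allocation is emitted for the parameter in PTX, and the backend is free to keep it in a register.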