0
votes

Hello developers,

could someone give me a hint, please? I didn't find any information about how to allocate static and dynamic shared memory in the same kernel, or to ask more precisely: how do I call a kernel where the amount of shared memory that needs to be allocated is only partly known at compile time? Allocating shared memory purely dynamically is straightforward enough, but let's assume I have the following kernel:

__global__ void MyKernel(int Float4ArrSize, int FloatArrSize)
{
  __shared__ float Arr1[256];
  __shared__ char  Arr2[256];
  extern __shared__ float DynamArr[];
  float4* DynamArr1 = (float4*) DynamArr;
  float*  DynamArr2 = (float*) &DynamArr1[Float4ArrSize];

  // do something
}

Kernel Call:

int SharedMemorySize = Float4ArrSize + FloatArrSize;

MyKernel<<< numBlocks, threadsPerBlock, SharedMemorySize, stream >>>(Float4ArrSize, FloatArrSize);

I actually wasn't able to figure out how the compiler links that size to only the part of shared memory I want to allocate dynamically. Or does the parameter "SharedMemorySize" represent the total amount of shared memory per block, so that I need to add in the size of the statically allocated arrays as well (int SharedMemorySize = Float4ArrSize + FloatArrSize + 256*sizeof(float) + 256*sizeof(char))?

Please enlighten me or simply point me to some code snippets. Thanks a lot in advance.

cheers greg

2
OP! It only expects a single __shared__ variable. Try to combine everything into a single struct. - Soroosh Bateni
@SorooshBateni No, it doesn't! The parameter in the kernel call affects only the size of the dynamically allocated shared memory; the statically allocated parts are unaffected and are fine the way they are. - RoBiK
Yes I said the dynamic allocating unit (:D) expects only one __shared__ variable. - Soroosh Bateni

2 Answers

3
votes

Citing the programming guide: SharedMemorySize specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array. SharedMemorySize is an optional argument which defaults to 0.

So if I understand what you want to do, it should probably look like

extern __shared__ float DynamArr[];
float*  DynamArr1 = DynamArr;
float4* DynamArr2 = (float4*) &DynamArr[DynamArr1_size];

Be aware, I didn't test it.

Here is a very useful post.
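
To make this concrete, here is a minimal, untested sketch of static and dynamic shared memory living side by side in one kernel. The names and element-count parameters are carried over from the question; placing the float4 portion first is my own choice, to keep its 16-byte alignment.

__global__ void MyKernel(int Float4ArrSize, int FloatArrSize)
{
  // Statically allocated shared memory: sized by the compiler, not counted in Ns.
  __shared__ float Arr1[256];
  __shared__ char  Arr2[256];

  // One extern array backs all of the dynamically allocated shared memory.
  extern __shared__ float4 DynamShared[];
  float4* DynamArr1 = DynamShared;                         // float4 part first (stays 16-byte aligned)
  float*  DynamArr2 = (float*) &DynamArr1[Float4ArrSize];  // float part follows

  // do something with Arr1, Arr2, DynamArr1, DynamArr2
}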

1
vote

From the CUDA programming guide:

The [kernel's] execution configuration is specified by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the function name and the parenthesized argument list, where:

  • Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;

So basically, the shared memory size that you specify in the kernel call applies only to the dynamically allocated shared memory. You don't have to manually add the size of your statically allocated arrays in shared memory.
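
For the question's example, the launch could therefore look something like this (a sketch assuming Float4ArrSize and FloatArrSize are element counts, as the kernel code suggests):

// Ns covers only the dynamic part; Arr1 and Arr2 are accounted for by the compiler.
size_t SharedMemorySize = Float4ArrSize * sizeof(float4)
                        + FloatArrSize  * sizeof(float);

MyKernel<<< numBlocks, threadsPerBlock, SharedMemorySize, stream >>>(Float4ArrSize, FloatArrSize);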