Nested loops in OpenCL kernel

Question

I have recently started studying OpenCL and am trying to convert the following code into an efficient OpenCL kernel:

for(int i = 0; i < VECTOR_SIZE; i++)
{
    for(int j = 0; j < 100; j++)
    {
        C[i] = sqrt(A[i] + sqrt(A[i] * B[i])) * sqrt(A[i] + sqrt(A[i] * B[i]));
    }
}

This is what I have come up with so far using different tutorials. My question is: can I somehow get rid of the outer loop in my kernel? Would you say that this is an okay implementation of the above C++ code, or can something further be done to make it more efficient or closer to how an OpenCL program is supposed to look?

Also, all the tutorials I have read so far have the kernels written in a const char *. What is the reason behind this? Is this the only way OpenCL kernels are written, or do we usually write them in a separate file and then include it in our regular code?

Thanks

const char *RandomComputation =
"__kernel                                   \n"
"void RandomComputation(                    \n"
"                  __global float *A,       \n"
"                  __global float *B,       \n"
"                  __global float *C)       \n"
"{                                          \n"
"    // Get the index of the work-item      \n"
"    int index = get_global_id(0);          \n"
"    for (int j = 0; j < 100; j++)          \n"
"    {                                      \n"
"        C[index] = sqrt(A[index] + sqrt(A[index] * B[index])) * sqrt(A[index] + sqrt(A[index] * B[index])); \n"
"    }                                      \n"
"}                                          \n";

What is the purpose of the inner loop in your original code? You don't appear to be using j at all, so you're just performing the same expression 100 times. — jprice
Yes, I'm doing the same expression 100 times. I could use it somewhere, but the objective is to understand whether there is any way to convert C++ code with nested loops into an OpenCL kernel. — Ghias
Sure there is - get rid of the loop you're not using. (How to optimise something depends on what it's supposed to do: just adding redundant code to have something to work on makes the question meaningless, because the only sensible optimisation is to get rid of it. Come up with a use for j and you'll get meaningful answers, which will vary depending on how you decide to use it.) — Leushenko
This question is too generic to be answered, since "How to use nested loops?" can be answered with "Like this: for(;;){for(;;){}}". That is the easy part; the difficult part is how to use them efficiently, and that cannot be answered without a proper example. — DarkZeros
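
To illustrate the point made in the comments, here is a minimal C++ sketch of what the loop body reduces to once the unused j loop is dropped. It assumes the inputs are non-negative, so that sqrt(x) * sqrt(x) equals x up to floating-point rounding. The function names compute_original and compute_simplified are made up for this illustration and do not come from the question.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Nested-loop form from the question: the j loop writes the same value 100 times.
std::vector<float> compute_original(const std::vector<float>& A,
                                    const std::vector<float>& B) {
    std::vector<float> C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        for (int j = 0; j < 100; ++j) {
            C[i] = std::sqrt(A[i] + std::sqrt(A[i] * B[i])) *
                   std::sqrt(A[i] + std::sqrt(A[i] * B[i]));
        }
    }
    return C;
}

// Simplified form (assumes A[i] >= 0 and B[i] >= 0): no j loop, and
// sqrt(x) * sqrt(x) is replaced by x, with x = A[i] + sqrt(A[i] * B[i]).
std::vector<float> compute_simplified(const std::vector<float>& A,
                                      const std::vector<float>& B) {
    std::vector<float> C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        C[i] = A[i] + std::sqrt(A[i] * B[i]);
    }
    return C;
}
```

Under the same assumption, the kernel body would shrink to a single statement, C[index] = A[index] + sqrt(A[index] * B[index]);, with the remaining i loop already covered by get_global_id(0). The two forms can differ in the last bits because of rounding.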

Rumesh Krishnan · Accepted Answer · 2014-06-11T10:22:01

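When you want to use nested loops in an OpenCL kernel, use two dimensions of the index space, as in this matrix multiplication example. Each work-item takes the place of one iteration of the two outer loops, so only the innermost loop remains in the kernel.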

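__kernel void matrixMul(__global float* C,
      __global float* A,
      __global float* B,
      int wA, int wB)
{
   // One work-item per output element: tx is the column, ty is the row.
   int tx = get_global_id(0);
   int ty = get_global_id(1);
   float value = 0;
   // Dot product of row ty of A (width wA) and column tx of B (width wB).
   for (int k = 0; k < wA; ++k)
   {
     float elementA = A[ty * wA + k];
     float elementB = B[k * wB + tx];
     value += elementA * elementB;
   }
   // C has wB columns, so its row stride is wB (wA only works for square matrices).
   C[ty * wB + tx] = value;
}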

Do you need a full explanation of this?
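
To make the mapping from loops to work-items explicit, here is a CPU-side sketch in C++ that emulates a two-dimensional launch. It is not OpenCL host code: the helpers matrix_mul_work_item and matrix_mul_emulated, and the parameter hA (the number of rows of A), are hypothetical names introduced for illustration. The two outer loops play the role of the runtime handing out get_global_id(1) and get_global_id(0), and the result is stored with a row stride of wB, the width of C.

```cpp
#include <cstddef>
#include <vector>

// Body of one work-item as a plain function. tx and ty stand in for
// get_global_id(0) and get_global_id(1); this is an illustration, not an OpenCL API.
void matrix_mul_work_item(std::vector<float>& C, const std::vector<float>& A,
                          const std::vector<float>& B, int wA, int wB,
                          int tx, int ty) {
    float value = 0.0f;
    for (int k = 0; k < wA; ++k) {
        value += A[ty * wA + k] * B[k * wB + tx];
    }
    C[ty * wB + tx] = value;  // C is hA x wB, so the row stride is wB
}

// Emulates a 2D launch with global size (wB, hA): the two loops below are
// exactly the loops that disappear from the kernel.
std::vector<float> matrix_mul_emulated(const std::vector<float>& A,
                                       const std::vector<float>& B,
                                       int hA, int wA, int wB) {
    std::vector<float> C(static_cast<std::size_t>(hA) * wB, 0.0f);
    for (int ty = 0; ty < hA; ++ty) {        // dimension 1 of the index space
        for (int tx = 0; tx < wB; ++tx) {    // dimension 0 of the index space
            matrix_mul_work_item(C, A, B, wA, wB, tx, ty);
        }
    }
    return C;
}
```

On a real device, the equivalent step would be a clEnqueueNDRangeKernel call with work_dim set to 2 and a global work size of {wB, hA}, assuming the matrices are stored row-major as above.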
