Hi everyone.
I have this kernel:

    __kernel void FuncionCL(__global char* in, __global char* out, __global int* S2)
    {
        __private int op1, op2, op3;
        __private int C;
        __private uint WorkDim, C2;
        op1 = 1;
        op2 = 2;
        WorkDim = get_global_size(0);
        __private int ID;
        ID = get_global_id(0);
        for(C = 0; C < 1000000; C++)
        {
            for(C2 = ID; C2 < 1000; C2 += WorkDim)
            {
                op3 = op1 + op2;
            }
        }
        out[0] = 90;
        out[1] = 89;
        *S2 = (int) WorkDim;
    }

It crashes not only the application but the graphics driver too. If I replace the loop increment with the constant value 16 (the value that get_global_size() returns), the code runs fine. What's the problem?
If I run the code with:

    WorkDim = 16;

on line 8 instead of:

    WorkDim = get_global_size(0);

the code runs 400 times faster. That's the problem: why, if the value is the same?
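**EDIT:** Well, now I know why the code is so slow. There are multiple reasons:
1.- Occupancy: a global size of 16 work-items leaves most of the GPU idle.
2.- All the threads run the same iterations of the first (outer) loop, so the work is repeated instead of divided. The corrected code looks like this: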

    __kernel void FuncionCL(__global char* in, __global char* out, __global int* S2)
    {
        __private int op1, op2, op3;
        __private int C;
        __private uint WorkDim, C2;
        op1 = 1;
        op2 = 2;
        WorkDim = get_global_size(0);
        __private int ID;
        ID = get_global_id(0);
        for(C = ID; C < 1000000; C += WorkDim)
        {
            for(C2 = ID; C2 < 1000; C2 += WorkDim)
            {
                op3 = op1 + op2;
            }
        }
        out[0] = 90;
        out[1] = 89;
        *S2 = (int) WorkDim;
    }

Now my code runs 6.1 times faster on the GPU than on the CPU.
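
To make reason 2 concrete, here is a rough host-side sketch in plain C (not OpenCL) that counts how many inner-loop iterations each version of the kernel performs across all work-items. The helper names (`strided_trips`, `total_original`, `total_strided`) are made up for illustration, and the sketch only counts loop trips. It says nothing about occupancy, compiler optimizations, or actual timings.

```c
#include <stdio.h>

/* Number of values id, id+stride, id+2*stride, ... that are below n.
   This mirrors a loop of the form: for (i = id; i < n; i += stride).
   Assumes stride > 0. */
unsigned long long strided_trips(unsigned id, unsigned stride, unsigned n)
{
    unsigned long long trips = 0;
    for (unsigned i = id; i < n; i += stride)
        trips++;
    return trips;
}

/* Original kernel: every work-item runs ALL outer_n outer passes,
   and only the inner loop is divided among the work-items. */
unsigned long long total_original(unsigned work_dim, unsigned outer_n,
                                  unsigned inner_n)
{
    unsigned long long total = 0;
    for (unsigned id = 0; id < work_dim; id++)
        total += (unsigned long long)outer_n
                 * strided_trips(id, work_dim, inner_n);
    return total;
}

/* Corrected kernel: the outer loop is also strided by the global size,
   so each work-item runs only its own share of the outer passes. */
unsigned long long total_strided(unsigned work_dim, unsigned outer_n,
                                 unsigned inner_n)
{
    unsigned long long total = 0;
    for (unsigned id = 0; id < work_dim; id++)
        total += strided_trips(id, work_dim, outer_n)
                 * strided_trips(id, work_dim, inner_n);
    return total;
}
```

With the sizes from the kernels above, `total_original(16, 1000000, 1000)` gives 1,000,000,000 inner iterations and `total_strided(16, 1000000, 1000)` gives 62,500,000, so the corrected version does 16 times less loop work overall for a global size of 16.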