I've launched a kernel with 2100 blocks and 4 threads per block.
At some point in this kernel, every thread has to call a function and accumulate its result into an array in global memory at index threadIdx.x, so that each entry ends up holding the sum over all blocks.
I know for certain that, at this phase of the project, the function always returns 1.012086. I've written this code to compute that sum:
    currentErrors[threadIdx.x] = 0;
    for (i = 0; i < gridDim.x; i++)
    {
        if (i == blockIdx.x)
        {
            currentErrors[threadIdx.x] += globalError(mynet, myoutput);
        }
    }
But when the kernel ends, every position of the array holds 1.012086 (instead of 1.012086*2100).
What am I doing wrong? Thanks for your help!
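
For reference, here is a minimal sketch of one possible way to get the cross-block sum described above. It rests on two observations about the code in the question: every block resets currentErrors[threadIdx.x] to 0 inside the kernel, and the += on global memory is not atomic, so blocks can overwrite each other's updates. The sketch zeroes the array once from the host and uses atomicAdd instead. The names globalErrorStub, accumulateErrors and sumAcrossBlocks are hypothetical; globalErrorStub only stands in for globalError(mynet, myoutput) and returns the constant quoted in the question. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for globalError(mynet, myoutput):
// it just returns the constant value mentioned in the question.
__device__ float globalErrorStub()
{
    return 1.012086f;
}

// Each thread of each block adds its value to the slot for its thread index.
// atomicAdd makes the read-modify-write safe when many blocks hit the same slot.
__global__ void accumulateErrors(float *currentErrors)
{
    atomicAdd(&currentErrors[threadIdx.x], globalErrorStub());
}

// Host helper (hypothetical): launches the kernel and returns the per-slot sums.
std::vector<float> sumAcrossBlocks(int numBlocks, int threadsPerBlock)
{
    float *dErrors = nullptr;
    size_t bytes = threadsPerBlock * sizeof(float);

    cudaMalloc(&dErrors, bytes);
    // Zero the array once, before the launch, instead of inside the kernel.
    cudaMemset(dErrors, 0, bytes);

    accumulateErrors<<<numBlocks, threadsPerBlock>>>(dErrors);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> hErrors(threadsPerBlock);
    cudaMemcpy(hErrors.data(), dErrors, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dErrors);
    return hErrors;
}
```

With this sketch, sumAcrossBlocks(2100, 4) should return four values of about 2125.38 (2100 * 1.012086), up to single-precision rounding. A per-block reduction followed by a second pass would avoid the contention on four addresses, but atomicAdd is the smallest change from the original loop.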