First, the error: a segmentation fault at the highlighted cudaMalloc line whenever I make a greater than approximately 61432.
unsigned int xarray[a];
unsigned int *dev_a;
int result[33*a];
int *dev_result;
cudaMalloc((void **)&dev_a,a * sizeof(unsigned int)); // <-- highlighted line: the segmentation fault is reported here
cudaMemcpy(dev_a,xarray,a*sizeof(int),cudaMemcpyHostToDevice);
kernel<<<a,66>>>(dev_a,dev_result);
cudaMemcpy(result,dev_result,33*a*sizeof(int),cudaMemcpyDeviceToHost);
I say 'approximately' because it sometimes works when a = 61432 and sometimes does not. I do not understand why.
Also, this is only a launch of a 1D grid of blocks. My intent is to launch a 3D grid [a*a*a].
result) that large on the stack. Use std::vector<int> instead. - Jared Hoberock
61432³ grid? Where did you get a GPU with 8 petabytes memory?? - leftaroundabout