
First, the error: a segmentation fault (at the highlighted cudaMalloc line) whenever I make a greater than approximately 61432.

unsigned int xarray[a];
unsigned int *dev_a;
int result[33*a];
int *dev_result;

cudaMalloc((void **)&dev_a,a * sizeof(unsigned int)); // <-- highlighted line: the segfault is reported here

cudaMemcpy(dev_a,xarray,a*sizeof(int),cudaMemcpyHostToDevice);

kernel<<<a,66>>>(dev_a,dev_result);

cudaMemcpy(result,dev_result,33*a*sizeof(int),cudaMemcpyDeviceToHost);

I say 'approximately' because sometimes it works when a = 61432 and sometimes it doesn't. I don't understand why.

Also, this is only for launching a 1D grid of blocks. My intent is to launch a 3D grid [a*a*a].

You can't expect the compiler to be able to statically allocate an array (result) that large on the stack. Use std::vector<int> instead. - Jared Hoberock
You want to allocate a 61432³ grid? Where did you get a GPU with 8 petabytes memory?? - leftaroundabout
I used malloc instead and it seems to have solved the problem. Thanks! - dparkar
@leftaroundabout Can I not use dim3 blocks(65535,65535,65535) and launch kernel<<<blocks,66>>>(dev_abc,dev_result), where dev_abc is 65535 x 65535 x 65535? I want to max out the GPU. - dparkar
@leftaroundabout I think I understand what you meant: there is not enough memory on the GPU to use the full 65535x65535x65535 dimensions. For my problem at hand I can go up to, say, 220x220x200. - dparkar

1 Answer


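Your code is segfaulting because the array result is too large for the stack. Assuming 4-byte ints, result alone needs 33 * 61432 * 4 = 8,109,024 bytes when a = 61432, and xarray adds another 245,728 bytes. That puts the two arrays within a few tens of kilobytes of 8 MiB (8,388,608 bytes), a common default stack limit on Linux, which is probably why the crash comes and goes around that value. In practice, you can't expect to allocate an array of size 33 * 61432 as a local variable on the stack.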

Instead, use a std::vector to allocate the array dynamically and pass a pointer to the vector's data to cudaMemcpy:

#include <vector>
...
std::vector<int> result(33 * a); // heap-allocated; must hold all 33 * a ints copied back below
...
cudaMemcpy(&result[0], dev_result, 33 * a * sizeof(int), cudaMemcpyDeviceToHost);
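
To make the size argument concrete, here is a minimal host-side sketch of the arithmetic and of the heap-backed buffer. It is only an illustration under stated assumptions: the helper names (local_array_bytes, result_bytes_for_grid, make_result_buffer) are hypothetical and not part of the original code, the 8 MiB stack limit is a common Linux default rather than a guarantee (check `ulimit -s`), and int is assumed to be 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Results written per block, taken from the question's `int result[33*a]`.
constexpr std::size_t kResultsPerBlock = 33;

// ASSUMPTION: 8 MiB default stack limit (common on Linux, not guaranteed).
constexpr std::size_t kAssumedStackLimit = 8u * 1024u * 1024u;

// Bytes the question's two local arrays (xarray and result) occupy for a given a.
std::size_t local_array_bytes(std::size_t a) {
    return a * sizeof(unsigned int) + kResultsPerBlock * a * sizeof(int);
}

// Bytes needed for the result array alone with an x * y * z grid of blocks.
std::size_t result_bytes_for_grid(std::size_t x, std::size_t y, std::size_t z) {
    return x * y * z * kResultsPerBlock * sizeof(int);
}

// Heap-backed replacement for `int result[33*a]`; zero-initialized.
std::vector<int> make_result_buffer(std::size_t a) {
    assert(a > 0);  // an empty buffer would make &result[0] invalid
    return std::vector<int>(kResultsPerBlock * a);
}
```

With a buffer created as `std::vector<int> result = make_result_buffer(a);`, the copy back becomes `cudaMemcpy(&result[0], dev_result, result.size() * sizeof(int), cudaMemcpyDeviceToHost);`. Under the same assumptions, the 220x220x200 grid mentioned in the comments needs about 1.28 GB for the results alone, so the heap (and device memory) is the only realistic place for it. Note that dev_result also needs its own cudaMalloc of the same size before the kernel launch; the snippets above do not show one.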