Suppose a struct X with some primitives and an array of Y structs:
typedef struct
{
int a;
Y** y;
} X;
An instance X1 of X is initialized at the host, and then copied to an instance X2 of X, on the device memory, through cudaMemcpy.
This works fine for all the primitives in X (such as int a), but cudaMemcpy seems to flatten any double pointer into a single pointer, thus causing out of bounds exceptions wherever there's an access to the struct arrays in X (such as y).
In this case am I supposed to use another memcpy function, such as cudaMemcpy2D or cudaMemcpyArrayToArray?
Suggestions are much appreciated. Thanks!
Edit:
The natural approach (as in "that's what I'd do if it were just C") to copying an array of structures would be to cudaMalloc the array and then cudaMalloc and initialize each element separately, e.g.:
X** h_x;
X** d_x;
int num_x;
cudaMalloc((void**)&d_x, sizeof(X)*num_x);
int i=0;
for(;i<num_x;i++)
{
cudaMalloc((void**)d_x[i], sizeof(X));
cudaMemcpy(&d_x[i], &h_x[i], sizeof(X), cudaMemcpyHostToDevice);
}
However, the cudaMalloc inside the for loop crashes. I confess I'm not yet comfortable with the usage of pointers in CUDA functions, so perhaps I got the cudaMalloc and cudaMemcpy parameters wrong?
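A likely cause of the crash: d_x holds a device address, so evaluating d_x[i] in host code dereferences device memory from the host. The outer allocation is also sized with sizeof(X) where an array of pointers needs sizeof(X*). Below is a minimal sketch of one way to do this deep copy. It assumes a simplified X containing only the int member, and copy_to_device is a hypothetical helper name, not part of the CUDA API. The idea is to collect the device addresses in a host-side staging array first, then copy that array of pointers to the device in one call.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

/* Simplified stand-in for the X in the question; the Y** member is left out
   because Y is not shown. */
typedef struct { int a; } X;

/* Hypothetical helper: h_x[i] points to a host X. Returns a device array of
   device pointers, or NULL on failure. */
X** copy_to_device(X** h_x, int num_x)
{
    /* Host array that will hold device addresses (zeroed so cleanup is safe). */
    X** staging = (X**)calloc(num_x, sizeof(X*));
    if (staging == NULL) return NULL;

    X** d_x = NULL;
    cudaError_t err = cudaSuccess;

    for (int i = 0; i < num_x && err == cudaSuccess; i++) {
        /* cudaMalloc writes the new device address into host memory. */
        err = cudaMalloc((void**)&staging[i], sizeof(X));
        if (err == cudaSuccess)
            err = cudaMemcpy(staging[i], h_x[i], sizeof(X), cudaMemcpyHostToDevice);
    }

    /* The outer array stores pointers, so it is sized with sizeof(X*). */
    if (err == cudaSuccess)
        err = cudaMalloc((void**)&d_x, sizeof(X*) * num_x);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_x, staging, sizeof(X*) * num_x, cudaMemcpyHostToDevice);

    if (err != cudaSuccess) {
        /* cudaFree on a null pointer performs no operation. */
        for (int i = 0; i < num_x; i++) cudaFree(staging[i]);
        cudaFree(d_x);
        d_x = NULL;
    }
    free(staging);
    return d_x;
}
```

On success the per-element allocations stay alive on the device. To release them later, the pointer array has to be copied back to the host so that each element can be passed to cudaFree.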
The use of double pointers (**) makes this extra challenging. If you want to see how to copy ** arrays from host to device, look here. It's not for the faint of heart. Note that a.lasram is suggesting flattening on the host first. I also suggest you accept the answer given by a.lasram, and post new questions if you have them. It makes the question messy and confusing for others to read when you make wholesale edits and post mostly new questions in your old one that's already been answered. – Robert Crovella
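To illustrate the flattening idea mentioned in the comment, here is a rough sketch under stated assumptions. Y's fields are not shown in the question, so the Y below is hypothetical. XFlat is an assumed variant of X that stores a single pointer to n contiguous Y elements in place of the Y**, and copy_flat_to_device is an illustrative helper name. With one level of indirection, the copy needs only two allocations and one pointer patch.

```cuda
#include <cuda_runtime.h>

/* Hypothetical Y; the real fields are not shown in the question. */
typedef struct { float v; } Y;

/* Assumed flat variant of X: y points to n contiguous Y elements. */
typedef struct { int a; int n; Y* y; } XFlat;

/* Illustrative helper: copies *h and its Y array to the device.
   Assumes h->n > 0. Returns a device pointer to XFlat, or NULL on failure. */
XFlat* copy_flat_to_device(const XFlat* h)
{
    Y* d_y = NULL;
    XFlat* d_x = NULL;
    XFlat tmp = *h;  /* host copy whose y member gets patched below */

    /* 1. Copy the contiguous Y array in a single transfer. */
    cudaError_t err = cudaMalloc((void**)&d_y, sizeof(Y) * h->n);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_y, h->y, sizeof(Y) * h->n, cudaMemcpyHostToDevice);

    /* 2. Point the temporary struct at the device copy of the array. */
    tmp.y = d_y;

    /* 3. Copy the patched struct itself. */
    if (err == cudaSuccess)
        err = cudaMalloc((void**)&d_x, sizeof(XFlat));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_x, &tmp, sizeof(XFlat), cudaMemcpyHostToDevice);

    if (err != cudaSuccess) {
        cudaFree(d_y);  /* cudaFree on a null pointer performs no operation */
        cudaFree(d_x);
        return NULL;
    }
    return d_x;
}
```

A kernel that receives the returned pointer can then index y[0] through y[n-1] directly, because the y member stored on the device already holds a device address.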