I've been stuck on a problem for a little while now. I'm using Nsight Eclipse Edition (CUDA 7.0) to program a GT 630 (Kepler version) GPU.
Basically, I have an array of objects of a class (Static_Box), and I modify the data on the host (CPU). I then want to send the data over to the GPU for computation, but my code is not doing that. Here's some of my code:
#define SIZE_OF_BOX_ARRAY 3
class Edge {
public:
    int x1, y1, x2, y2;
};
class Static_Box {
public:
    Static_Box(int x, int y, int width, int height);
    Edge e1, e2, e3, e4;
};
Static_Box::Static_Box(int x, int y, int width, int height) {
    e1.x1 = x;
    e1.y1 = y;
    e1.x2 = x + width;
    e1.y2 = y;
    // e2.x1 = x+width; Continuing in this manner (no other calculations)
}
// Storage of the scene. d_* indicates GPU memory
// Static_Box is a class I have defined in another file, it contains a
// few other classes that I wrote as well.
Static_Box *static_boxes;
Static_Box *d_static_boxes;
int main(int argc, char **argv) {
    // Create the host data storage
    static_boxes = (Static_Box*)malloc(SIZE_OF_BOX_ARRAY * sizeof(Static_Box));
    // I then set a few of the indexes of static_boxes here, which is
    // the data I need written while on the CPU.
    // Example:
    static_boxes[0] = Static_Box(
    // Allocate the memory on the GPU
    // CUDA_CHECK_RETURN is from NVIDIA's bit reverse example (exits the application if the GPU fails)
    CUDA_CHECK_RETURN(cudaMalloc((void**)&d_static_boxes, SIZE_OF_BOX_ARRAY * sizeof(Static_Box)));
    int j = 0;
    for (; j < SIZE_OF_BOX_ARRAY; j++) {
        // Removed this per Mai Longdong's suggestion
        // CUDA_CHECK_RETURN(cudaMalloc((void**)&(static_boxes[j]), sizeof(Static_Box)));
        CUDA_CHECK_RETURN(cudaMemcpy(&(d_static_boxes[j]), &(static_boxes[j]), sizeof(Static_Box), cudaMemcpyHostToDevice));
    }
}
I've hunted around on here for quite a while and found some helpful information from Robert Crovella. I made a little progress using his tips, but the answers he gave did not quite pertain to my problem. Does anybody have a solution that keeps the host data intact while transferring it to the GPU?
Thanks very much for your help!
Edit: included the change to the first cudaMalloc suggested by Mai Longdong.
Edit 2: included the second change from Mai Longdong, and provided a complete example.
Don't use malloc in C++. Use new if you really require dynamic allocation, but in this example you don't; use std::array. Also, your cudaMalloc allocates sizeof(static_boxes) bytes, which is the size of a pointer, which is not what you want. And lastly, the second cudaMalloc stores its result in static_boxes, not d_static_boxes. – user703016
sizeof(static_boxes): I've swapped it over to SIZE_OF_BOX_ARRAY * sizeof(Static_Box). I just tried changing the second cudaMalloc to use d_static_boxes, but it is giving me a SIGBUS: Bus error. I'm going to work on copying the data back from the GPU now, and see how that goes. Thanks for your input @MaiLongdong! – Gliderman
... cudaMalloc into a device pointer. I don't even know why I said that, it's not even Monday morning. Drop the second cudaMalloc altogether. Also, maybe you should get a book on C++ because you seem quite confused with basic semantics. – user703016
Unless Static_Box contains pointers (whose definition you haven't shown), you are done after the first cudaMalloc. Writing a question where the extent of your actual problem description is "I'm having trouble doing that" is quite unclear, especially when coupled with the fact that you haven't provided an MCVE, which SO expects for questions like this. (I've voted to close this question for lack of MCVE.) If Static_Box does contain pointers, then the code is quite a bit more complicated. Try this – Robert Crovella
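For what it's worth, the two points raised in the comments can be checked on the host alone. The following is only a sketch in plain C++ with no CUDA calls: std::memcpy stands in for cudaMemcpy, the helper names array_bytes, pointer_bytes and flat_copy_preserves_source are made up for illustration, and the constructor arguments are arbitrary values. It assumes Static_Box holds nothing but plain ints, as in the definition shown in the question, and it fills in only e1 because the rest of the constructor is elided there.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

constexpr std::size_t SIZE_OF_BOX_ARRAY = 3;

struct Edge {
    int x1, y1, x2, y2;
};

struct Static_Box {
    Static_Box() = default;  // lets std::array hold default-constructed boxes
    // Only e1 is set; e2..e4 are zeroed because the question elides them.
    Static_Box(int x, int y, int width, int height)
        : e1{x, y, x + width, y}, e2{}, e3{}, e4{} {}
    Edge e1, e2, e3, e4;
};

// A flat byte copy is only valid for trivially copyable types (no owned pointers).
static_assert(std::is_trivially_copyable<Static_Box>::value,
              "Static_Box must be trivially copyable for a flat copy");

// Number of bytes the allocation and the copy should both use.
constexpr std::size_t array_bytes() {
    return SIZE_OF_BOX_ARRAY * sizeof(Static_Box);
}

// The pitfall from the comments: sizeof on a pointer is the pointer's size,
// not the size of the array it points to.
std::size_t pointer_bytes(const Static_Box *boxes) {
    return sizeof(boxes);
}

// Copies the whole array in one call (host stand-in for a single cudaMemcpy)
// and reports whether the source is unchanged and the destination matches it.
bool flat_copy_preserves_source() {
    std::array<Static_Box, SIZE_OF_BOX_ARRAY> src{};
    src[0] = Static_Box(10, 20, 30, 40);  // hypothetical values
    const auto before = src;              // snapshot of the host data

    std::array<Static_Box, SIZE_OF_BOX_ARRAY> dst{};
    assert(array_bytes() == src.size() * sizeof(Static_Box));
    std::memcpy(dst.data(), src.data(), array_bytes());

    const bool source_intact =
        std::memcmp(src.data(), before.data(), array_bytes()) == 0;
    const bool destination_matches =
        std::memcmp(dst.data(), src.data(), array_bytes()) == 0;
    return source_intact && destination_matches;
}
```

If these checks hold, the per-element loop in the question can presumably be replaced by one cudaMemcpy of SIZE_OF_BOX_ARRAY * sizeof(Static_Box) bytes with cudaMemcpyHostToDevice, which only reads from the host buffer.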