
I've got a problem that has had me stuck for a little while now. I'm using Nsight Eclipse Edition (CUDA 7.0) to program a GT 630 (Kepler version) GPU.

Basically, I have an array of a class (Static_Box), and I modify the data on the host (CPU). I then want to send the data over to the GPU for computation, but my code is not doing that. Here's some of my code:

#define SIZE_OF_BOX_ARRAY 3

class Edge {
public:
    int x1, y1, x2, y2;
};

class Static_Box {
public:
    Static_Box(int x, int y, int width, int height);
    Edge e1, e2, e3, e4;
};

Static_Box::Static_Box(int x, int y, int width, int height) {
    e1.x1 = x;
    e1.y1 = y;
    e1.x2 = x+width;
    e1.y2 = y;
    // e2.x1 = x+width;  Continuing in this manner (no other calculations)
}

// Storage of the scene. d_* indicates GPU memory
// Static_Box is a class I have defined in another file, it contains a
// few other classes that I wrote as well.
Static_Box *static_boxes;
Static_Box *d_static_boxes;

int main(int argc, char **argv) {
    // Create the host data storage
    static_boxes = (Static_Box*)malloc(SIZE_OF_BOX_ARRAY*sizeof(Static_Box));

    // I then set a few of the indexes of static_boxes here, which is
    // the data I need written while on the CPU.
    // Example:
    static_boxes[0] = Static_Box(

    // Allocate the memory on the GPU
    // CUDA_CHECK_RETURN is from NVIDIA's bit reverse example (exits the application if the GPU fails)
    CUDA_CHECK_RETURN(cudaMalloc((void**)&d_static_boxes, SIZE_OF_BOX_ARRAY * sizeof(Static_Box)));

    int j = 0;
    for (; j < SIZE_OF_BOX_ARRAY; j++) {
    //  Removed this per Mai Longdong's suggestion
    //    CUDA_CHECK_RETURN(cudaMalloc((void**)&(static_boxes[j]), sizeof(Static_Box)));
        CUDA_CHECK_RETURN(cudaMemcpy(&(d_static_boxes[j]), &(static_boxes[j]), sizeof(Static_Box), cudaMemcpyHostToDevice));
    }
}

I've hunted around on here for quite a while, and found some helpful information from Robert Crovella, and progressed a little bit using his tips, but the answers he gave did not quite pertain to my problem. Does anybody have a solution to keep the host data intact while transferring to the GPU?

Thanks very much for your help!

Edit, included change on first cudaMalloc from MaiLongdong

Edit 2, included second change from Mai Longdong, and provided complete example.

Don't use malloc in C++. Use new if you really require dynamic allocation, but in this example you don't; use std::array. Also, your cudaMalloc allocates sizeof(static_boxes) bytes, which is the size of a pointer and not what you want. And lastly, the second cudaMalloc stores its result in static_boxes, not d_static_boxes. – user703016
Okay, getting there. Thanks for pointing out that sizeof(static_boxes); I've swapped it over to SIZE_OF_BOX_ARRAY * sizeof(Static_Box). I just tried changing the second cudaMalloc to use d_static_boxes, but it is giving me a SIGBUS: Bus error. I'm going to work on copying the data back from the GPU now, and see how that goes. Thanks for your input @MaiLongdong! – Gliderman
That's a thinko, you can't cudaMalloc into a device pointer. I don't even know why I said that, it's not even Monday morning. Drop the second cudaMalloc altogether. Also, maybe you should get a book on C++, because you seem quite confused about basic semantics. – user703016
Unless Static_Box contains pointers (you haven't shown its definition), you are done after the first cudaMalloc. Writing a question where the extent of your actual problem description is "I'm having trouble doing that" is quite unclear, especially when coupled with the fact that you haven't provided an MCVE, which SO expects for questions like this. (I've voted to close this question for lack of an MCVE.) If Static_Box does contain pointers, then the code is quite a bit more complicated. Try this – Robert Crovella
Putting "Solved" in the question title is not appropriate on SO. Instead, upvote or mark one of the answers as accepted, or else provide your own answer and accept that. That is the SO way to mark a question "Solved". By the way, I removed my close vote as you've now provided something that approximates an MCVE (although it still has uncompilable junk in it). – Robert Crovella

1 Answer


If Static_Box contains no pointers (member data referred to by pointers that would require independent allocations), then copying an array of them is not really any different than copying an array of POD types, like int. This should be all you need:

#define SIZE_OF_BOX_ARRAY 3

Static_Box *static_boxes;
Static_Box *d_static_boxes;

int main(int argc, char **argv) {

    static_boxes = (Static_Box*)malloc(SIZE_OF_BOX_ARRAY*sizeof(Static_Box));
    CUDA_CHECK_RETURN(cudaMalloc((void**)&d_static_boxes, SIZE_OF_BOX_ARRAY * sizeof(Static_Box)));
    CUDA_CHECK_RETURN(cudaMemcpy(d_static_boxes, static_boxes, SIZE_OF_BOX_ARRAY*sizeof(Static_Box), cudaMemcpyHostToDevice));
    return 0;
}

If you think that is not working, you'll need to give a specific example of what you are doing and what exactly led you to believe that it is not working (data not matching, CUDA runtime error thrown, etc.) The example you provide should be complete, so that someone else can compile it, run it, and see whatever problem it is that you are reporting. If the code you post in your question doesn't compile, it's not an MCVE (my opinion, which influences my voting pattern.)
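
The "contains no pointers" condition and the "data not matching" check can both be expressed in code. What follows is only a host-side sketch under stated assumptions, not the asker's actual program: the class layout mirrors the question with public members, only e1 is filled in (the question elides the rest), round_trip_matches is a hypothetical helper name, and std::memcpy into a byte buffer stands in for cudaMemcpy so that it runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Layout mirrors the question's classes, with members made public.
struct Edge {
    int x1, y1, x2, y2;
};

struct Static_Box {
    Static_Box() = default;  // lets us create arrays/vectors of boxes
    // Only e1 is spelled out in the question; e2..e4 are left zeroed here.
    Static_Box(int x, int y, int width, int /*height*/)
        : e1{x, y, x + width, y} {}
    Edge e1{}, e2{}, e3{}, e4{};
};

// A flat byte copy is only valid for trivially copyable types
// (no owning pointers, no virtual functions, no custom copy logic).
static_assert(std::is_trivially_copyable<Static_Box>::value,
              "Static_Box must be trivially copyable for a flat copy");

// Hypothetical helper: copy the array "to the device" and back, then
// compare bytes. std::memcpy is an assumed stand-in for cudaMemcpy.
bool round_trip_matches(const std::vector<Static_Box>& host) {
    assert(!host.empty() && "nothing to copy");
    const std::size_t bytes = host.size() * sizeof(Static_Box);

    std::vector<unsigned char> staging(bytes);      // stand-in for device memory
    std::vector<Static_Box> readback(host.size());  // host buffer for the copy back

    std::memcpy(staging.data(), host.data(), bytes);      // "host to device"
    std::memcpy(readback.data(), staging.data(), bytes);  // "device to host"

    // The source array is const, so the host data is untouched by the copy.
    return std::memcmp(host.data(), readback.data(), bytes) == 0;
}
```

With a real GPU, the two std::memcpy calls would become cudaMemcpy calls with cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost, and staging would be the pointer returned by cudaMalloc. If the comparison fails there, that mismatch is the kind of concrete evidence worth putting in the question.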