0
votes

I am trying to find the sum of an array (already present in CUDA device memory) using the Thrust library. A few replies here said this is possible by wrapping the raw pointer in thrust::device_ptr, but it throws an error for me.

Initial code

cudaMemcpy((void *)(data + stride), (void *)d_output, sizeof(unsigned int) * rows * cols, cudaMemcpyDeviceToHost);
thrust::device_vector<unsigned int> vec((data + stride), (data + stride + (rows * cols)));
sum = thrust::reduce(vec.begin(), vec.end());

The above code works perfectly fine. But if I change it to

thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
thrust::device_ptr<unsigned int> outputPtrEnd((d_output + stride + (rows * cols)));
sum = thrust::reduce(outputPtrBegin, outputPtrEnd);

It throws the following error:

 terminate called after throwing an instance of 'thrust::system::system_error'
 what():  an illegal memory access was encountered
 Aborted (core dumped)

What could be the problem? Thanks a lot for your time.

Edit (following input from Robert Crovella): the mistake was using stride. I have a follow-up question (related to the declarations above).

Depending on the value of toggle, I need to call Thrust on a different array:

if(toggle) {
    thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
    thrust::device_ptr<unsigned int> outputPtrEnd((d_output + (rows * cols)));
}
else {
    thrust::device_ptr<unsigned int> outputPtrBegin(d_X);
    thrust::device_ptr<unsigned int> outputPtrEnd((d_X + (rows * cols)));
}

But the compiler says outputPtrBegin and outputPtrEnd are not declared, because they are declared inside the if/else blocks. How do I declare these device pointers beforehand and then use them?

1
What is stride? In the initial code it looks like a byte offset. Is it? - talonmies
Hi, Yes it's just an offset. Even when it is set to zero, I run into the same error. - Andrew Mathews
"Even when it is set to zero, I run into the same error." That's puzzling. If so, my answer is not explaining the root cause of the issue, (although the sizes of your operations are clearly different). It would be better if you provided a complete code. - Robert Crovella
Sorry, I was actually editing that comment. This code was in a while loop (incrementing stride in steps of (rows * cols)). The first iteration had run successfully (I had one more bug in printing values, and it didn't print the first printf when stride = 0, though the code had executed). Sorry once again. - Andrew Mathews

1 Answer

3
votes

This is wrong:

thrust::device_ptr<unsigned int> outputPtrEnd((d_output + stride + (rows * cols)));

It should be:

thrust::device_ptr<unsigned int> outputPtrEnd((d_output + (rows * cols)));

In your first (working) example, you are copying a region from the device to the host. On the device, that region starts at d_output and has a length of rows*cols elements. This is the data that you are ultimately passing through the reduce operation. Yes, on the host, it happens to be copied to a region that begins at data + stride, but that is irrelevant. Ultimately you are performing a reduce over rows*cols elements in your first implementation.

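In the second implementation, you are performing a reduce operation starting at d_output and ending at d_output+stride+(rows*cols), which covers stride + rows*cols elements. This is not the same size operation, and once stride is nonzero the range presumably runs past the end of the d_output allocation, which would explain the illegal memory access.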
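To make the size mismatch concrete, here is a minimal host-only sketch in plain C++ (no CUDA required). It uses std::accumulate over raw pointers as a stand-in for thrust::reduce over device_ptr; both operate on a half-open range [first, last). The helper names range_length, sum_range, and buggy_length are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Number of elements covered by the half-open range [first, last).
std::size_t range_length(const unsigned int* first, const unsigned int* last) {
    assert(first <= last);
    return static_cast<std::size_t>(last - first);
}

// Host stand-in for thrust::reduce(first, last): sums every element in [first, last).
unsigned int sum_range(const unsigned int* first, const unsigned int* last) {
    assert(first <= last);
    return std::accumulate(first, last, 0u);
}

// Element count implied by the original end pointer, base + stride + rows * cols.
// Anything beyond rows * cols lies outside the buffer being reduced.
std::size_t buggy_length(std::size_t stride, std::size_t rows, std::size_t cols) {
    return stride + rows * cols;
}
```

For a buffer of rows * cols = 6 elements, sum_range(p, p + 6) touches exactly the 6 valid elements, while an end pointer built with stride = 6 would imply a 12-element range, twice the size of the buffer.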

In addition, you may want to do something like this instead:

thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
thrust::device_ptr<unsigned int> outputPtrEnd = outputPtrBegin + (rows * cols);
sum = thrust::reduce(outputPtrBegin, outputPtrEnd);

Regarding your second question (please post new questions as new questions), instead of this:

if(toggle) {
    thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
    thrust::device_ptr<unsigned int> outputPtrEnd((d_output + (rows * cols)));
}
else {
    thrust::device_ptr<unsigned int> outputPtrBegin(d_X);
    thrust::device_ptr<unsigned int> outputPtrEnd((d_X + (rows * cols)));
}

Do something like this:

thrust::device_ptr<unsigned int> outputPtrBegin;
thrust::device_ptr<unsigned int> outputPtrEnd;
if(toggle) outputPtrBegin=thrust::device_pointer_cast<unsigned int>(d_output);
else outputPtrBegin=thrust::device_pointer_cast<unsigned int>(d_X);
outputPtrEnd = outputPtrBegin + (rows * cols);
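The same declare-first, assign-in-the-branch pattern can be sketched on the host in plain C++, again with std::accumulate standing in for thrust::reduce and raw pointers standing in for thrust::device_ptr. The function name sum_selected and its parameters are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Picks the source array based on toggle, then derives the end of the range
// from the chosen begin pointer so the two can never disagree.
unsigned int sum_selected(bool toggle,
                          const unsigned int* output,
                          const unsigned int* x,
                          std::size_t count) {
    const unsigned int* begin = nullptr;  // declared outside the branches
    if (toggle) begin = output;
    else        begin = x;
    assert(begin != nullptr);
    const unsigned int* end = begin + count;  // visible here because it is not scoped to the if/else
    return std::accumulate(begin, end, 0u);
}
```

Computing the end pointer once, after the branch, keeps the range length in a single place instead of repeating it in both arms.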