CUDA NPP - unknown error upon GPU error check

Question

I am trying to sum all the pixels in an image and compute their average using the CUDA NPP library. My image is an 8-bit unsigned char grayscale image, 256 pixels wide and 1024 pixels high. I have tried to follow the rules for declaring pointers and passing the corresponding NPP-typed pointers to the NPP functions.

However, I get an "unknown error" when I run GPU error checking on my code. I have tried to debug it, but I cannot figure out where I am going wrong. Could someone help?

I am also using OpenCV for my processing, so some OpenCV code appears below.

EDIT: Code has been updated

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, char *file, int line, bool abort=true)
{
    if (code != cudaSuccess) 
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) getchar();
    }
}

// process image here 

// device_pointer initializations
unsigned char *device_input;
unsigned char *device_output;    

size_t d_ipimgSize = input.step * input.rows;
size_t d_opimgSize = output.step * output.rows;

gpuErrchk( cudaMalloc( (void**) &device_input, d_ipimgSize) );
gpuErrchk( cudaMalloc( (void**) &device_output, d_opimgSize) );

gpuErrchk( cudaMemcpy(device_input, input.data, d_ipimgSize, cudaMemcpyHostToDevice) );

// Median filter the input image here
// .......

// start summing all pixels 
Npp64s *partialSum = 0; 
partialSum = (Npp64s *) malloc(sizeof(Npp64s));

int bytes = input.cols*input.rows;

Npp8u *scratch = nppsMalloc_8u(bytes);

int ostep = input.step; 
NppiSize imSize; 
imSize.width = input.cols; 
imSize.height = input.rows;

// copy processed image data into a source_pointer
unsigned char *odata; 
odata = (unsigned char*) malloc( sizeof(unsigned char) * input.rows * input.cols);
memcpy(odata, output.data, sizeof(unsigned char) * input.rows * input.cols);

// compute the sum over all the pixels
nppiSum_8u64s_C1R( odata, ostep, imSize, scratch, partialSum );

// print sum 
printf( "\n Total Sum cuda %d \n",  *partialSum) ;

gpuErrchk(cudaFree(device_input));   // <--- Unknown error here
gpuErrchk(cudaFree(device_output));

where are device_input and device_output variables declared and allocated? Can you show that code? — Robert Crovella
@RobertCrovella I updated the code to show the declarations and definitions of device_input and device_output. While debugging, I had tried to change the declaration of bytes, ostep, imSize, and odata to use the openCV output structure (as in output.step, output.rows, output.cols), to see if I could get rid of the errors. But, that did not seem to work either. — Eagle

kunzmi · Accepted Answer · 2014-03-21T00:33:06

The partialSum argument of nppiSum_8u64s_C1R must point to device-allocated memory, not to host memory obtained from malloc.

Furthermore, you allocate a scratch buffer the size of your image. The function nppiSumGetBufferHostSize_8u64s_C1R gives you the exact size required for the scratch buffer, which might be larger than the image itself (not very likely for a simple summation, but possible).

Also, always check return values in NPP, just as you do for CUDA. nppiSum_8u64s_C1R probably won't return NPP_NO_ERROR in your case.
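
For illustration, here is a minimal sketch of how the three points above might fit together. It is not the asker's actual program: the image is a synthetic, tightly packed 256 x 1024 buffer standing in for the OpenCV data, and the CUDA_CHECK and NPP_CHECK macros are hypothetical helpers introduced only for this example. It also passes a device pointer as the source image, on the understanding that NPP image primitives read their input from device memory. The `int` buffer-size parameter and the non-context function variants match the toolkit of that era; newer NPP releases may use different parameter types or require the `_Ctx` variants, so check the headers of your installed version.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <npp.h>

// Hypothetical helper macros (not part of CUDA or NPP).
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(e_), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define NPP_CHECK(call)                                               \
    do {                                                              \
        NppStatus s_ = (call);                                        \
        if (s_ != NPP_NO_ERROR) {                                     \
            fprintf(stderr, "NPP status %d at %s:%d\n",               \
                    (int)s_, __FILE__, __LINE__);                     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    // Synthetic stand-in for the 8-bit grayscale image (assumption).
    const int width = 256, height = 1024;
    std::vector<unsigned char> hostImg(width * height, 3);

    // The source image must live in device memory.
    Npp8u *d_img = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_img, hostImg.size()));
    CUDA_CHECK(cudaMemcpy(d_img, hostImg.data(), hostImg.size(),
                          cudaMemcpyHostToDevice));

    NppiSize roi = { width, height };
    int step = width;  // bytes per row; use the real pitch for padded images

    // Ask NPP how large the scratch buffer must be instead of guessing.
    int scratchBytes = 0;
    NPP_CHECK(nppiSumGetBufferHostSize_8u64s_C1R(roi, &scratchBytes));
    Npp8u *d_scratch = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_scratch, scratchBytes));

    // The result is written to device memory as well.
    Npp64s *d_sum = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_sum, sizeof(Npp64s)));

    NPP_CHECK(nppiSum_8u64s_C1R(d_img, step, roi, d_scratch, d_sum));

    // Copy the single 64-bit result back to the host before reading it.
    Npp64s h_sum = 0;
    CUDA_CHECK(cudaMemcpy(&h_sum, d_sum, sizeof(Npp64s),
                          cudaMemcpyDeviceToHost));

    printf("sum = %lld, mean = %f\n", (long long)h_sum,
           (double)h_sum / ((double)width * height));

    CUDA_CHECK(cudaFree(d_sum));
    CUDA_CHECK(cudaFree(d_scratch));
    CUDA_CHECK(cudaFree(d_img));
    return 0;
}
```

Build it with nvcc and link against the NPP libraries; the exact library names depend on the toolkit version. With an OpenCV image, the row pitch passed as the step argument would be the matrix's step value rather than its width.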
