I'm a beginner at using OpenCL for image processing. My setup is Win7 + VS2010 + OpenCL 2.0 + OpenCV 2.4.7, on a PC with an Intel i7 CPU and an NVIDIA GTX 760 GPU.
Here is what I'm doing:
I use OpenCV to read a frame (1920*1080) from a video, then copy the image data and get a pointer to it:
uchar* input_data=(uchar*)(gray_image->imageData);
Next I want to run convolution and other image processing on the GPU, so I use OpenCL to upload this data (input_data) to a device buffer (cl_input_data) that was created earlier. The upload takes about 0.2 ms, which is fast.
clEnqueueWriteBuffer(queue, cl_input_data, CL_TRUE, 0, ROI_size * sizeof(cl_uchar), (void*)input_data, 0, NULL, NULL);
The main processing is split across several kernels, and each of them takes less than 0.1 ms, which seems normal.
clEnqueueNDRangeKernel(queue, kernel_box, 2, NULL, global_work_size, local_work_size, 0, NULL, NULL);
After all the processing, I download the result from GPU memory (cl_output_data) to the host (output_data), and this step takes over 5.5 ms! That is roughly 27 times longer than the upload step.
clEnqueueReadBuffer(queue, cl_output_data, CL_TRUE, 0, ROI_size * sizeof(char), (void*)output_data, 0, NULL, NULL);
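For scale, here is a rough sketch of the effective bandwidth these two timings would imply. It assumes ROI_size covers the whole 1920*1080 single-channel frame (2,073,600 bytes), which may not hold if the ROI is smaller. effective_gb_per_s is a hypothetical helper written for this illustration, not part of OpenCL.

```c
#include <stddef.h>

/* Hypothetical helper: effective transfer rate in GB/s (1 GB = 1e9 bytes)
 * for `bytes` bytes moved in `ms` milliseconds. */
double effective_gb_per_s(size_t bytes, double ms)
{
    double gigabytes = (double)bytes / 1e9; /* bytes -> GB */
    double seconds = ms / 1e3;              /* ms -> s */
    return gigabytes / seconds;
}
```

Under that assumption, effective_gb_per_s(1920 * 1080, 0.2) works out to roughly 10.4 GB/s for the upload, and effective_gb_per_s(1920 * 1080, 5.5) to roughly 0.38 GB/s for the download.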
So I'm wondering: since both transfers use the same device and the data size is exactly the same, why do the upload and download times differ so much?
By the way, I measure the times with a timer built on something like QueryPerformanceFrequency(&m_Frequency);
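To make the measurement concrete, the timer is along the lines of the sketch below. It is an approximation, not my exact code: now_ms is a hypothetical helper name, the pairing of QueryPerformanceFrequency with QueryPerformanceCounter is assumed, and the non-Windows branch is only there so the snippet compiles elsewhere (it uses the C11 wall clock, which is not guaranteed to be monotonic).

```c
#include <time.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: current host time in milliseconds. */
double now_ms(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq); /* ticks per second */
    QueryPerformanceCounter(&count);  /* current tick count */
    return 1000.0 * (double)count.QuadPart / (double)freq.QuadPart;
#else
    /* Fallback for other platforms (assumption: a C11 library). */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
#endif
}
```

Each step is timed by calling now_ms() right before and right after the corresponding clEnqueue... call and taking the difference.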
Thank you!