
I really need some help and advice, as I'm new to real-time image processing.

I am trying to implement an algorithm for a system in which the camera captures 1000 fps. I need to read the value of every pixel in each image and compute, for all pixels, how pixel[i][j] evolves over N images. I have the raw data as an unsigned char *ptr; I want to transfer it to the GPU, run the algorithm with CUDA, and return the results to the CPU. But I am not sure what the best option for real-time processing would be.

My system: CPU: Intel Xeon X5660 2.8 GHz (2 processors); GPU: NVIDIA Quadro 5000.

The problem is: while I am receiving 1000 fps and passing them to the GPU for processing, how can I make sure that I am not losing any of the data coming from the grabber in the next second? Do I need to implement multithreading in C++, and parallel programming in OpenCV/OpenCL/CUDA? If you have any ideas or recommendations, please let me know. I really need advice from an expert in real-time image processing. Thank you.

Ambitious project. I suggest you get it working at a nominal data rate first using just a normal CPU, and then benchmark its performance. That will tell you how much further you may need to go with optimisation, GPGPU, etc., and will also serve as a useful baseline implementation. – Paul R
Wow. What's the camera resolution? Bit depth? – Rhythmic Fistman
I agree with @PaulR: you will need to implement a reference version, measure how much time that takes, and optimize from there. It is impossible for anyone to predict the best course from the information provided. – Ani
This is a pretty unanswerable question as asked. The only people who can answer it are those who have already tried to build a 1 kHz frame grabber and image-processing system using a GPU, and I suspect that is a vanishingly small number of people. But having said that, I very much doubt a GPU is what you want for this. GPUs can achieve very high data throughput on computationally intensive workloads, but they have a lot of latency, and PCI-e bandwidth is a very constraining factor in overall throughput. – talonmies
@Paul & "anathonline": I have already implemented my algorithm on the CPU with 100 fps and 100x100 images. But I got what you mean, thank you. – user261002

1 Answer


As you know, OpenCV implements several of its features on the GPU as well, using the CUDA framework.

You can write your own CUDA code/functions to operate on the data and convert it to the OpenCV format without any problems. I demonstrate how to do this in cuda-grayscale. I guess this example answers most of your questions.

Note that OpenCV 2.3.1 uses CUDA 4.0, and OpenCV 2.4 only works with CUDA 4.1.

Regarding this statement:

I want to make sure while I am getting 1000fps and pass them to GPU for processing

It's most likely that you won't be able to process the frames as fast as they come from the camera. If you don't want to drop any frames, you can forget about real time (I'm assuming you are not working with incredibly small images, e.g. 10x15).

If you really need to work at 1000 FPS, you'll have to implement a buffering mechanism to store the frames that come from the device. And this is where we start talking about a multithreaded system: the main thread of your application will be responsible for grabbing the frames from the camera and storing them in a buffer, and a second thread will read from the buffer and perform the processing on the frames.

For information on how to implement the buffering mechanism, check:

How to implement a circular buffer of cv::Mat objects (OpenCV)?

Thread safe implementation of circular buffer

C + OpenCV: IplImage with circular buffer