4
votes

I ran into a problem of low frame-capture efficiency in OpenCV.

  1. Hardware & Software.

    • Raspberry Pi 3 (1.2 GHz quad-core ARM) with HDMI display
    • IP camera: LAN-connected, RTSP, H.264 codec, 1280x720 resolution, 20 fps, GOP of 1, 2500 kB/s VBR bitrate (parameters can be changed).
    • OS Raspbian Stretch
    • Python 3.5
    • OpenCV 4.1
    • GStreamer 1.0
  2. Task.

Get the video stream from the IP camera, recognize images, and display the resulting video (with marks and messages).

Important features: real-time processing, HD resolution (1280x720), high frame rate (>20 fps), continuous operation for several hours.

  3. My solution.

General algorithm: source video stream -> decode and grab frames -> process frames in OpenCV -> assemble the processed frames into a video stream -> display the video using the Raspberry Pi GPU

OpenCV's output/display method, imshow, does not work well even with low-resolution video. The only library that allows using the Raspberry Pi GPU to decode and display video is GStreamer.

I compiled the GStreamer modules (gstreamer1.0-plugins-bad, gstreamer1.0-omx) with OMX support and tested them:

gst-launch-1.0 rtspsrc location='rtsp://web_camera_ip' latency=400 ! queue ! rtph264depay ! h264parse ! omxh264dec ! glimagesink

It works great; CPU usage is about 9%.

Next, I compiled OpenCV with GStreamer, NEON, and VFPV3 support.
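A quick way to confirm that such a build actually picked up GStreamer is to check OpenCV's build information (a minimal check, independent of my pipeline):

import cv2

# Print the GStreamer line from the build configuration
info = cv2.getBuildInformation()
print([line.strip() for line in info.splitlines() if 'GStreamer' in line])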

I use the following code for testing:

import cv2
import numpy as np

src = 'rtsp://web_camera_ip'
stream_in = cv2.VideoCapture(src)

pipeline_out = "appsrc ! videoconvert ! video/x-raw, framerate=20/1, format=RGBA ! glimagesink sync=false"
fourcc = cv2.VideoWriter_fourcc(*'H264')

stream_out = cv2.VideoWriter(pipeline_out, cv2.CAP_GSTREAMER, fourcc, 20.0, (1280, 720))
while True:
    # Grab a frame from the input stream and push it to the GStreamer sink
    ret, frame = stream_in.read()
    if ret:
        stream_out.write(frame)
        cv2.waitKey(1)

It also worked, but not as well as GStreamer alone. CPU usage is about 50% (about 35% without stream_out.write(frame)). At frame rates above 15 fps there are lags and delays.
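To put a number on the lag, a minimal FPS counter around the capture loop helps (a sketch; it measures only how fast VideoCapture delivers frames):

import time
import cv2

stream_in = cv2.VideoCapture('rtsp://web_camera_ip')
frames, t0 = 0, time.time()
while frames < 200:
    ret, frame = stream_in.read()
    if not ret:
        break
    frames += 1
# Effective delivery rate; compare against the camera's 20 fps
print('Effective FPS: %.1f' % (frames / (time.time() - t0)))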

  4. How I tried to solve the problem.

4.1. Using GStreamer to decode the video stream:

pipeline_in = 'rtspsrc location=rtsp://web_camera_ip latency=400 ! queue ! rtph264depay ! h264parse ! omxh264dec ! videoconvert ! appsink'
stream_in = cv2.VideoCapture(pipeline_in)

This even worsened the situation: the CPU load increased by several percent and the delay grew.
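For what it's worth, OpenCV's GStreamer capture backend expects BGR frames at the appsink for color video, so one variant worth trying is to pin the caps explicitly and select the backend by hand, making the conversion point visible (a sketch):

import cv2

pipeline_in = ('rtspsrc location=rtsp://web_camera_ip latency=400 ! queue ! '
               'rtph264depay ! h264parse ! omxh264dec ! '
               'videoconvert ! video/x-raw, format=BGR ! appsink')
# Select the GStreamer backend explicitly instead of letting OpenCV guess
stream_in = cv2.VideoCapture(pipeline_in, cv2.CAP_GSTREAMER)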

4.2. I also tried to optimize capture using the method from PyImageSearch.com: threading with WebcamVideoStream from the imutils library.

from imutils.video import WebcamVideoStream
import cv2
import numpy as np

src = 'rtsp://web_camera_ip'
stream_in = WebcamVideoStream(src).start()
pipeline_out = "appsrc ! videoconvert ! video/x-raw, framerate=20/1, format=RGBA ! glimagesink sync=false"
fourcc = cv2.VideoWriter_fourcc(*'H264')

stream_out = cv2.VideoWriter(pipeline_out, cv2.CAP_GSTREAMER, fourcc, 20.0, (1280, 720))
while True:
    # WebcamVideoStream.read() returns only the frame (no ret flag)
    frame = stream_in.read()
    stream_out.write(frame)
    cv2.waitKey(1)

CPU usage increased to 70%; the quality of the output video stream did not change.
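For context, WebcamVideoStream is essentially a daemon thread that keeps overwriting a one-slot frame buffer; a minimal equivalent looks roughly like this (a sketch, not the imutils source):

from threading import Thread, Lock
import cv2

class LatestFrameGrabber:
    # Keep only the newest frame from the capture in a one-slot buffer
    def __init__(self, src):
        self.cap = cv2.VideoCapture(src)
        self.lock = Lock()
        self.frame = None
        self.running = True
        Thread(target=self._update, daemon=True).start()

    def _update(self):
        # Read as fast as the source delivers, discarding stale frames
        while self.running:
            ret, frame = self.cap.read()
            if ret:
                with self.lock:
                    self.frame = frame

    def read(self):
        with self.lock:
            return self.frame

    def stop(self):
        self.running = False
        self.cap.release()

The decode itself still happens inside cap.read(), so threading alone cannot reduce the total CPU cost; it only hides the read latency from the main loop.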

4.3. Changing the following parameters does not help: waitKey(1-50), video stream bitrate (1000-5000 kB/s), video stream GOP (1-20).

  5. Questions.

As I understand it, the VideoCapture/VideoWriter methods have very low efficiency. Maybe it's not noticeable on a PC, but it is critical for the Raspberry Pi 3.

  • Is it possible to increase the performance of VideoCapture (VideoWriter)?
  • Is there an alternative way to capture frames from video into OpenCV?

Thanks in advance for your answers!

UPDATE 1

I think I know what the problem is, but I don't know how to solve it.

  1. A refinement of CPU usage when working with VideoCapture and VideoCapture+GStreamer: VideoCapture(src) + VideoWriter(gstreamer_pipeline_out) - 50-60%; VideoCapture(gstreamer_pipeline_in) + VideoWriter(gstreamer_pipeline_out) - 40-50%.
  2. Color formats that different parts of my program work with: the H.264 video stream is YUV, OpenCV works in BGR, and the OMX layer output is RGBA. OpenCV can only work with frames in the BGR color format. The OMX layer output shows a black screen when it is given video in any other color format.
  3. Color format conversion in the GStreamer pipeline is done by the videoconvert element. In some cases it works automatically (without specifying parameters), and the target color format can also be forced. I do not know how this conversion works inside the "pure" VideoCapture(src).

The main problem is that videoconvert does not use the GPU: the main CPU load comes from the color format conversion!
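The numbers make this plausible. At 1280x720 and 20 fps, converting I420 (1.5 bytes per pixel) to RGBA (4 bytes per pixel) means the CPU reads and writes roughly the following amounts every second:

w, h, fps = 1280, 720, 20
print('I420 in:  %.1f MB/s' % (w * h * 1.5 * fps / 1e6))  # ~27.6 MB/s read
print('RGBA out: %.1f MB/s' % (w * h * 4.0 * fps / 1e6))  # ~73.7 MB/s written

Per-pixel colorspace math at that rate is a serious load for the Pi 3's CPU.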

I tested this assumption using "pure" GStreamer with videoconvert added:

gst-launch-1.0 rtspsrc location='rtsp://web_camera_ip' latency=400 ! queue ! rtph264depay ! h264parse ! omxh264dec ! videoconvert ! video/x-raw, format=BGR ! glimagesink sync=false

The display is black; CPU load is 25%.

Checking this pipeline:

gst-launch-1.0 rtspsrc location='rtsp://web_camera_ip' latency=400 ! queue ! rtph264depay ! h264parse ! omxh264dec ! videoconvert ! video/x-raw, format=RGBA ! glimagesink sync=false

The video is displayed and the CPU load is 5%. I assume that omxh264dec converts the YUV color format to RGBA on the GPU (after omxh264dec, videoconvert does not load the CPU).

  4. I don't know how to use the GPU for color format conversion in VideoCapture/GStreamer on the Raspberry Pi.

In this thread 6by9, a Raspberry Pi engineer and graphics programming specialist, writes that "The IL video_encode component supports OMX_COLOR_Format24bitBGR888 which I seem to recall maps to OpenCV's RGB".
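If frames really could be delivered as 24-bit RGB, the remaining fix on the OpenCV side would be a cheap channel swap rather than a full colorspace conversion, for example:

import cv2
import numpy as np

# Hypothetical RGB frame as it might arrive from the OMX side
frame_rgb = np.zeros((720, 1280, 3), dtype=np.uint8)
# Channel reorder only; far cheaper than YUV -> RGB colorspace math
frame_bgr = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)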

Are there any ideas?

1

You should be aware of which memory spaces you are working in. Your first pipeline is the most efficient one: you decode the video on the graphics card and tell the graphics card to display it. If you want to do CPU processing between decoding and display, you need to download the image data to CPU host memory, do the processing, and upload it back to graphics card memory. This is expensive, especially for embedded devices like the Pi. Ideally you could do the processing on the OpenGL texture directly on the GPU and avoid copying the image data around. - Florian Zwoch

1 Answer

0
votes

Do you really need to recognize every frame that you capture? You can use your first pipeline to display the video (with a video overlay for watermarks and other artifacts), but decode, for example, only every 6th frame for CPU recognition. In that case you use the GPU alone to capture and display the video without loading the CPU, and the CPU only for selective image recognition.
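A minimal sketch of that idea, assuming the capture setup from the question (recognize() is a hypothetical placeholder for the recognition step):

import cv2

stream_in = cv2.VideoCapture('rtsp://web_camera_ip')
n = 0
while True:
    ret, frame = stream_in.read()
    if not ret:
        break
    if n % 6 == 0:
        # Run the expensive recognition on every 6th frame only;
        # recognize() is a hypothetical placeholder
        recognize(frame)
    n += 1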