
I have a C++ computer vision application linking with the ffmpeg libraries that provides frames from video streams to analysis routines. The idea is that one can provide a moderately generic video stream identifier, and that video source will be decompressed and passed frame after frame to an analysis routine (which runs the user's analysis functions). The "moderately generic video identifier" covers three stream types: paths to video files on disk, IP video streams (cameras or video streaming services), and USB webcam pins with a desired format & rate.

My current video player is as generic as possible: video only, ignoring audio and other streams. It has a switch statement for retrieving a stream's frame rate based upon the stream's source and codec, which is used to estimate the delay between decompressing frames. I've had many issues trying to get reliable timestamps from the streams, so I am currently ignoring pts and dts. I know ignoring pts/dts is bad for variable frame rate streams; I plan to special-case them later. The player currently checks whether the last decompressed frame is more than 2 frames late (assuming a constant frame rate), and if so "drops the frame" - that is, does not pass it to the user's analysis routine.

Essentially, the video player's logic is determining when to skip frames (not pass them to the time-consuming analysis routine) so the analysis is fed video frames as close to real time as possible.
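
For concreteness, the drop check is essentially the following (a minimal sketch with illustrative names, not my actual code, assuming a constant frame rate):

```cpp
#include <chrono>
#include <cstdint>

// Player-level drop test: the frame just decoded is considered "late" when the
// wall clock is more than two frame intervals ahead of where this frame should
// sit, assuming a constant frame rate.
bool should_drop_frame(std::chrono::steady_clock::time_point stream_start,
                       int64_t frames_decoded,   // frames decoded so far, including this one
                       double fps)               // constant frame rate for this stream
{
    const double frame_interval = 1.0 / fps;
    const double expected_s = frames_decoded * frame_interval;  // ideal real-time position
    const double elapsed_s  = std::chrono::duration<double>(
                                  std::chrono::steady_clock::now() - stream_start).count();
    return (elapsed_s - expected_s) > 2.0 * frame_interval;     // more than 2 frames late
}
```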

I am looking for examples or discussions of how one can initialize and/or maintain the AVFormatContext, AVStream, and AVCodecContext using (presumably, but not limited to) AVDictionary options such that whatever frame dropping is necessary to maintain real time is performed at the libav library level, and not at my video player level. If achieving this requires separate AVDictionaries (or more) for each stream type and codec, then so be it. I am interested in understanding the pros and cons of both approaches: dropping frames at the player level or at the libav level.
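
To make the AVDictionary part of the question concrete, this is the kind of per-source setup I mean (a minimal sketch assuming a recent FFmpeg, 5.x or later; "rtbufsize" is the small real-time buffer I mention in the comments and only applies to device demuxers such as dshow, and as far as I can tell none of these options guarantee real-time frame dropping on their own):

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/dict.h>
}

// Open a video source with per-source options; returns nullptr on failure.
AVFormatContext *open_low_latency(const char *url, const AVInputFormat *ifmt)
{
    AVDictionary *opts = nullptr;
    av_dict_set(&opts, "rtbufsize", "786432", 0);   // ~768K real-time buffer, device demuxers (e.g. dshow) only
    av_dict_set(&opts, "fflags", "nobuffer", 0);    // reduce optional demuxer buffering
    av_dict_set(&opts, "max_delay", "500000", 0);   // cap demuxer delay at 0.5 s (microseconds)

    AVFormatContext *fmt = nullptr;
    if (avformat_open_input(&fmt, url, ifmt, &opts) < 0) {
        av_dict_free(&opts);
        return nullptr;
    }
    av_dict_free(&opts);                            // entries left in opts were not recognized by this demuxer

    if (avformat_find_stream_info(fmt, nullptr) < 0) {
        avformat_close_input(&fmt);
        return nullptr;
    }
    return fmt;
}
```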

(When some analysis requires every frame, the existing player implementation with frame dropping disabled is fine. I suspect that if I can get frame dropping to occur at the libav level, I'll save the packet-to-frame decompression time as well, reducing processing more than my current version does.)

To the best of my knowledge, libav does not have this feature. You will need to handle this case yourself. – szatmary
I'm experimenting with first detecting if the codec delivers only full frames, such as the mjpeg codec common with USB cameras. If true, by using a very small rtbufsize of 768K, I get frame dropping such that the maximum real-time latency is under 2 seconds, often under 1 second. I know frame dropping is occurring beneath my logic because the av_log callback is getting messages announcing each frame drop because the real-time buffer is too full. – Blake Senftner
I don't think the frame dropping is happening in ffmpeg in that case. It's putting network back pressure on the camera, and the camera is dropping the frame before it hits the network. – szatmary
Is frame dropping in the USB camera (due to network back pressure) necessarily bad? – Blake Senftner

1 Answer


if I can get frame dropping to occur at the libav level, I'll save the packet-to-frame decompression time as well

No, you won't, unless you are willing to drop all frames until the next key frame. On typical mp4 video, that could easily be a few seconds.
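
If that trade-off is acceptable, the decoder will do the skipping for you via AVCodecContext's skip_frame field; a minimal sketch (AVDISCARD_NONKEY and AVDISCARD_NONREF are standard AVDiscard values):

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
}

// Tell the decoder itself to skip frames. With AVDISCARD_NONKEY, only key
// frames are decoded; on typical mp4/h264 content that can mean one decoded
// frame every few seconds.
void decode_key_frames_only(AVCodecContext *dec)
{
    dec->skip_frame = AVDISCARD_NONKEY;    // decode key frames only
    // dec->skip_frame = AVDISCARD_NONREF; // milder: skip only non-reference frames
}
```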

You can skip colorspace conversion and resize, but often these are taken care of by the player.
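
As a sketch of what that looks like when you control the conversion step yourself: only pay for sws_scale on frames that will actually reach the analysis routine (the function names here are placeholders, and the SwsContext/AVFrame setup is omitted):

```cpp
extern "C" {
#include <libswscale/swscale.h>
#include <libavutil/frame.h>
}

// Convert and deliver a decoded frame only when it is not being dropped,
// so dropped frames never pay the colorspace conversion / resize cost.
void deliver(AVFrame *decoded, SwsContext *to_bgr, AVFrame *bgr,
             bool drop, void (*analyze_bgr)(const AVFrame *))
{
    if (drop)
        return;                                    // skip conversion entirely for dropped frames
    sws_scale(to_bgr, decoded->data, decoded->linesize,
              0, decoded->height, bgr->data, bgr->linesize);
    analyze_bgr(bgr);
}
```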