
I am implementing a (very) low-latency video streaming C++ application using ffmpeg. The client receives a video encoded with x264's zerolatency tune, so no buffering is needed. As described here, if you use av_read_frame() to read packets of the encoded video stream, you will always have at least one frame of delay because of internal buffering done in ffmpeg. So when I call av_read_frame() after frame n+1 has been sent to the client, the function returns frame n.
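To see why a parser working on a raw byte stream has to hold the most recent frame back, consider a toy splitter for an H.264 Annex B stream, in which NAL units are separated by the start code 00 00 01. The class name and its behaviour are hypothetical, not FFmpeg code: it cannot know that a unit is finished until the next start code arrives, so the last unit always stays buffered. FFmpeg's H.264 parser looks for access-unit (frame) boundaries rather than single NAL units, but as far as I understand it faces the same limitation, which is where the one-frame delay comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy Annex B splitter (hypothetical, not FFmpeg code). It assumes 3-byte
// start codes (00 00 01) and that the stream begins with one.
class ToyAnnexBSplitter {
public:
    // Appends a chunk of received bytes and returns every NAL unit (start code
    // included) that is now known to be complete.
    std::vector<std::vector<uint8_t>> feed(const uint8_t *data, std::size_t size) {
        assert(data != nullptr || size == 0);
        buf_.insert(buf_.end(), data, data + size);
        std::vector<std::vector<uint8_t>> complete;
        for (;;) {
            // Skip the start code at buf_[0] and look for the next one.
            const std::size_t next = find_start_code(3);
            if (next == SIZE_MAX)
                break;  // the current unit may still be growing: hold it back
            const auto end = buf_.begin() + static_cast<std::ptrdiff_t>(next);
            complete.emplace_back(buf_.begin(), end);
            buf_.erase(buf_.begin(), end);
        }
        return complete;
    }

    // Number of bytes held back, waiting for the next start code.
    std::size_t buffered() const { return buf_.size(); }

private:
    std::size_t find_start_code(std::size_t from) const {
        for (std::size_t i = from; i + 2 < buf_.size(); ++i)
            if (buf_[i] == 0 && buf_[i + 1] == 0 && buf_[i + 2] == 1)
                return i;
        return SIZE_MAX;
    }

    std::vector<uint8_t> buf_;
};
```

Feeding frame n alone returns nothing; frame n is only released when the start code of frame n+1 arrives, which is exactly the behaviour described above.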

Setting the AVFormatContext flags AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN, as suggested in the source, does get rid of this buffering, but it also disables packet parsing and therefore breaks decoding, as noted in the source.

Therefore, I am writing my own packet receiver and parser. First, here are the relevant steps of the working solution (with its one-frame delay) using av_read_frame():

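extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

AVFormatContext *fctx;
AVCodecContext *cctx;
AVPacket *pkt;
AVFrame *frm;

//Initialization of AV structures
//…

//Main Loop
while(true){

    //Receive packet (this is where the one-frame delay occurs)
    if (av_read_frame(fctx, pkt) < 0)
        break;

    //Decode:
    avcodec_send_packet(cctx, pkt);
    av_packet_unref(pkt); //release the packet returned by av_read_frame()
    if (avcodec_receive_frame(cctx, frm) == 0) {
        //Display frame
        //…
    }
}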

Below is my solution, which mimics the behavior of av_read_frame() as closely as I could reproduce it. I was able to trace the source code of av_read_frame() down to ff_read_packet(), but I cannot find the source of AVInputFormat.read_packet().

int tcpsocket;
AVCodecContext *cctx;
AVPacket *pkt;
AVFrame *frm;
uint8_t recvbuf[(int)10e5];
memset(recvbuf,0,10e5);
int pos = 0;

AVCodecParserContext * parser = av_parser_init(AV_CODEC_ID_H264);
parser->flags |= PARSER_FLAG_COMPLETE_FRAMES;
parser->flags |= PARSER_FLAG_USE_CODEC_TS;

//Initialization of AV structures and the tcpsocket
//…

//Main Loop
while(true){

    //Receive packet
    int length = read(tcpsocket, recvbuf, 10e5);
    if (length >= 0) {

        //Creating temporary packet
        AVPacket * tempPacket = new AVPacket;
        av_init_packet(tempPacket);
        av_new_packet(tempPacket, length);
        memcpy(tempPacket->data, recvbuf, length);
        tempPacket->pos = pos;
        pos += length;
        memset(recvbuf,0,length);

        //Parsing temporary packet into pkt
        av_init_packet(pkt);
        av_parser_parse2(parser, cctx,
            &(pkt->data), &(pkt->size),
            tempPacket->data, tempPacket->size,
            tempPacket->pts, tempPacket->dts, tempPacket->pos
            );

        pkt->pts = parser->pts;
        pkt->dts = parser->dts;
        pkt->pos = parser->pos;

        //Set keyframe flag
        if (parser->key_frame == 1 ||
            (parser->key_frame == -1 &&
            parser->pict_type == AV_PICTURE_TYPE_I))
            pkt->flags |= AV_PKT_FLAG_KEY;
        if (parser->key_frame == -1 && parser->pict_type == AV_PICTURE_TYPE_NONE && (pkt->flags & AV_PKT_FLAG_KEY))
            pkt->flags |= AV_PKT_FLAG_KEY;
        pkt->duration = 96000; //Same result as in av_read_frame()

        //Decode:
        avcodec_send_packet(cctx, pkt);
        avcodec_receive_frame(cctx, frm);
        //Display frame
        //…
    }
}

I checked the fields of the resulting packet (pkt) just before avcodec_send_packet() in both solutions. As far as I can tell, they are identical; the only difference might be the actual content of pkt->data. My solution decodes I-frames fine, but the references in P-frames seem to be broken, causing heavy artifacts and error messages such as “invalid level prefix”, “error while decoding MB xx”, and similar.

I would be very grateful for any hints.

Edit 1: I have developed a workaround for the time being: in the video server, after sending the packet containing the encoded data of a frame, I send one dummy packet which only contains the delimiters marking the beginning and end of the packet. This way, I push the actual video frames through av_read_frame(). I discard the dummy packets immediately after av_read_frame().
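A minimal sketch of how such a dummy packet could be built on the server, following the details given in Chris_128's comment below (the first 50 and the last 50 bytes of the encoded frame). The function name is hypothetical, and it assumes that the NAL payloads of one frame have already been concatenated into a single buffer of at least 50 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the dummy packet sent right after a frame,
// made of the first n and the last n bytes of that encoded frame.
// Assumes `frame` holds the concatenated NAL payloads and frame.size() >= n.
std::vector<uint8_t> make_dummy_packet(const std::vector<uint8_t> &frame,
                                       std::size_t n = 50) {
    assert(frame.size() >= n);
    const auto len = static_cast<std::ptrdiff_t>(n);
    std::vector<uint8_t> dummy(frame.begin(), frame.begin() + len);  // leading bytes (start code)
    dummy.insert(dummy.end(), frame.end() - len, frame.end());       // trailing bytes
    return dummy;
}
```

The server would then write the frame followed by make_dummy_packet(frame) to the socket, and the client would discard whatever av_read_frame() returns for the dummy.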

Edit 2: Solved by rom1v, as described in his comment on this question.

As for AVInputFormat.read_packet(): this function pointer is set to the read_packet() function of the corresponding demuxer (input format). The easiest way to confirm this is to compile FFmpeg with debug info (the easiest way for me: before running make, edit ffbuild/config.mak, find 'STRIP=strip' and replace it with 'STRIP=echo', then run make and make install). Then run your code in gdb with a breakpoint on the function you are interested in; when the breakpoint is hit, use bt (backtrace). I also noticed that you used av_init_packet and av_new_packet together, but av_new_packet also calls av_init_packet, so you don't need it. - the kamilz
I encounter the very same problem: av_read_frame() introduces 1 frame of latency. It calls (and blocks on) my custom avio_ctx->read_packet to receive more data while it has not yet consumed the previous (complete) packet. What data exactly do you write for your "dummy packet"? - rom1v
Assume that I encoded an image with x264_encoder_encode(enc_ctx, &nals, &num_nals, &pic_in, &pic_out);. The compressed data is in nals. Then, after sending the full frame (e.g. for(int i=0;i<num_nals;++i){ write(newfd, nals[i].p_payload, nals[i].i_payload); }), I write the first 50 bytes and the last 50 bytes of the frame: write(sock, nals[0].p_payload, 50); write(sock, nals[num_nals - 1].p_payload + nals[num_nals -1].i_payload - 50, 50); - Chris_128
Thank you for your answer. In your original post, the problem is that if you set PARSER_FLAG_COMPLETE_FRAMES, you are responsible for passing only complete frames to av_parser_parse2 (and if you unset the flag, you still get 1 frame of latency). I finally managed to parse manually and reduce the latency by 1 frame: github.com/Genymobile/scrcpy/pull/646 - rom1v
Good point. For me, it worked as described in my previous answer. Thank you for sharing your solution, excellent work! - Chris_128
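rom1v's point is that with PARSER_FLAG_COMPLETE_FRAMES each buffer handed to av_parser_parse2() must hold exactly one complete frame, whereas a TCP read() returns whatever bytes happen to be available: part of a frame, or several frames. One way to restore frame boundaries is sketched below; the 4-byte big-endian length prefix and the names frame_with_length and FrameReassembler are assumptions made for illustration, not necessarily the protocol used in the linked pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sender side (hypothetical framing): 4-byte big-endian length, then the frame.
std::vector<uint8_t> frame_with_length(const std::vector<uint8_t> &frame) {
    const uint32_t n = static_cast<uint32_t>(frame.size());
    std::vector<uint8_t> out = {
        static_cast<uint8_t>(n >> 24), static_cast<uint8_t>(n >> 16),
        static_cast<uint8_t>(n >> 8), static_cast<uint8_t>(n)};
    out.insert(out.end(), frame.begin(), frame.end());
    return out;
}

// Receiver side: accumulates whatever read() returned and hands back only
// complete frames, which could then be passed to av_parser_parse2() with
// PARSER_FLAG_COMPLETE_FRAMES set.
class FrameReassembler {
public:
    std::vector<std::vector<uint8_t>> feed(const uint8_t *data, std::size_t size) {
        assert(data != nullptr || size == 0);
        buf_.insert(buf_.end(), data, data + size);
        std::vector<std::vector<uint8_t>> frames;
        while (buf_.size() >= 4) {
            const std::size_t len = (std::size_t(buf_[0]) << 24) |
                                    (std::size_t(buf_[1]) << 16) |
                                    (std::size_t(buf_[2]) << 8) | std::size_t(buf_[3]);
            if (buf_.size() < 4 + len)
                break;  // frame not fully received yet
            const auto end = buf_.begin() + 4 + static_cast<std::ptrdiff_t>(len);
            frames.emplace_back(buf_.begin() + 4, end);
            buf_.erase(buf_.begin(), end);
        }
        return frames;
    }

private:
    std::vector<uint8_t> buf_;
};
```

Because the receiver knows each frame's size up front, it can pass a frame on as soon as its last byte arrives instead of waiting for the start of the next one.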

1 Answer


av_parser_parse2() does not necessarily consume your tempPacket in one go. You have to call it in a loop and check its return value, which is the number of input bytes it consumed, as shown in the API docs.
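A minimal sketch of that loop, following the pattern of FFmpeg's doc/examples/decode_video.c: advance the input by the return value and decode only when an output packet was produced. The parser is abstracted behind a callable so the loop can be exercised without FFmpeg; parse_buffer, parse and on_packet are hypothetical names, and the comment shows how the real call would be wired in.

```cpp
#include <cassert>
#include <cstdint>

// Drives a parser with av_parser_parse2()-like semantics over one input buffer:
// each call may consume only part of the input (the return value is the number
// of bytes used, negative on error) and may or may not produce an output packet.
//
// With FFmpeg, `parse` would wrap the real call, e.g.
//   [&](const uint8_t *in, int in_size, uint8_t **out, int *out_size) {
//       return av_parser_parse2(parser, cctx, out, out_size, in, in_size,
//                               AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
//   }
// and `on_packet` would point pkt->data/pkt->size at the output and decode it
// immediately, since the output buffer may be reused by the next parser call.
template <typename Parse, typename OnPacket>
int parse_buffer(Parse parse, const uint8_t *data, int size, OnPacket on_packet) {
    assert(data != nullptr || size == 0);
    while (size > 0) {
        uint8_t *out = nullptr;
        int out_size = 0;
        const int used = parse(data, size, &out, &out_size);
        if (used < 0)
            return used;  // parser error
        data += used;     // skip the bytes the parser consumed
        size -= used;
        if (out_size > 0)
            on_packet(out, out_size);  // a complete packet is ready to decode
    }
    return 0;
}
```

Calling the parser only once per read(), as in the question's code, silently drops every byte after the first return value.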