1 vote

I want to decode H.264 video from a collection of MPEG-2 Transport Stream packets, but I am not clear on what to pass to avcodec_decode_video2.

The documentation says to pass "the input AVPacket containing the input buffer."

But what should be in the input buffer?

A PES packet will be spread across the payload portions of several TS packets, with NALU(s) inside the PES. So what should I pass: a fragment of a TS packet payload? An entire reassembled PES packet? Only the PES payload?
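
For reference, the nesting as I understand it:

TS packet (188 bytes):  0x47 sync | 3-byte header incl. PID | [adaptation field] | payload
PES packet:             00 00 01 prefix | stream id | PES header | ES data (spans many TS payloads)
H.264 ES (Annex B):     00 00 00 01 | NALU | 00 00 00 01 | NALU | ...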

This Sample Code mentions:

BUT some other codecs (msmpeg4, mpeg4) are inherently frame based, so you must call them with all the data for one frame exactly. You must also initialize 'width' and 'height' before initializing them.

But I can find no info on what "all the data" means...

Passing a fragment of a TS packet payload is not working:

AVPacket avDecPkt;
av_init_packet(&avDecPkt);
avDecPkt.data = inbuf_ptr;   // points at the extracted TS payload fragment
avDecPkt.size = esBufSize;

len = avcodec_decode_video2(mpDecoderContext, mpFrameDec, &got_picture, &avDecPkt);
if (len < 0)
{
    printf("  TS PKT #%.0f. Error decoding frame #%04d [rc=%d '%s']\n",
        tsPacket.pktNum, mDecodedFrameNum, len, av_make_error_string(errMsg, 128, len));
    return;
}

Output:

[h264 @ 0x81cd2a0] no frame!
TS PKT #2973. Error decoding frame #0001 [rc=-1094995529 'Invalid data found when processing input']

EDIT

Using the excellent hints from WLGfx, I made this simple program to try decoding TS packets. As input, I prepared a file containing only TS packets from the video PID.

It feels close, but I don't know how to set up the AVFormatContext. The code below segfaults at av_read_frame() (internally at ret = s->iformat->read_packet(s, pkt)) because s->iformat is NULL. My current guess at the missing piece is sketched after the code.

Suggestions?

EDIT II - Sorry, forgot to post the source code.

EDIT III - Sample code updated to simulate reading from the TS packet queue.

/*
 * Test program for video decoder
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern "C" {

#ifdef __cplusplus
    #define __STDC_CONSTANT_MACROS
    #ifdef _STDINT_H
        #undef _STDINT_H
    #endif
    #include <stdint.h>
#endif
}

extern "C" {
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
#include "libavutil/imgutils.h"
#include "libavutil/opt.h"
}


class VideoDecoder
{
public:
    VideoDecoder();
    bool rcvTsPacket(AVPacket &inTsPacket);

private:
    AVCodec         *mpDecoder;
    AVCodecContext  *mpDecoderContext;
    AVFrame         *mpDecodedFrame;
    AVFormatContext *mpFmtContext;

};

VideoDecoder::VideoDecoder()
{
    av_register_all();

    // FORMAT CONTEXT SETUP
    mpFmtContext = avformat_alloc_context();
    mpFmtContext->flags = AVFMT_NOFILE;
    // ????? WHAT ELSE ???? //

    // DECODER SETUP
    mpDecoder = avcodec_find_decoder(AV_CODEC_ID_H264);
    if (!mpDecoder)
    {
        printf("Could not load decoder\n");
        exit(11);
    }

    mpDecoderContext = avcodec_alloc_context3(mpDecoder); // allocate with the decoder's defaults
    if (avcodec_open2(mpDecoderContext, mpDecoder, NULL) < 0)
    {
        printf("Cannot open decoder context\n");
        exit(1);
    }

    mpDecodedFrame = av_frame_alloc();
}

bool
VideoDecoder::rcvTsPacket(AVPacket &inTsPkt)
{
    bool ret = true;

    if ((av_read_frame(mpFmtContext, &inTsPkt)) < 0)
    {
        printf("Error in av_read_frame()\n");
        ret = false;
    }
    else
    {
        // success.  Decode the TS packet
        int got;
        int len = avcodec_decode_video2(mpDecoderContext, mpDecodedFrame, &got, &inTsPkt);
        if (len < 0)
            ret = false;

        if (got)
            printf("GOT A DECODED FRAME\n");
    }

    return ret;
}

int
main(int argc, char **argv)
{
    if (argc != 2)
    {
        printf("Usage: %s tsInFile\n", argv[0]);
        exit(1);
    }

    FILE *tsInFile = fopen(argv[1], "rb"); // binary mode for TS data
    if (!tsInFile)
    {
        perror("Could not open TS input file");
        exit(2);
    }

    unsigned int tsPktNum = 0;
    uint8_t      tsBuffer[256];
    AVPacket     tsPkt;
    av_init_packet(&tsPkt);

    VideoDecoder vDecoder;

    while (fread(tsBuffer, 1, 188, tsInFile) == 188) // stop on short read or EOF
    {
        tsPktNum++;

        tsPkt.size = 188;
        tsPkt.data = tsBuffer;

        vDecoder.rcvTsPacket(tsPkt);
    }
}
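
From what I can tell, the missing piece may be a custom AVIOContext, so that avformat_open_input() can probe the queued TS bytes and set iformat before av_read_frame() runs. A sketch of what I mean (tsQueueRead and setupFormatContext are placeholder names of mine, untested):

// Sketch only: feed queued 188-byte TS packets to libavformat through
// a custom AVIOContext, so the mpegts demuxer gets probed and selected.
static int tsQueueRead(void *opaque, uint8_t *buf, int buf_size)
{
    // TODO: copy up to buf_size bytes of raw TS data into buf and
    // return the byte count; return AVERROR_EOF when the queue is done.
    return AVERROR_EOF; // placeholder
}

bool setupFormatContext(AVFormatContext *&fmtCtx)
{
    const int kIoBufSize = 4096;
    unsigned char *ioBuf = (unsigned char *)av_malloc(kIoBufSize);

    AVIOContext *ioCtx = avio_alloc_context(ioBuf, kIoBufSize,
                                            0,    // read-only
                                            NULL, // opaque passed to tsQueueRead
                                            tsQueueRead, NULL, NULL);

    fmtCtx = avformat_alloc_context();
    fmtCtx->pb = ioCtx;

    // probes the stream and sets fmtCtx->iformat (the mpegts demuxer)
    if (avformat_open_input(&fmtCtx, "", NULL, NULL) < 0)
        return false;

    return avformat_find_stream_info(fmtCtx, NULL) >= 0;
}
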
2 Comments
Incoming packets will have a stream ID, for audio, video, subtitles and data. Once you've determined and created a codec context for a video stream, all you need to do is pass the packets to your own decode function. The best source of information is the source code to ffplay... - WLGfx
Thanks. The TS packets are already constrained to a single PID, containing only the video. It is H.264 so I used AV_CODEC_ID_H264 as the decoder. When you say "pass the packets", which ones? complete TS packets or re-assembled PES packets? I'll check out ffplay. - Danny

2 Answers

2 votes

I've got some code snippets that might help you out as I've been working with MPEG-TS also.

Starting with my packet thread, which checks each packet against the stream IDs I've already found and created codec contexts for:

void *FFMPEG::thread_packet_function(void *arg) {
    FFMPEG *ffmpeg = (FFMPEG*)arg;
    for (int c = 0; c < MAX_PACKETS; c++)
        ffmpeg->free_packets[c] = &ffmpeg->packet_list[c];
    ffmpeg->packet_pos = MAX_PACKETS;

    Audio.start_decoding();
    Video.start_decoding();
    Subtitle.start_decoding();

    while (!ffmpeg->thread_quit) {
        if (ffmpeg->packet_pos != 0 &&
                Audio.okay_add_packet() &&
                Video.okay_add_packet() &&
                Subtitle.okay_add_packet()) {

            pthread_mutex_lock(&ffmpeg->packet_mutex); // get free packet
            AVPacket *pkt = ffmpeg->free_packets[--ffmpeg->packet_pos]; // pre decrement
            pthread_mutex_unlock(&ffmpeg->packet_mutex);

            if ((av_read_frame(ffmpeg->fContext, pkt)) >= 0) { // success
                int id = pkt->stream_index;
                if (id == ffmpeg->aud_stream.stream_id) Audio.add_packet(pkt);
                else if (id == ffmpeg->vid_stream.stream_id) Video.add_packet(pkt);
                else if (id == ffmpeg->sub_stream.stream_id) Subtitle.add_packet(pkt);
                else { // unknown packet
                    av_packet_unref(pkt);

                    pthread_mutex_lock(&ffmpeg->packet_mutex); // put packet back
                    ffmpeg->free_packets[ffmpeg->packet_pos++] = pkt;
                    pthread_mutex_unlock(&ffmpeg->packet_mutex);

                    //LOGI("Dumping unknown packet, id %d", id);
                }
            } else {
                av_packet_unref(pkt);

                pthread_mutex_lock(&ffmpeg->packet_mutex); // put packet back
                ffmpeg->free_packets[ffmpeg->packet_pos++] = pkt;
                pthread_mutex_unlock(&ffmpeg->packet_mutex);

                //LOGI("No packet read");
            }
        } else { // buffers full so yield
            //LOGI("Packet reader on hold: Audio-%d, Video-%d, Subtitle-%d",
            //  Audio.packet_pos, Video.packet_pos, Subtitle.packet_pos);
            usleep(1000);
            //sched_yield();
        }
    }
    return 0;
}

The audio, video, and subtitle decoders each have their own thread, which receives packets from the thread above via ring buffers. I had to separate the decoders into their own threads because CPU usage increased when I started using the deinterlace filter.

My video decoder reads packets from the buffers and, when it has finished with a packet, sends it back to be unref'd so it can be used again. Balancing the packet buffers doesn't take much time once everything is running.

Here's the snipped from my video decoder:

void *VideoManager::decoder(void *arg) {
    LOGI("Video decoder started");
    VideoManager *mgr = (VideoManager *)arg;
    while (!ffmpeg.thread_quit) {
        pthread_mutex_lock(&mgr->packet_mutex);
        if (mgr->packet_pos != 0) {
            // fetch first packet to decode
            AVPacket *pkt = mgr->packets[0];

            // shift list down one
            for (int c = 1; c < mgr->packet_pos; c++) {
                mgr->packets[c-1] = mgr->packets[c];
            }
            mgr->packet_pos--;
            pthread_mutex_unlock(&mgr->packet_mutex); // finished with packets array

            int got;
            AVFrame *frame = ffmpeg.vid_stream.frame;
            avcodec_decode_video2(ffmpeg.vid_stream.context, frame, &got, pkt);
            ffmpeg.finished_with_packet(pkt);
            if (got) {
#ifdef INTERLACE_ALL
                if (!frame->interlaced_frame) mgr->add_av_frame(frame, 0);
                else {
                    if (!mgr->filter_initialised) mgr->init_filter_graph(frame);
                    av_buffersrc_add_frame_flags(mgr->filter_src_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
                    int c = 0;
                    while (true) {
                        AVFrame *filter_frame = ffmpeg.vid_stream.filter_frame;
                        int result = av_buffersink_get_frame(mgr->filter_sink_ctx, filter_frame);
                        if (result == AVERROR(EAGAIN) ||
                                result == AVERROR_EOF || // AVERROR_EOF is already an error code
                                result < 0)
                            break;
                        mgr->add_av_frame(filter_frame, c++);
                        av_frame_unref(filter_frame);
                    }
                    //LOGI("Interlaced %d frames, decode %d, playback %d", c, mgr->decode_pos, mgr->playback_pos);
                }
#elif defined(INTERLACE_HALF)
                if (!frame->interlaced_frame) mgr->add_av_frame(frame, 0);
                else {
                    if (!mgr->filter_initialised) mgr->init_filter_graph(frame);
                    av_buffersrc_add_frame_flags(mgr->filter_src_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
                    int c = 0;
                    while (true) {
                        AVFrame *filter_frame = ffmpeg.vid_stream.filter_frame;
                        int result = av_buffersink_get_frame(mgr->filter_sink_ctx, filter_frame);
                        if (result == AVERROR(EAGAIN) ||
                                result == AVERROR_EOF || // AVERROR_EOF is already an error code
                                result < 0)
                            break;
                        mgr->add_av_frame(filter_frame, c++);
                        av_frame_unref(filter_frame);
                    }
                    //LOGI("Interlaced %d frames, decode %d, playback %d", c, mgr->decode_pos, mgr->playback_pos);
                }
#else
                mgr->add_av_frame(frame, 0);
#endif
            }
            //LOGI("decoded video packet");
        } else {
            pthread_mutex_unlock(&mgr->packet_mutex);
        }
    }
    LOGI("Video decoder ended");
}
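
init_filter_graph() isn't shown above; roughly, it builds a buffer -> yadif -> buffersink graph to match the buffersrc/buffersink calls in the loop. A sketch only, with error checks omitted and the yadif options being my assumption:

#include "libavfilter/avfilter.h"
#include "libavfilter/buffersrc.h"
#include "libavfilter/buffersink.h"

// Sketch: buffer (src) -> yadif -> buffersink, matching the
// filter_src_ctx / filter_sink_ctx used in the decoder loop.
// Assumes avfilter_register_all() was called at startup.
void VideoManager::init_filter_graph(AVFrame *frame) {
    AVFilterGraph *graph = avfilter_graph_alloc();

    char args[256];
    AVCodecContext *ctx = ffmpeg.vid_stream.context;
    snprintf(args, sizeof(args),
             "video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=%d/%d",
             frame->width, frame->height, frame->format,
             ctx->time_base.num, ctx->time_base.den,
             ctx->sample_aspect_ratio.num, ctx->sample_aspect_ratio.den);

    avfilter_graph_create_filter(&filter_src_ctx, avfilter_get_by_name("buffer"),
                                 "in", args, NULL, graph);
    avfilter_graph_create_filter(&filter_sink_ctx, avfilter_get_by_name("buffersink"),
                                 "out", NULL, NULL, graph);

    // mode=1 emits one frame per field (INTERLACE_ALL);
    // mode=0 would emit one frame per frame (INTERLACE_HALF)
    AVFilterContext *yadif_ctx = NULL;
    avfilter_graph_create_filter(&yadif_ctx, avfilter_get_by_name("yadif"),
                                 "deint", "mode=1", NULL, graph);

    avfilter_link(filter_src_ctx, 0, yadif_ctx, 0);
    avfilter_link(yadif_ctx, 0, filter_sink_ctx, 0);
    avfilter_graph_config(graph, NULL);

    filter_initialised = true;
}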

As you can see from the decoder loop, I'm using a mutex when passing packets back and forth.
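
The helpers that move packets between threads aren't shown; based on the free-list handling in the packet thread, they look roughly like this (a sketch of the pattern, not the exact code):

// Sketch of the handoff helpers referenced above. MAX_PACKETS bounds
// each ring buffer.
bool VideoManager::okay_add_packet() {
    return packet_pos < MAX_PACKETS;   // unlocked peek, as in the reader loop
}

void VideoManager::add_packet(AVPacket *pkt) {
    pthread_mutex_lock(&packet_mutex);
    packets[packet_pos++] = pkt;       // decoder consumes from index 0
    pthread_mutex_unlock(&packet_mutex);
}

void FFMPEG::finished_with_packet(AVPacket *pkt) {
    av_packet_unref(pkt);              // drop the payload reference
    pthread_mutex_lock(&packet_mutex);
    free_packets[packet_pos++] = pkt;  // back to the shared free pool
    pthread_mutex_unlock(&packet_mutex);
}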

Once a frame has been decoded, I just copy the YUV buffers from the frame into another buffer list for later use. I don't convert the YUV on the CPU; I use a shader which converts YUV to RGB on the GPU.
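
The shader is along these lines (a generic BT.601 limited-range YUV420P sketch rather than my exact code; it assumes the three planes were uploaded as single-channel textures):

// Generic GLES2 fragment shader: YUV420P -> RGB (BT.601, limited range).
static const char *kYuvToRgbFrag = R"(
    precision mediump float;
    varying vec2 vTexCoord;
    uniform sampler2D yTex;  // full-res luma plane
    uniform sampler2D uTex;  // half-res chroma planes
    uniform sampler2D vTex;
    void main() {
        float y = 1.1643 * (texture2D(yTex, vTexCoord).r - 0.0625);
        float u = texture2D(uTex, vTexCoord).r - 0.5;
        float v = texture2D(vTex, vTexCoord).r - 0.5;
        gl_FragColor = vec4(y + 1.5958 * v,
                            y - 0.3917 * u - 0.8129 * v,
                            y + 2.0172 * u,
                            1.0);
    }
)";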

The next snippet adds my decoded frame to my buffer list. This may help you understand how to deal with the data.

void VideoManager::add_av_frame(AVFrame *frame, int field_num) {
    int y_linesize = frame->linesize[0];
    int u_linesize = frame->linesize[1];

    int hgt = frame->height;

    int y_buffsize = y_linesize * hgt;
    int u_buffsize = u_linesize * hgt / 2;

    int buffsize = y_buffsize + u_buffsize + u_buffsize;

    VideoBuffer *buffer = &buffers[decode_pos];

    if (ffmpeg.is_network && playback_pos == decode_pos) { // patched 25/10/16 wlgfx
        buffer->used = false;
        if (!buffer->data) buffer->data = (char*)mem.alloc(buffsize);
        if (!buffer->data) {
            LOGI("Dropped frame, allocation error");
            return;
        }
    } else if (playback_pos == decode_pos) {
        LOGI("Dropped frame, ran out of decoder frame buffers");
        return;
    } else if (!buffer->data) {
        buffer->data = (char*)mem.alloc(buffsize);
        if (!buffer->data) {
            LOGI("Dropped frame, allocation error.");
            return;
        }
    }

    buffer->y_frame = buffer->data;
    buffer->u_frame = buffer->y_frame + y_buffsize;
    buffer->v_frame = buffer->y_frame + y_buffsize + u_buffsize;

    buffer->wid = frame->width;
    buffer->hgt = hgt;

    buffer->y_linesize = y_linesize;
    buffer->u_linesize = u_linesize;

    int64_t pts = av_frame_get_best_effort_timestamp(frame);
    buffer->pts = pts;
    buffer->buffer_size = buffsize;

    double field_add = av_q2d(ffmpeg.vid_stream.context->time_base) * field_num;
    buffer->frame_time = av_q2d(ts_stream) * pts + field_add;

    memcpy(buffer->y_frame, frame->data[0], (size_t) (buffer->y_linesize * buffer->hgt));
    memcpy(buffer->u_frame, frame->data[1], (size_t) (buffer->u_linesize * buffer->hgt / 2));
    memcpy(buffer->v_frame, frame->data[2], (size_t) (buffer->u_linesize * buffer->hgt / 2));

    buffer->used = true;
    decode_pos = (decode_pos + 1) % MAX_VID_BUFFERS; // avoid unsequenced ++ and assignment

    //if (field_num == 0) LOGI("Video %.2f, %d - %d",
    //        buffer->frame_time - Audio.pts_start_time, decode_pos, playback_pos);
}

If there's anything else that I may be able to help with just give me a shout. :-)

EDIT:

Here's the snippet showing how I open my video stream context; it automatically determines the codec, whether it is H.264, MPEG-2, or another:

void FFMPEG::open_video_stream() {
    vid_stream.stream_id = av_find_best_stream(fContext, AVMEDIA_TYPE_VIDEO,
                                                -1, -1, &vid_stream.codec, 0);
    if (vid_stream.stream_id < 0) { // av_find_best_stream returns a negative AVERROR on failure
        vid_stream.stream_id = -1;
        return;
    }

    vid_stream.context = fContext->streams[vid_stream.stream_id]->codec;

    if (!vid_stream.codec || avcodec_open2(vid_stream.context,
            vid_stream.codec, NULL) < 0) {
        vid_stream.stream_id = -1;
        return;
    }

    vid_stream.frame = av_frame_alloc();
    vid_stream.filter_frame = av_frame_alloc();
}

EDIT2:

This is how I open the input stream, whether it is a file or a URL. The AVFormatContext is the main context for the stream.

bool FFMPEG::start_stream(char *url_, float xtrim, float ytrim, int gain) {
    aud_stream.stream_id = -1;
    vid_stream.stream_id = -1;
    sub_stream.stream_id = -1;

    this->url = url_;
    this->xtrim = xtrim;
    this->ytrim = ytrim;
    Audio.volume = gain;

    Audio.init();
    Video.init();

    fContext = avformat_alloc_context();

    if ((avformat_open_input(&fContext, url_, NULL, NULL)) != 0) {
        stop_stream();
        return false;
    }

    if ((avformat_find_stream_info(fContext, NULL)) < 0) {
        stop_stream();
        return false;
    }

    // network stream will overwrite packets if buffer is full

    is_network =  url.substr(0, 4) == "udp:" ||
                  url.substr(0, 4) == "rtp:" ||
                  url.substr(0, 5) == "rtsp:" ||
                  url.substr(0, 5) == "http:";  // added for wifi broadcasting ability

    // determine if stream is audio only

    is_mp3 = url.substr(url.size() - 4) == ".mp3";

    LOGI("Stream: %s", url_);

    if (!open_audio_stream()) {
        stop_stream();
        return false;
    }

    if (is_mp3) {
        vid_stream.stream_id = -1;
        sub_stream.stream_id = -1;
    } else {
        open_video_stream();
        open_subtitle_stream();

        if (vid_stream.stream_id == -1) { // switch to audio only
            close_subtitle_stream();
            is_mp3 = true;
        }
    }

    LOGI("Audio: %d, Video: %d, Subtitle: %d",
            aud_stream.stream_id,
            vid_stream.stream_id,
            sub_stream.stream_id);

    if (aud_stream.stream_id != -1) {
        LOGD("Audio stream time_base {%d, %d}",
            aud_stream.context->time_base.num,
            aud_stream.context->time_base.den);
    }

    if (vid_stream.stream_id != -1) {
        LOGD("Video stream time_base {%d, %d}",
            vid_stream.context->time_base.num,
            vid_stream.context->time_base.den);
    }

    LOGI("Starting packet and decode threads");

    thread_quit = false;

    pthread_create(&thread_packet, NULL, &FFMPEG::thread_packet_function, this);

    Display.set_overlay_timout(3.0);

    return true;
}

EDIT: (constructing an AVPacket)

Construct an AVPacket to send to the decoder...

AVPacket packet;
av_init_packet(&packet);
packet.data = myTSpacketdata; // pointer to the TS packet
packet.size = 188;

You should be able to reuse the packet, and it may need unref'ing between uses.
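
For example, a feed loop along these lines (readNextTsPacket() and onFrame() are stand-ins for your own queue and frame consumer):

// Sketch: reuse one stack AVPacket for every TS packet fed to the decoder.
AVPacket packet;
av_init_packet(&packet);

uint8_t tsBuffer[188];
while (readNextTsPacket(tsBuffer)) {     // stand-in for your TS source
    packet.data = tsBuffer;
    packet.size = 188;

    int got = 0;
    if (avcodec_decode_video2(decoderContext, frame, &got, &packet) < 0)
        break;                           // decode error
    if (got)
        onFrame(frame);                  // stand-in for your consumer

    av_packet_unref(&packet);            // resets the fields for reuse; nothing
                                         // to free since lavf didn't allocate data
}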

0 votes

You must first use the libavformat library to demux the compressed frames out of the file. Then you can decode them using avcodec_decode_video2. Look at this tutorial: http://dranger.com/ffmpeg/
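
A minimal sketch of that flow, using the same API vintage as the question (error handling omitted; "in.ts" and the variable names are placeholders):

av_register_all();

AVFormatContext *fmt = NULL;
avformat_open_input(&fmt, "in.ts", NULL, NULL);        // demux: libavformat
avformat_find_stream_info(fmt, NULL);

int vid = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
AVCodecContext *ctx = fmt->streams[vid]->codec;
avcodec_open2(ctx, avcodec_find_decoder(ctx->codec_id), NULL);

AVFrame *frame = av_frame_alloc();
AVPacket pkt;
av_init_packet(&pkt);

while (av_read_frame(fmt, &pkt) >= 0) {                // one whole compressed frame per packet
    if (pkt.stream_index == vid) {
        int got = 0;
        avcodec_decode_video2(ctx, frame, &got, &pkt); // decode: libavcodec
        if (got) {
            // frame->data[] / frame->linesize[] now hold the picture
        }
    }
    av_packet_unref(&pkt);
}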