12
votes

I am trying to decode an MPEG video file using LibAV. There are two terms that I am not able to grok properly: frames and packets.

As per my present understanding, Frames are uncompressed video frames and packets are the compressed frames.

Questions :

  • A packet can contain multiple frames, right?
  • Does a frame always belong to exactly one packet? I mean the case where half of the frame's data is in packet1 and the other half in packet2. Is that possible?
  • How do we know how many frames are in a packet in LibAV?
2
have you found answers? – 404pio

2 Answers

4
votes

To answer your first and third questions:

  • according to the documentation for the AVPacket struct: "For video, it should typically contain one compressed frame. For audio it may contain several compressed frames."
  • the decode video example gives this code, which reads every frame the decoder produces after a packet is sent; you can also use it to count the frames:
static void decode(AVCodecContext *dec_ctx, AVFrame *frame, AVPacket *pkt,
                   const char *filename)
{
    char buf[1024];
    int ret;
    ret = avcodec_send_packet(dec_ctx, pkt);
    if (ret < 0) {
        fprintf(stderr, "Error sending a packet for decoding\n");
        exit(1);
    }
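    /* One packet may yield zero, one, or several frames, so keep
       receiving until the decoder has nothing more to give. */
    while (ret >= 0) {
        ret = avcodec_receive_frame(dec_ctx, frame);
        /* EAGAIN: the decoder needs another packet; EOF: it is fully flushed. */
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return;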
        else if (ret < 0) {
            fprintf(stderr, "Error during decoding\n");
            exit(1);
        }
        printf("saving frame %3d\n", dec_ctx->frame_number);
        fflush(stdout);
        /* the picture is allocated by the decoder. no need to
           free it */
        snprintf(buf, sizeof(buf), filename, dec_ctx->frame_number);
        pgm_save(frame->data[0], frame->linesize[0],
                 frame->width, frame->height, buf);
    }
}
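To make the counting idea concrete, here is a minimal, self-contained sketch of the same send/receive loop. It deliberately does not link against libav: `toy_decoder`, `toy_packet`, `toy_send_packet`, `toy_receive_frame` and `TOY_EAGAIN` are hypothetical stand-ins that only mimic the shape of `avcodec_send_packet` / `avcodec_receive_frame`, so treat it as an illustration of the control flow rather than real decoding code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT libav: they only mimic the send/receive shape. */
#define TOY_EAGAIN (-11)

typedef struct {
    int frames_in_packet;   /* how many compressed frames this packet carries */
} toy_packet;

typedef struct {
    int pending;            /* frames decoded but not yet handed out */
} toy_decoder;

/* Plays the role of avcodec_send_packet: feed one packet to the decoder. */
static int toy_send_packet(toy_decoder *dec, const toy_packet *pkt)
{
    assert(dec != NULL && pkt != NULL);
    if (pkt->frames_in_packet < 0)
        return -1;          /* malformed packet */
    dec->pending += pkt->frames_in_packet;
    return 0;
}

/* Plays the role of avcodec_receive_frame: returns 0 when a frame is
   produced, TOY_EAGAIN when the decoder needs more input. */
static int toy_receive_frame(toy_decoder *dec)
{
    if (dec->pending == 0)
        return TOY_EAGAIN;
    dec->pending--;
    return 0;
}

/* Send one packet, then count the frames that come out before EAGAIN.
   Returns -1 if the packet is rejected. */
static int count_frames_in_packet(toy_decoder *dec, const toy_packet *pkt)
{
    int count = 0;

    if (toy_send_packet(dec, pkt) < 0)
        return -1;
    while (toy_receive_frame(dec) == 0)
        count++;
    return count;
}
```

With a real decoder the count can be misleading: decoders may buffer input (for example because of frame reordering or threading), so the frames received right after a given packet do not necessarily come from that packet.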
2
votes

Simply put, a packet is a block of data.

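In the networking sense, its size is generally determined by bandwidth. On a device with limited internet speed, or a phone with a choppy signal, packets tend to be smaller. On a desktop with a dedicated connection, they can be quite a bit larger. Note that this is a network packet; libav's AVPacket is a different thing, namely a chunk of compressed data read from the container.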

A frame can be thought of as one cel of animation. These days, because of compression, most frames are not complete images but only the changes relative to an earlier picture. The encoder sends a keyframe, an actual full image, once every few seconds or so, and every frame in between just carries data specifying which pixels have changed since the previous image: the delta.
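The keyframe-plus-delta idea can be sketched as follows. This is a simplified illustration with an assumed, hypothetical delta format (`pixel_change`, a list of changed pixel positions and new values) on a tiny 8-pixel picture; real codecs use motion-compensated prediction and transform-coded residuals rather than per-pixel change lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_PIXELS 8      /* tiny picture size, chosen for illustration */

/* Hypothetical delta entry: which pixel changed and its new value. */
typedef struct {
    size_t index;
    unsigned char value;
} pixel_change;

/* A keyframe is a complete image: copy it over the current picture. */
static void apply_keyframe(unsigned char *picture, const unsigned char *keyframe)
{
    memcpy(picture, keyframe, FRAME_PIXELS);
}

/* A delta frame only lists changed pixels: patch the previous picture
   in place and leave every other pixel untouched. */
static void apply_delta(unsigned char *picture,
                        const pixel_change *changes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(changes[i].index < FRAME_PIXELS);
        picture[changes[i].index] = changes[i].value;
    }
}
```

Because each delta is applied on top of the previous picture, a delta frame cannot be decoded on its own, which is why playback has to start from a keyframe.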

So yeah, if your packet size is, say, 1024 bytes, your resolution is limited to however many pixels' worth of changes that stream can carry. A sender might put one frame in each packet to keep things simple, but I don't think anything absolutely guarantees that: the data stream is reconstructed from those packets, which often arrive out of order, and the frame deltas are decoded only once the packets have been pieced together.
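As a rough sketch of that reassembly step, the code below sorts transport packets by sequence number and concatenates their payloads, so that a frame whose bytes were split across several packets becomes contiguous again. The `net_packet` layout and the `reassemble` helper are hypothetical, made up for this illustration; real transports such as RTP also have to deal with loss, duplicates, and sequence-number wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transport packet: a sequence number plus a slice of the
   compressed byte stream. */
typedef struct {
    unsigned seq;
    const unsigned char *payload;
    size_t size;
} net_packet;

/* qsort comparator: ascending sequence number. */
static int by_seq(const void *a, const void *b)
{
    const net_packet *pa = (const net_packet *)a;
    const net_packet *pb = (const net_packet *)b;
    return (pa->seq > pb->seq) - (pa->seq < pb->seq);
}

/* Put the packets back in order and concatenate their payloads into out.
   Returns the number of bytes written, or 0 if out (capacity cap) is too
   small to hold everything; out may be partially filled in that case. */
static size_t reassemble(net_packet *pkts, size_t n,
                         unsigned char *out, size_t cap)
{
    size_t used = 0;

    assert(pkts != NULL && out != NULL);
    qsort(pkts, n, sizeof *pkts, by_seq);
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].size > cap - used)
            return 0;
        memcpy(out + used, pkts[i].payload, pkts[i].size);
        used += pkts[i].size;
    }
    return used;
}
```

Only after this step does the byte stream make sense to a demuxer or decoder, which is why a frame boundary does not have to line up with a transport packet boundary.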

Audio takes up much less space than video, so they might only need to send one audio packet for every 50 video packets.

I know these guys did a few clips on video-streams being recombined from packets, on their channel -- https://www.youtube.com/watch?v=DkIhI59ysXI