
I'm grabbing video frames from the camera via v4l, and I need to transcode them to MPEG-4 and then stream them via RTP.

Everything actually "works", but there's something I don't understand about the re-encoding: the input stream produces 15 fps while the output is at 25 fps, and every input frame is converted into one single video object sequence (I verified this with a simple check on the output bitstream). I guess the receiver is correctly parsing the MPEG-4 bitstream but the RTP packetization is somehow wrong. How am I supposed to split the encoded bitstream into one or more AVPackets? Maybe I'm missing the obvious and I just need to look for B/P-frame markers, but I think I'm not using the encode API correctly.

Here is an excerpt of my code, which is based on the available FFmpeg samples:

// input frame
AVFrame *picture;
// input frame color-space converted
AVFrame *planar;
// input format context, video4linux2
AVFormatContext *iFmtCtx;
// output codec context, mpeg4
AVCodecContext *oCtx;
// [ init everything ]
// ...
oCtx->time_base.num = 1;
oCtx->time_base.den = 25;
oCtx->gop_size = 10;
oCtx->max_b_frames = 1;
oCtx->bit_rate = 384000;
oCtx->pix_fmt = PIX_FMT_YUV420P;

for(;;)
{
  // read a raw frame from the camera
  rdRes = av_read_frame( iFmtCtx, &pkt );
  if ( rdRes >= 0 && pkt.size > 0 )
  {
    // decode it
    iCdcCtx->reordered_opaque = pkt.pts;
    int decodeRes = avcodec_decode_video2( iCdcCtx, picture, &gotPicture, &pkt );
    if ( decodeRes >= 0 && gotPicture )
    {
      // scale / convert color space
      avpicture_fill( (AVPicture *)planar, planarBuf.get(), oCtx->pix_fmt, oCtx->width, oCtx->height );
      sws_scale( sws, picture->data, picture->linesize, 0, iCdcCtx->height, planar->data, planar->linesize );
      // encode
      ByteArray encBuf( 65536 );
      int encSize = avcodec_encode_video( oCtx, encBuf.get(), encBuf.size(), planar );
      // the encoder returns 0 at every GOP end; drain the delayed frames
      while( encSize == 0 )
        encSize = avcodec_encode_video( oCtx, encBuf.get(), encBuf.size(), NULL );
      // send the transcoded bitstream with the resulting PTS
      if ( encSize > 0 )
        enqueueFrame( oCtx->coded_frame->pts, encBuf.get(), encSize );
    }
    // release the demuxed packet
    av_free_packet( &pkt );
  }
}
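
For comparison, here is a minimal sketch of the same encode step using avcodec_encode_video2() from newer libavcodec releases; oCtx, planar and enqueueFrame are the placeholders from above. Each successful call hands back exactly one complete AVPacket, so the bitstream never has to be split by hand:

AVPacket opkt;
av_init_packet( &opkt );
// let the encoder allocate the output buffer
opkt.data = NULL;
opkt.size = 0;
int gotPacket = 0;
if ( avcodec_encode_video2( oCtx, &opkt, planar, &gotPacket ) >= 0 && gotPacket )
{
  // one packet per encoded frame, with pts in oCtx->time_base units
  enqueueFrame( opkt.pts, opkt.data, opkt.size );
  av_free_packet( &opkt );
}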

1 Answer


The simplest solution would be to use two threads. The first thread would do all the things outlined in your question (decoding, scaling / color-space conversion, encoding). Partially transcoded frames would be written to an intermediate queue shared with the second thread. In this particular case (converting from a lower to a higher frame rate) the maximum length of this queue would be one frame. The second thread would read frames from the input queue in a loop, like this:

void FpsConverter::ThreadProc()
{
    // ask for 1 ms timer resolution so Sleep() is reasonably accurate
    timeBeginPeriod(1);
    DWORD start_time = timeGetTime();
    int frame_counter = 0;
    while (!shouldFinish()) {
        Frame *frame = NULL;
        ReadInputFrame(frame);     // blocks until the transcoding thread queues a frame
        WriteToOutputQueue(frame);
        DWORD time_end = timeGetTime();
        // frame_time is the output frame period in ms (40 ms for 25 fps)
        DWORD next_frame_time = start_time + ++frame_counter * frame_time;
        // use a signed difference: next_frame_time may already be in the past,
        // and an unsigned DWORD would wrap around to a huge sleep
        int time_to_sleep = (int)(next_frame_time - time_end);
        if (time_to_sleep > 0) {
            Sleep(time_to_sleep);
        }
    }
    timeEndPeriod(1);
}
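
The intermediate queue itself is not shown above; a minimal bounded blocking queue could look like the following sketch (assuming C++11 and the same placeholder Frame type; the capacity of one matches the constraint stated earlier):

#include <condition_variable>
#include <deque>
#include <mutex>

struct Frame; // same placeholder type as in ThreadProc()

// Bounded blocking queue with capacity 1, matching the
// "maximum length of this queue would be one frame" constraint.
class FrameQueue {
public:
    void Push(Frame *frame) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return queue_.size() < 1; });
        queue_.push_back(frame);
        notEmpty_.notify_one();
    }
    Frame *Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !queue_.empty(); });
        Frame *frame = queue_.front();
        queue_.pop_front();
        notFull_.notify_one();
        return frame;
    }
private:
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
    std::deque<Frame *> queue_;
};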

When CPU power is sufficient and higher fidelity and smoothness are required, you could compute each output frame not from a single input frame but from several, using some sort of interpolation (similar to the techniques used in MPEG codecs). The closer an input frame's timestamp is to the output frame's timestamp, the more weight you should assign to that input frame.
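
As an illustration, the simplest such interpolation is a weighted blend of the two input frames that bracket the output timestamp. This is only a sketch: blendFrames is a hypothetical helper, the frames are assumed to be same-sized YUV420P AVFrames, and t (between 0 and 1) is the normalized position of the output frame between them:

// Hypothetical helper: blends two same-sized YUV420P frames.
// t = 0 reproduces 'prev', t = 1 reproduces 'next'; values in
// between weight each input by its temporal proximity.
static void blendFrames(const AVFrame *prev, const AVFrame *next,
                        AVFrame *out, double t, int width, int height)
{
    for (int plane = 0; plane < 3; ++plane) {
        // chroma planes are subsampled by 2 in both directions
        int w = (plane == 0) ? width  : width  / 2;
        int h = (plane == 0) ? height : height / 2;
        for (int y = 0; y < h; ++y) {
            const uint8_t *p = prev->data[plane] + y * prev->linesize[plane];
            const uint8_t *n = next->data[plane] + y * next->linesize[plane];
            uint8_t *o = out->data[plane] + y * out->linesize[plane];
            for (int x = 0; x < w; ++x)
                o[x] = (uint8_t)((1.0 - t) * p[x] + t * n[x] + 0.5);
        }
    }
}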