I want to know, once and for all, how time base calculation and rescaling work in FFmpeg. Before asking this question I did some research and found many contradictory answers, which only made it more confusing. Based on the official FFmpeg examples, one has to
rescale output packet timestamp values from codec to stream timebase
with something like this:
pkt->pts = av_rescale_q_rnd(pkt->pts, *time_base, st->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
pkt->dts = av_rescale_q_rnd(pkt->dts, *time_base, st->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
pkt->duration = av_rescale_q(pkt->duration, *time_base, st->time_base);
But in this question someone was asking a similar question to mine and gave more examples, each of them doing it differently. And contrary to the answer, which says that all of those ways are fine, for me only the following approach works:
frame->pts += av_rescale_q(1, video_st->codec->time_base, video_st->time_base);
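As far as I understand it, av_rescale_q(a, bq, cq) simply converts a count of a ticks in time base bq into the equivalent number of ticks in time base cq (roughly a * bq / cq with rounding), and av_rescale_q_rnd does the same with an explicit rounding mode. A tiny standalone check of that understanding (the time bases here are picked just for illustration, they are not from my application):

extern "C" {
#include <libavutil/avutil.h>
#include <libavutil/mathematics.h>
}
#include <cstdio>

int main() {
    AVRational frame_tb = {1, 60};   // one tick = one frame at 60 fps
    AVRational ms_tb    = {1, 1000}; // one tick = one millisecond

    // 90 frames = 1.5 s, so this should print 1500
    printf("%lld\n", (long long)av_rescale_q(90, frame_tb, ms_tb));

    // Same conversion with the rounding flags from the official example;
    // AV_ROUND_PASS_MINMAX lets AV_NOPTS_VALUE (INT64_MIN) pass through unchanged.
    printf("%lld\n", (long long)av_rescale_q_rnd(
        AV_NOPTS_VALUE, frame_tb, ms_tb,
        static_cast<AVRounding>(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX)));
    return 0;
}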
In my application I am generating video packets (H.264) at 60 fps outside the FFmpeg API and then writing them into an MP4 container.
I explicitly set:
video_st->time_base = {1,60};
video_st->r_frame_rate = {60,1};
video_st->codec->time_base = {1,60};
The first weird thing I see happens right after I have written the header for the output format context:
AVDictionary *opts = nullptr;
int ret = avformat_write_header(mOutputFormatContext, &opts);
av_dict_free(&opts);
After that, video_st->time_base is populated with:
num = 1;
den = 15360
And I fail to understand why; I would really like someone to explain this to me.
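For what it's worth, if I apply that rescaling to these numbers, one frame in my 1/60 codec time base should correspond to 256 ticks of the 1/15360 stream time base (a quick standalone check using the values above):

extern "C" {
#include <libavutil/mathematics.h>
}
#include <cstdio>

int main() {
    AVRational codec_tb  = {1, 60};     // what I set before writing the header
    AVRational stream_tb = {1, 15360};  // what I find in video_st->time_base afterwards
    // 15360 / 60 = 256, so one frame should become 256 stream ticks
    printf("%lld\n", (long long)av_rescale_q(1, codec_tb, stream_tb));
    return 0;
}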
Next, before writing a frame I calculate the PTS for the packet. In my case PTS = DTS, as I don't use B-frames at all.
And I have to do this:
// One frame (one tick of the codec time base) expressed in stream time base ticks.
const int64_t duration = av_rescale_q(1, video_st->codec->time_base, video_st->time_base);
totalPTS += duration; // totalPTS is a global variable
packet->pts = totalPTS;
packet->dts = totalPTS; // PTS == DTS, no B-frames
av_write_frame(mOutputFormatContext, packet);
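As an aside, if I read the headers correctly, newer FFmpeg versions also provide av_packet_rescale_ts(), which rescales pts, dts and duration in a single call. A sketch of what I believe would be the equivalent of the code above (the write_packet helper and its parameters are my own illustration, not part of my actual code):

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

// Sketch: convert a packet's timestamps from the codec time base to whatever
// the muxer left in st->time_base, then write it. av_packet_rescale_ts()
// adjusts pts, dts and duration together and passes AV_NOPTS_VALUE through.
static int write_packet(AVFormatContext *oc, AVStream *st,
                        AVRational codec_tb, AVPacket *pkt)
{
    av_packet_rescale_ts(pkt, codec_tb, st->time_base);
    pkt->stream_index = st->index;
    return av_write_frame(oc, pkt);
}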
I don't get why the codec and the stream have different time_base values even though I explicitly set them to be the same. And since I see across all the examples that av_rescale_q is always used to calculate the duration, I really want someone to explain this point.
Additionally, as a comparison and for the sake of experiment, I decided to try writing the stream into a WebM container instead. In that case I don't use the libav output stream at all; I just grab the same packet I use for the MP4 encoding and write it manually into an EBML stream. There I calculate the duration like this:
// Multiply before dividing so the integer division doesn't truncate to zero.
const int64_t duration =
    (1000LL * video_st->codec->time_base.num) / video_st->codec->time_base.den;
The multiplication by 1000 is required for WebM, as timestamps in that container are expressed in milliseconds. And this works. So why, in the case of MP4 stream encoding, is there a difference in time_base that has to be rescaled?
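For completeness, here is the same conversion to milliseconds done with av_rescale_q against a {1, 1000} time base, which is how I would expect the generic rescaling to express it (a standalone sketch; the rounding differs slightly from my plain integer division):

extern "C" {
#include <libavutil/mathematics.h>
}
#include <cstdio>

int main() {
    AVRational codec_tb = {1, 60};   // my codec time base
    AVRational webm_tb  = {1, 1000}; // WebM/Matroska timestamps are in milliseconds

    // One frame at 60 fps: 1000/60 = 16.67 ms, rounded to 17 by av_rescale_q's
    // default nearest rounding.
    printf("one frame = %lld ms\n", (long long)av_rescale_q(1, codec_tb, webm_tb));

    // Timestamp of frame 600 (10 seconds in), computed from the frame index
    // rather than by accumulating a rounded per-frame duration.
    printf("frame 600 = %lld ms\n", (long long)av_rescale_q(600, codec_tb, webm_tb));
    return 0;
}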