I have a requirement to receive an RTP stream (H264) and export it to an MP4 file. We are using Media Foundation to write the frames received in the RTP stream to an MP4 file: we extract the H264 frames from the RTP packets and pass them to the sink writer through the WriteSample API (with sample time, sample duration, etc. set). This works and I get a playable MP4. But when I don't set the sample duration (using SetSampleDuration), WriteSample fails with MF_E_NO_SAMPLE_DURATION. The error is not thrown for the first few frames, only after a certain time (the frame at 1.48 seconds).

Questions:

1. Why is SetSampleDuration needed? I assumed we don't need a sample duration if we provide a sample time for every sample; the sink writer could calculate the difference between the current frame and the last frame as the sample duration.

2. Why is the error not thrown for the first few frames by the WriteSample API? WriteSample fails only after a certain time (the frame at 1.48 seconds). Is it specific to certain frames?

3. How do we ideally set the sample duration when the duration between frames is not uniform? In my case the average fps is 15, but the time between two frames is not uniform (timestamps of the frames in ms: 0, 83, 133, 200, 283, 333, 400, ...).

3.1 To set the sample duration of a frame, wait for the next frame and subtract the current frame's timestamp from the next frame's timestamp. Should the application hold back until the next frame is available?

3.2 Is setting the sample duration based on the average fps fine (even though the time difference between frames is not uniform)?

(Note: I tried 3.2 and it works; I can't visually see any issue. This might be because the time difference between frames, while not uniform, does not vary much. But I am not sure if this is OK. Should I go for approach 3.1?)
3 Answers
You've got a number of different questions there.
For setting timestamps and durations I'd recommend checking the documentation; it explains how the timestamp relates to presentation time, etc.
For the duration you should use the sampling rate of the source stream. H264 over RTP normally uses a 90 kHz clock, so at 15 fps each sample lasts 1/15 s ≈ 66.7 ms. Media Foundation samples use timestamps and durations in 100-nanosecond units, so the IMFSample duration should be 10,000,000 / 15 = 666666. Your RTP header timestamps should likewise have a spacing of 6000 at 15 fps (6000 × 15 = 90000).
For the timestamp, because you're saving to a file, you should start from 0 and each time you get a sample increment by the sample duration of 666666.
Why is SetSampleDuration needed? I assumed we don't need a sample duration if we provide a sample time for every sample; the sink writer could calculate the difference between the current frame and the last frame as the sample duration.
Why is the error not thrown for the first few frames by the WriteSample API? WriteSample fails only after a certain time (the frame at 1.48 seconds). Is it specific to certain frames?
Samples can indeed have no duration attached. But the media sink you are using expects durations to be present, so you have to supply them; otherwise you cannot use this sink, because your input would be incompatible with it.
Your input is enqueued and processed asynchronously, which explains why the error does not surface immediately.
- ... But in the stream the RTP timestamps are 0, 7470, 11970, 18000, .... The spacing is not uniform, but overall we get 15 frames per second. How do we set the duration in this scenario: 1. set it to 666666 for all samples, or 2. for the first sample set the duration to (83 − 0) × 10000 and for the second sample (133 − 83) × 10000?
The question is easy and hard at the same time. The MP4 format has its own assumptions and expectations when the sink creates the output file and a track in it, with its own timescale values. Even though it could technically follow your effective timings and keep 90 kHz values there, it is built on the assumption that the video stream has a fixed frame rate. The sink takes your MF_MT_FRAME_RATE attribute value and derives timings from it. Also, the behavior may get stricter or more relaxed with Windows updates, because the specific behavior is not documented or promised.
You need some tradeoff to resolve this problem. One option is to realign your frame times to a uniform fixed-rate timestamp sequence. Another is to indicate a higher rate and expect your true frames to be mixed with "dropped" frames. You need to experiment and find what works for you. The MP4 file will eventually have a timescale value, and individual frame times will be expressed as increasing integer values against this frequency.
If an encoder, network stream, or capture gives you times and durations, use them and respect them. That is the best way to preserve the audio/video presentation.
I can't elaborate here, but you can't assume this:
Sink writer can calculate difference between the current frame and last frame
It is more complicated than this.