H.264 over RTP - Identify SPS and PPS Frames

Question

I have a raw H.264 stream from an IP camera, packed in RTP packets. I want to get the raw H.264 data into a file so that I can convert it with ffmpeg.

So when writing the data into my raw H.264 file, I found out that it has to look like this:

00 00 01 [SPS] 
00 00 01 [PPS]
00 00 01 [NALByte]
[PAYLOAD RTP Frame 1]     // Payload always without the first 2 Bytes -> NAL
[PAYLOAD RTP Frame 2]
[... until PAYLOAD Frame with Marker Bit received]  // From here it's a new Video Frame
00 00 01 [NAL BYTE]
[PAYLOAD RTP Frame 1]
....
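The "payload" and the "marker bit" in the layout above both come from the RTP header, which has to be stripped from every packet first. A minimal sketch, assuming each buffer holds exactly one RTP packet as received from the UDP socket and following the fixed header layout of RFC 3550; `RtpPacketView` and `parseRtp` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical view into one received RTP packet (fixed header layout from RFC 3550).
struct RtpPacketView {
    bool marker;             // M bit; for H.264 it is set on the last packet of an access unit
    uint8_t payloadType;     // dynamic payload type announced in the SDP
    uint16_t sequence;       // increments by one per packet; gaps mean loss
    uint32_t timestamp;      // same value for all packets of one video frame
    const uint8_t* payload;  // points into the original buffer
    size_t payloadSize;
};

std::optional<RtpPacketView> parseRtp(const uint8_t* buf, size_t len) {
    if (len < 12 || (buf[0] >> 6) != 2) return std::nullopt;  // RTP version must be 2
    const bool padding = buf[0] & 0x20;
    const bool extension = buf[0] & 0x10;
    const size_t csrcCount = buf[0] & 0x0F;

    size_t offset = 12 + 4 * csrcCount;  // fixed header plus CSRC list
    if (extension) {
        if (len < offset + 4) return std::nullopt;
        // Extension length counts 32-bit words after the 4-byte extension header.
        const size_t extWords = (size_t(buf[offset + 2]) << 8) | buf[offset + 3];
        offset += 4 + 4 * extWords;
    }
    if (offset > len) return std::nullopt;

    size_t end = len;
    if (padding) {
        const size_t padLen = buf[len - 1];  // last byte holds the padding length
        if (padLen == 0 || padLen > len - offset) return std::nullopt;
        end -= padLen;
    }
    if (offset >= end) return std::nullopt;  // no payload left

    RtpPacketView v{};
    v.marker = (buf[1] & 0x80) != 0;
    v.payloadType = buf[1] & 0x7F;
    v.sequence = static_cast<uint16_t>((buf[2] << 8) | buf[3]);
    v.timestamp = (uint32_t(buf[4]) << 24) | (uint32_t(buf[5]) << 16) |
                  (uint32_t(buf[6]) << 8) | buf[7];
    v.payload = buf + offset;
    v.payloadSize = end - offset;
    return v;
}
```

Everything from `payload` onwards is what RFC 6184 defines for H.264; the sequence number is worth keeping because a lost packet in the middle of a fragmented NAL unit corrupts that NAL unit.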

So I get the SPS and the PPS from the Session Description Protocol (SDP) data of my preceding RTSP communication. Additionally, the camera sends the SPS and the PPS in two separate messages before it starts the video stream itself.
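In the SDP, the parameter sets are carried base64-encoded and comma-separated in the `sprop-parameter-sets` parameter of the `a=fmtp` line (RFC 6184). A minimal sketch of turning that value into the start of the raw file, assuming you have already cut out the text after `sprop-parameter-sets=`; `base64Decode` and `spropToAnnexB` are hypothetical helpers, and the example value is simply the SPS and PPS bytes from the capture below, base64-encoded:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Decodes standard base64 (RFC 4648 alphabet, '=' padding). Returns an empty vector on bad input.
std::vector<uint8_t> base64Decode(const std::string& in) {
    auto value = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };
    std::vector<uint8_t> out;
    uint32_t bits = 0;  // only the low bitCount bits are meaningful
    int bitCount = 0;
    for (char c : in) {
        if (c == '=') break;  // padding ends the data
        const int v = value(c);
        if (v < 0) return {};
        bits = (bits << 6) | uint32_t(v);
        bitCount += 6;
        if (bitCount >= 8) {  // a full byte is available
            bitCount -= 8;
            out.push_back(uint8_t((bits >> bitCount) & 0xFF));
        }
    }
    return out;
}

// Turns "SPS,PPS" (base64 NAL units) into Annex B: a start code before each NAL unit.
std::vector<uint8_t> spropToAnnexB(const std::string& sprop) {
    static const uint8_t kStartCode[] = {0x00, 0x00, 0x00, 0x01};
    std::vector<uint8_t> out;
    std::stringstream ss(sprop);
    std::string item;
    while (std::getline(ss, item, ',')) {
        const std::vector<uint8_t> nal = base64Decode(item);
        if (nal.empty()) continue;
        out.insert(out.end(), kStartCode, kStartCode + 4);
        out.insert(out.end(), nal.begin(), nal.end());
    }
    return out;
}
```

For example, `spropToAnnexB("Z0KAKNoBQBbE,aM48gA==")` yields `00 00 00 01 67 42 80 28 DA 01 40 16 C4 00 00 00 01 68 CE 3C 80`. The sketch writes the four-byte start code; the three-byte `00 00 01` from the question is also a valid start code, the extra zero byte is what Annex B expects in front of parameter sets.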

So I capture the messages in this order:

1. Preceding RTSP Communication here ( including SDP with SPS and PPS )
2. RTP Frame with Payload: 67 42 80 28 DA 01 40 16 C4    // This is the SPS 
3. RTP Frame with Payload: 68 CE 3C 80                   // This is the PPS
4. RTP Frame with Payload: ...  // Video Data

Then come some frames with payload, and at some point an RTP frame with the marker bit set to 1. This means (if I got it right) that I have a complete video frame. After this I write the prefix sequence (00 00 01) and the NAL byte from the payload again and go on with the same procedure.
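The "payload without the first 2 bytes" rule matches FU-A fragmentation from RFC 6184: those two bytes are the FU indicator and the FU header, and the original NAL header byte has to be rebuilt from them. Small NAL units such as the SPS and PPS instead travel as single NAL unit packets (or aggregated in STAP-A packets) and need no such treatment. Note that a start code belongs in front of every NAL unit, not once per video frame: the marker bit ends an access unit, which can contain several NAL units (for example several slices). A minimal sketch, assuming packetization-mode 0 or 1 and ignoring packet loss and reordering; `appendRtpPayload` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the NAL unit(s) carried in one RTP payload (RFC 6184) to an Annex B stream.
// Handles single NAL unit packets (types 1-23), STAP-A (24) and FU-A (28); other types are skipped.
void appendRtpPayload(const uint8_t* p, size_t n, std::vector<uint8_t>& out) {
    static const uint8_t kStart[] = {0x00, 0x00, 0x00, 0x01};
    auto startNal = [&out]() { out.insert(out.end(), kStart, kStart + 4); };
    if (n < 1) return;
    const uint8_t type = p[0] & 0x1F;

    if (type >= 1 && type <= 23) {         // single NAL unit: payload is the whole NAL unit
        startNal();
        out.insert(out.end(), p, p + n);
    } else if (type == 24) {               // STAP-A: [16-bit size][NAL unit] repeated
        size_t i = 1;
        while (i + 2 <= n) {
            const size_t size = (size_t(p[i]) << 8) | p[i + 1];
            i += 2;
            if (size == 0 || i + size > n) break;  // malformed aggregate
            startNal();
            out.insert(out.end(), p + i, p + i + size);
            i += size;
        }
    } else if (type == 28 && n >= 2) {     // FU-A: [FU indicator][FU header][fragment]
        const uint8_t fuHeader = p[1];
        const bool start = (fuHeader & 0x80) != 0;  // S bit: first fragment of the NAL unit
        if (start) {
            startNal();
            // Rebuild the NAL header: F and NRI from the indicator, type from the FU header.
            out.push_back(uint8_t((p[0] & 0xE0) | (fuHeader & 0x1F)));
        }
        out.insert(out.end(), p + 2, p + n);  // fragment data without the two FU bytes
    }
}
```

The E bit (`fuHeader & 0x40`) marks the last fragment of a NAL unit; a more robust reader would also track RTP sequence numbers and discard a fragmented NAL unit when one of its packets is missing.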

Now my camera sends me the SPS and the PPS again after every 8 complete video frames (again in two RTP frames, as in the example above). I know that the PPS in particular can change during streaming, but that's not the problem.

My questions are now:

1. Do I need to write the SPS/PPS every 8th Video Frame?

If my SPS and my PPS don't change, it should be enough to write them once at the very beginning of my file and nothing more?

2. How to distinguish between SPS/PPS and normal RTP Frames?

In my C++ code, which parses the transmitted data, I need to distinguish between the RTP frames with normal payload and the ones carrying the SPS/PPS. How can I tell them apart? Okay, the SPS/PPS frames are usually much smaller, but that's not a safe thing to rely on. If I ignore them, I need to know which data I can throw away; if I need to write them, I need to put the 00 00 01 prefix in front of them. Or is it a fixed rule that they occur every 8th video frame?

Thanks for this question, I had the same one. I read through the live555 source code and did not understand why they save each packet/frame like that; after reading this thread, things became clear to me. One observation based on the live555 implementation: the marker bit is only used for other codecs. H.264 has its own start bit and end bit (the S and E bits of the FU header) to mark the start/end of a frame, so the marker bit is not used for H.264. - user534498

ciphor · Accepted Answer · 2012-03-08T13:33:42

If the SPS and PPS do not change, you can omit all of them except the first ones.
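One way to act on this is to remember the last SPS and PPS written to the file and drop repeats that are byte-identical. A minimal sketch, assuming the camera uses a single SPS and a single PPS (one parameter set id each) and that each NAL unit has already been extracted without its start code; `ParameterSetFilter` is a hypothetical class:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical filter that decides whether a NAL unit should be written to the output file.
// Assumes one SPS and one PPS; copies identical to the last written one are dropped.
class ParameterSetFilter {
public:
    // nal holds one NAL unit starting with its header byte (no start code).
    bool shouldWrite(const std::vector<uint8_t>& nal) {
        if (nal.empty()) return false;
        const uint8_t type = nal[0] & 0x1F;
        if (type == 7) return updateIfChanged(lastSps_, nal);  // SPS
        if (type == 8) return updateIfChanged(lastPps_, nal);  // PPS
        return true;  // slices, SEI and everything else are always written
    }

private:
    static bool updateIfChanged(std::vector<uint8_t>& last, const std::vector<uint8_t>& nal) {
        if (nal == last) return false;  // already in the file, skip the repeat
        last = nal;
        return true;
    }
    std::vector<uint8_t> lastSps_;
    std::vector<uint8_t> lastPps_;
};
```

Keeping the repeats is also harmless: writing them only costs a few bytes per group of frames, and it lets a decoder start at a later IDR picture.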
You need to parse the nal_unit_type field of each NAL unit: for SPS, nal_unit_type == 7; for PPS, nal_unit_type == 8.

As I remember, nal_unit_type is the lower 5 bits of the first byte of a NAL unit (for a single NAL unit RTP packet, the first byte of the payload).

nal_unit_type = frame[0] & 0x1f;
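Applied to the captured payloads, `0x67 & 0x1F` is 7 (SPS) and `0x68 & 0x1F` is 8 (PPS). The one catch is fragmented NAL units: for an FU-A packet the first byte only says 28, and the real type sits in the low 5 bits of the second byte (the FU header). A minimal sketch, assuming the RTP header has already been stripped; `nalTypeOfPayload` and `isParameterSet` are hypothetical helpers:

```cpp
#include <cstddef>
#include <cstdint>

constexpr int kNalSps = 7;   // sequence parameter set (H.264 Table 7-1)
constexpr int kNalPps = 8;   // picture parameter set
constexpr int kStapA = 24;   // RTP aggregation packet (RFC 6184)
constexpr int kFuA = 28;     // RTP fragmentation unit (RFC 6184)

// Returns the NAL unit type carried in an RTP payload, or -1 if the payload is too short.
// For STAP-A it returns 24; the aggregated NAL units have to be inspected one by one.
int nalTypeOfPayload(const uint8_t* payload, size_t size) {
    if (size == 0) return -1;
    const int type = payload[0] & 0x1F;
    if (type == kFuA) return size >= 2 ? (payload[1] & 0x1F) : -1;  // type from FU header
    return type;
}

bool isParameterSet(int nalType) {
    return nalType == kNalSps || nalType == kNalPps;
}
```

This avoids relying on packet size: the type field says exactly what the packet carries, whether or not the camera keeps repeating the parameter sets every 8th frame.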
