15
votes

I have a raw H.264 Stream from an IP Camera packed in RTP frames. I want to get raw H.264 data into a file so I can convert it with ffmpeg.

From what I found out, the raw H.264 file I write has to look like this:

00 00 01 [SPS] 
00 00 01 [PPS]
00 00 01 [NALByte]
[PAYLOAD RTP Frame 1]     // Payload always without the first 2 Bytes -> NAL
[PAYLOAD RTP Frame 2]
[... until PAYLOAD Frame with Mark Bit received]  // From here its a new Video Frame
00 00 01 [NAL BYTE]
[PAYLOAD RTP Frame 1]
....

I get the SPS and the PPS from the Session Description Protocol (SDP) in my preceding RTSP communication. Additionally, the camera sends the SPS and the PPS in two separate messages before starting the video stream itself.

So I capture the messages in this order:

1. Preceding RTSP Communication here ( including SDP with SPS and PPS )
2. RTP Frame with Payload: 67 42 80 28 DA 01 40 16 C4    // This is the SPS 
3. RTP Frame with Payload: 68 CE 3C 80                   // This is the PPS
4. RTP Frame with Payload: ...  // Video Data

Then come some frames with payload, and at some point an RTP frame with the marker bit = 1. This means (if I got it right) that I have a complete video frame. After this I write the prefix sequence (00 00 01) and the NAL byte from the payload again, and continue with the same procedure.

After every 8 complete video frames, my camera sends the SPS and the PPS again (again in two RTP frames, as in the example above). I know that the PPS in particular can change during streaming, but that is not the problem.

My questions are now:

1. Do I need to write the SPS/PPS every 8th Video Frame?

If my SPS and my PPS don't change it should be enough to have them written at the very beginning of my file and nothing more?

2. How to distinguish between SPS/PPS and normal RTP Frames?

In my C++ code, which parses the transmitted data, I need to differentiate between the RTP frames with normal payload and the ones carrying the SPS/PPS. How can I distinguish them? The SPS/PPS frames are usually much smaller, but that is not safe to rely on. If I ignore them, I need to know which data I can throw away; if I need to write them, I need to put the 00 00 01 prefix in front of them. Or is it a fixed rule that they occur every 8th video frame?

2
Thanks for this question; I had the same one. I read through the live555 source code and did not understand why it saves each packet/frame that way. After reading this thread, things became clear to me. As a suggestion based on the live555 implementation: the marker bit is only used for other codecs. H.264 has its own start_bit and end_bit to mark the start and end of a frame, so the marker bit is not used for H.264. - user534498

2 Answers

14
votes
  1. If the SPS and PPS do not change, you can omit all of them except the first ones.
  2. You need to parse the nal_unit_type field of each NAL unit: for an SPS, nal_unit_type == 7; for a PPS, nal_unit_type == 8.

As I remember, nal_unit_type is the lower 5 bits of the first byte of a NAL unit.

nal_unit_type = frame[0] & 0x1f;
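
A minimal C++ sketch of this check, assuming each RTP payload passed in holds exactly one complete NAL unit (as in the question's SPS and PPS packets). The names `nalUnitType`, `isParameterSet` and `appendAnnexB` are made up for illustration and do not come from any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// nal_unit_type values for parameter sets, as given in the answer above.
constexpr std::uint8_t kNalSps = 7;
constexpr std::uint8_t kNalPps = 8;

// Returns the lower 5 bits of the first byte, or 0 for an empty buffer.
inline std::uint8_t nalUnitType(const std::uint8_t* nal, std::size_t size) {
    return size > 0 ? static_cast<std::uint8_t>(nal[0] & 0x1F) : 0;
}

// True if the NAL unit is an SPS or a PPS.
inline bool isParameterSet(const std::uint8_t* nal, std::size_t size) {
    const std::uint8_t type = nalUnitType(nal, size);
    return type == kNalSps || type == kNalPps;
}

// Appends the 00 00 01 prefix followed by the NAL unit bytes to `out`.
inline void appendAnnexB(std::vector<std::uint8_t>& out,
                         const std::uint8_t* nal, std::size_t size) {
    static const std::uint8_t startCode[] = {0x00, 0x00, 0x01};
    out.insert(out.end(), startCode, startCode + 3);
    out.insert(out.end(), nal, nal + size);
}
```

With the payloads from the question, `67 42 80 ...` yields type 7 (SPS) and `68 CE 3C 80` yields type 8 (PPS), so both can be recognised without looking at the packet size.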
12
votes
  1. You should write the SPS and PPS at the start of the stream, and again only when they change in the middle of the stream.

  2. SPS and PPS frames can be packed in a STAP NAL unit (generally STAP-A), with NAL type 24 (STAP-A) or 25 (STAP-B). The STAP format is described in RFC 3984, section 5.7.1.

  3. Don't rely on the marker bit; use the start bit and end bit in the FU header.

  4. For fragmented video frames, you should regenerate the NAL unit header by combining the upper 3 bits of the first fragment's FU indicator (F and NRI) with the 5 NAL type bits of its FU header (only for packets with the start bit set to 1). See RFC 3984, section 5.8:

    The NAL unit type octet of the fragmented NAL unit is not included as such in the fragmentation unit payload, but rather the information of the NAL unit type octet of the fragmented NAL unit is conveyed in F and NRI fields of the FU indicator octet of the fragmentation unit and in the type field of the FU header.
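
To illustrate point 2, here is a rough C++ sketch of splitting a STAP-A payload into its NAL units. It assumes the layout from RFC 3984 section 5.7.1: one STAP-A header byte (type 24), followed by repeated pairs of a 16-bit big-endian size and a NAL unit of that size. `Bytes` and `splitStapA` are hypothetical names, and STAP-B (type 25) is not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Splits a STAP-A RTP payload into the NAL units it aggregates.
// Returns an empty list if the payload is not a STAP-A packet, and stops
// at the first size field that is zero or runs past the end of the payload.
inline std::vector<Bytes> splitStapA(const Bytes& payload) {
    std::vector<Bytes> nals;
    if (payload.empty() || (payload[0] & 0x1F) != 24) {
        return nals;
    }
    std::size_t pos = 1;  // skip the STAP-A header byte
    while (pos + 2 <= payload.size()) {
        const std::size_t len =
            (static_cast<std::size_t>(payload[pos]) << 8) | payload[pos + 1];
        pos += 2;
        if (len == 0 || pos + len > payload.size()) {
            break;  // malformed size field
        }
        nals.emplace_back(payload.data() + pos, payload.data() + pos + len);
        pos += len;
    }
    return nals;
}
```

Each returned NAL unit starts with its own header byte, so it can be written to the file with a 00 00 01 prefix and classified by its lower 5 bits like any single NAL unit packet.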

EDIT: More explanation about NAL unit construction for fragmentation units.

These are the first two bytes of an FU-A payload (right after the RTP header):

|  FU indicator |   FU header   |
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |S|E|R|  Type   |
+---------------+---------------+

To construct the NAL unit header, take "Type" from the FU header and "F" and "NRI" from the FU indicator.
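
As a hedged illustration of this reconstruction (a sketch, not the answerer's code), the following C++ reassembles FU-A fragments into one NAL unit. It assumes packets arrive in order without loss and that the FU indicator carries type 28 (FU-A, per RFC 3984). `Bytes` and `FuAReassembler` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Collects FU-A fragments until the fragment with the end bit arrives.
struct FuAReassembler {
    Bytes nal;                // reconstructed NAL unit (header + payload)
    bool inProgress = false;  // true between the start and end fragments

    // Feeds one RTP payload. Returns true when `nal` holds a complete unit.
    bool push(const Bytes& payload) {
        if (payload.size() < 2 || (payload[0] & 0x1F) != 28) {
            return false;  // not an FU-A packet
        }
        const std::uint8_t indicator = payload[0];
        const std::uint8_t header = payload[1];
        const bool start = (header & 0x80) != 0;  // S bit
        const bool end = (header & 0x40) != 0;    // E bit

        if (start) {
            nal.clear();
            // F and NRI from the FU indicator, Type from the FU header.
            nal.push_back(
                static_cast<std::uint8_t>((indicator & 0xE0) | (header & 0x1F)));
            inProgress = true;
        } else if (!inProgress) {
            return false;  // fragment without a preceding start fragment
        }
        // Append the fragment data that follows the two FU bytes.
        nal.insert(nal.end(), payload.begin() + 2, payload.end());
        if (end) {
            inProgress = false;
            return true;
        }
        return false;
    }
};
```

For example, an FU indicator of 0x7C with an FU header of 0x85 gives the NAL header (0x7C & 0xE0) | (0x85 & 0x1F) = 0x65. Once `push` returns true, `nal` can be written to the file after a single 00 00 01 prefix.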

Here is a simple implementation: