There are two stream formats supported, or at least recommended, by the ITU-T H.264 documentation: RTP packets and Annex B (raw byte sequence).

Let's assume the encoder can send streaming data in both formats and can switch between them at any point while streaming (correct me if this is not the case). How and when does the H.264 decoder come to know whether it needs to parse the data as RTP or as Annex B, i.e. raw byte sequence data?

Is there any standard protocol or mechanism to do that?

What happens if there is packet loss and the encoder switches the way it streams data, either from RTP to Annex B or vice versa? The decoder would probably still assume the data is being streamed in the old format.

Kindly clarify the above.

Mixing data formats is asking for trouble. If there was a packet loss, the best error protection is to re-send the packet, or to ignore the corrupt data and jump to the next packet's data if it is usable (the viewer sees a momentary picture freeze or frame jump). - VC.One
Annex B is usually found in stored elementary streams (like raw .h264 files) and RTP is for real-time broadcasting. Just saying that it would be awkward to switch from one to the other mid-stream. The formats solve different problems. - VC.One
@VC.One Let's assume the scenario below; also, I am not sure why a live stream can't be sent in RBSP format. Scenario: the encoder is sending RTP packets, a loss occurs, and RTP streaming resumes. The decoder ignores the lost packets and continues with the subsequent ones (but doesn't the decoder need to sync up with the encoder on which type of stream is being sent, RTP or RBSP?). Also, I am not sure whether there is any constraint that stops the encoder from switching stream types, either on loss or on the fly. - Krishna Oza
You are over-complicating it. The decoder's functions just want H.264 data, and they give out the relevant pixels to display. Everything else is just for "containing" or "transporting" the data. There is no clever benefit in switching stream types (which means writing more code to handle the sudden difference in byte structure) before extracting the H.264 data to pass to the decoder. There is no need to sync the decoder with the encoder. If I encode a video today, how does it sync when you load (decode) the file one week later? - VC.One
@VC.One I agree that sync is not required when recorded streams are played back, but the sync problem still persists for live streaming. Also consider the case where the encoder is streaming Annex B, i.e. RBSP, the decoder is decoding it, and suddenly there is a packet loss and the decoder device shuts down and restarts. In the meantime the encoder is still streaming RBSP, so how will the decoder come to know the stream type (RBSP or RTP)? How does the decoder synchronize itself, and how does it know where a frame starts? - Krishna Oza

1 Answer

In most cases, H.264 encoders produce their output as NAL (Network Abstraction Layer) units. Each NALU (NAL unit) consists of a NAL header and an RBSP (Raw Byte Sequence Payload). Likewise, most decoders understand NALUs, not RTP. The NAL header is 1 byte in size.
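To make the 1-byte NAL header concrete, here is a minimal Python sketch that splits it into its three fields as laid out in the H.264 specification. The names `NalHeader` and `parse_nal_header` are made up for illustration and do not come from any library.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """Fields of the 1-byte H.264 NAL unit header (hypothetical helper type)."""
    forbidden_zero_bit: int  # must be 0 in a valid stream
    nal_ref_idc: int         # 0 means the NALU is not used for reference
    nal_unit_type: int       # e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into its three bit fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


header = parse_nal_header(0x67)  # 0x67 is a typical first byte of an SPS
print(header.nal_unit_type)
```

The same byte layout applies whether the NAL unit arrived after an Annex B start code or inside an RTP payload, which is why the decoder itself only needs to understand NALUs.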

RFC 6184 defines several RTP packetization modes for NAL units. Broadly, one kind allows a NALU to be fragmented across RTP packets and the other does not. In both, the RTP header is followed by NALU data. If the encoder and decoder are both implemented to understand the RTP header as well, the receiver should parse that header first: it starts with a fixed 12-byte part, optionally followed by CSRC entries and a header extension. Then it checks the RTP and/or NAL headers to decide how to parse the rest.
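As a rough illustration of "parse the RTP header first, then look at the NAL header", the Python sketch below reads the fixed 12-byte RTP header from RFC 3550, skips any CSRC entries and header extension, and then classifies the H.264 payload by its first byte using the type values from RFC 6184 (1 to 23 single NAL unit, 24 STAP-A, 28 FU-A). The function and type names are hypothetical, and padding and the less common payload types are not handled.

```python
import struct
from typing import NamedTuple

RTP_FIXED_HEADER_SIZE = 12  # bytes, per RFC 3550


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload_offset: int  # where the H.264 payload starts in the packet


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RTP header and locate the start of the payload."""
    if len(packet) < RTP_FIXED_HEADER_SIZE:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    version = b0 >> 6
    if version != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = RTP_FIXED_HEADER_SIZE + 4 * (b0 & 0x0F)  # skip CSRC entries
    if b0 & 0x10:  # extension bit set: skip the header extension
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        _profile, ext_words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * ext_words
    if offset > len(packet):
        raise ValueError("RTP header runs past the end of the packet")
    return RtpHeader(version, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc, offset)


def classify_h264_payload(payload: bytes) -> str:
    """Name the RFC 6184 payload structure from the first payload byte."""
    if not payload:
        return "empty"
    nal_type = payload[0] & 0x1F
    if 1 <= nal_type <= 23:
        return "single NAL unit"
    if nal_type == 24:
        return "STAP-A (aggregation)"
    if nal_type == 28:
        return "FU-A (fragment)"
    return "other"


packet = struct.pack("!BBHII", 0x80, 0x60, 1, 1000, 0x1234) + bytes([0x65, 0x88])
hdr = parse_rtp_header(packet)
print(hdr.payload_type, classify_h264_payload(packet[hdr.payload_offset:]))
```

Dynamic payload type 96 in the example is only a common convention; the actual value is whatever the session negotiated.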

For more details, see RFC 6184 - RTP Payload Format for H.264 Video

In summary, RTP and NAL are just headers, and the question is which header to parse before decoding the actual video data. It is better to signal to the decoder the mode (RTP or NAL) in which the data is streamed. With RTP this is normally done out of band, for example through the SDP parameters described in RFC 6184. That makes the decoder's life easier and avoids treating a packet wrongly.

Packet loss is handled by the decoder's resilience mechanisms. There is no standardized approach to packet (NALU) loss; some decoders provide error concealment for such scenarios.
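Since there is no standard recipe, the following Python sketch shows just one simple resilience strategy a receiver might use, not something mandated by H.264 or RFC 6184: treat a gap in the 16-bit RTP sequence number as a loss, then drop slices until the next IDR picture (NAL type 5) while still letting parameter sets (SPS 7, PPS 8) through. The class name `LossAwareFilter` is hypothetical, and the sketch assumes one NAL unit per packet and treats reordered packets as lost.

```python
IDR_SLICE = 5
SPS = 7
PPS = 8


class LossAwareFilter:
    """Decide which NAL units to forward to the decoder after a loss."""

    def __init__(self) -> None:
        self.expected_seq = None      # next RTP sequence number we expect
        self.waiting_for_idr = False  # True while resynchronizing

    def accept(self, seq: int, nal_unit_type: int) -> bool:
        # A mismatch with the expected sequence number means packets were lost
        # (or reordered, which this sketch does not distinguish).
        if self.expected_seq is not None and seq != self.expected_seq:
            self.waiting_for_idr = True
        self.expected_seq = (seq + 1) & 0xFFFF  # sequence numbers wrap at 16 bits

        if not self.waiting_for_idr:
            return True
        if nal_unit_type == IDR_SLICE:
            # An IDR picture does not depend on earlier frames, so decoding can resume.
            self.waiting_for_idr = False
            return True
        # Parameter sets are harmless to forward; everything else is dropped.
        return nal_unit_type in (SPS, PPS)


f = LossAwareFilter()
for seq, nal_type in [(10, 5), (11, 1), (13, 1), (14, 7), (15, 5), (16, 1)]:
    print(seq, nal_type, f.accept(seq, nal_type))
```

The same idea covers a restarted decoder: it starts in a "waiting" state and begins decoding at the next SPS/PPS and IDR picture it sees.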

More Details Added:

You need both header parsers (RTP and NAL) on the decoder side. As said above, it is better to have a signalling mechanism that indicates the mode in which each packet is sent to the decoder. Failing that, since the NAL header is present in both kinds of packet, search for the NAL start code first. Once the decoder finds the start code in a packet, check the number of bytes (x) consumed up to that point. If x is at least the RTP header size, start parsing in RTP mode from the beginning of the packet. If RTP parsing goes well (validated by checking some of the RTP fields against the data in hand), the decoder can conclude that packets are being received in RTP mode. This approach is valid for the non-fragmented RTP packetization method. Note that a standard RFC 6184 payload carries NAL units without Annex B start codes, so the check only works if the sender keeps the start codes inside the RTP payload.
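The steps above can be sketched in Python roughly as follows. This is only an illustration of the heuristic under the assumption that the sender keeps Annex B start codes inside the RTP payload (standard RFC 6184 payloads do not, in which case the function returns "unknown"). The names `find_start_code` and `detect_mode` and the optional `expected_payload_type` parameter are hypothetical.

```python
RTP_FIXED_HEADER_SIZE = 12  # bytes, per RFC 3550


def find_start_code(data: bytes) -> int:
    """Return the offset of the first Annex B start code, or -1 if none."""
    idx = data.find(b"\x00\x00\x01")
    # A 4-byte start code (00 00 00 01) contains the 3-byte one at offset +1.
    if idx > 0 and data[idx - 1] == 0:
        idx -= 1
    return idx


def looks_like_rtp(packet: bytes, header_room: int, expected_payload_type=None) -> bool:
    """Validate a few RTP fields against the data in hand."""
    if len(packet) < RTP_FIXED_HEADER_SIZE:
        return False
    if packet[0] >> 6 != 2:  # RTP version must be 2
        return False
    if expected_payload_type is not None and (packet[1] & 0x7F) != expected_payload_type:
        return False
    # The fixed header plus CSRC entries must fit before the start code.
    return RTP_FIXED_HEADER_SIZE + 4 * (packet[0] & 0x0F) <= header_room


def detect_mode(packet: bytes, expected_payload_type=None) -> str:
    """Guess whether a packet is 'annexb', 'rtp', or 'unknown'."""
    x = find_start_code(packet)  # bytes consumed before the start code
    if x < 0:
        return "unknown"
    if x == 0:
        return "annexb"
    if x >= RTP_FIXED_HEADER_SIZE and looks_like_rtp(packet, x, expected_payload_type):
        return "rtp"
    return "unknown"


annexb = b"\x00\x00\x00\x01\x67\x42"
rtp = bytes.fromhex("806012341122334455667788") + annexb
print(detect_mode(annexb), detect_mode(rtp, expected_payload_type=96))
```

Because header bytes such as a sequence number or timestamp can themselves contain `00 00 01`, a guess like this can misfire, which is one more reason explicit signalling is the safer choice.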