Parsing MPEG4 frames from RTP packets

Question

I am trying to parse the individual MPEG4 frames from an RTP stream coming from an Axis camera and feed them to the ffmpeg library using the avcodec_decode_video function. Here are the steps I am following:

1. The RTSP stream is initialized.
2. The RTP stream starts flowing in.
3. The first packet I get starts with 000001b0..., followed by the configuration data, and after that a frame starts with 000001b6... The following RTP payloads are different, until I get an RTP packet where the marker bit is set. After that I again get a packet starting with 000001b6, and this goes on for around 5-10 RTP packets. This pattern repeats.

What I am doing is this: if I detect 000001b0/b6, I accumulate all the packets that come after it and feed the bigger buffer to the avcodec_decode_video function of libavcodec, after initializing the decoder context properly.

But I am getting a corrupted picture: the topmost portion, a horizontal bar, is crystal clear, and the rest is garbage. I am not sure why it behaves like this. Please help me.

The payload type of the RTP packets I am getting is dynamic-96.

Point to note: when I pass in the I-frames and P-frames that are wrapped in the proprietary protocol of another manufacturer, ffmpeg is able to parse them and gives very good pictures.

Any help is appreciated

Cipi Cipi · Accepted Answer · 2011-01-26T09:29:03

Try fiddling with the MPEG4 stream settings on your AXIS IP camera. Pay attention to the Video & Image/Advanced section, where you should set this:

Video Object Type: SIMPLE
[x] ISMA compliant
GOV Structure: IP

Also, try changing the "Priority" or "Optimize video stream for" setting (the options should be frame rate, image quality, bandwidth, none).

If none of this works, then read more...

I hope that you understand how an MPEG4 stream is transmitted over RTP. In short (if you are not sure how):

"Configuration frame" (Visual Object Sequence Start) starts with the start code 000001B0 (hex). It contains the data needed for the video to be decoded. You need to send it to the decoder only the first time you try to decode a stream, and it is used to decode all VOPs that come after it. Note that AXIS also sends this data in the SDP (the response to DESCRIBE in RTSP), for example: a=fmtp:96 profile-level-id=245; config=000001B0F5000001B5891300000100000001200086C40FA28A021E0A21. So if the stream never changes, and you are getting this in the SDP, you don't need to pass the VOS to the decoder... but if you do, there is no harm.
Video Object Plane (I-VOP, P-VOP, B-VOP) starts with the start code 000001B6. If you set the GOV length to 10 and the structure of the stream to "IP", you will get 1 I-frame (I-VOP) and 9 P-VOPs, but all of them will have the 000001B6 start code. The trick to differentiate between them is to check the two BITS that immediately follow the start code, that is, the two most significant bits of the FIFTH byte. Check the table to determine the type of VOP you are getting:
```
VOP_CODING_TYPE (binary)  Coding method
                      00  intra-coded (I)
                      01  predictive-coded (P)
                      10  bidirectionally-predictive-coded (B)
                      11  sprite (S)
```
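The two points above can be sketched in a few lines of Python. This is only an illustration, not code from the camera or from ffmpeg: the function names `parse_fmtp_config` and `vop_coding_type` are made up for this example, and the SDP line is the one quoted above.

```python
import re

VOP_START_CODE = b"\x00\x00\x01\xb6"
VOP_TYPES = {0b00: "I", 0b01: "P", 0b10: "B", 0b11: "S"}


def parse_fmtp_config(sdp_line: str) -> bytes:
    """Return the raw bytes of the config= parameter of an a=fmtp SDP line."""
    match = re.search(r"config=([0-9A-Fa-f]+)", sdp_line)
    if match is None:
        raise ValueError("no config= parameter in SDP line")
    return bytes.fromhex(match.group(1))


def vop_coding_type(frame: bytes):
    """Return "I", "P", "B" or "S" for a VOP, or None if it is not a VOP."""
    if len(frame) < 5 or not frame.startswith(VOP_START_CODE):
        return None
    # The coding type is the top two bits of the byte after the start code.
    return VOP_TYPES[frame[4] >> 6]


sdp_line = ("a=fmtp:96 profile-level-id=245; "
            "config=000001B0F5000001B5891300000100000001200086C40FA28A021E0A21")
config = parse_fmtp_config(sdp_line)
print(config[:4].hex())                               # 000001b0
print(vop_coding_type(bytes.fromhex("000001b610")))   # I
print(vop_coding_type(bytes.fromhex("000001b650")))   # P
```

Note that the byte right after the 000001B0 start code in the config is 0xF5, which is the same value as profile-level-id=245 in the SDP line.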

Now, to decode video you must have the VOS sent to the decoder, immediately followed by an I-VOP. But first, your way of extracting these frames from the RTP stream is awkward... If an I-VOP is 10000B in size and your network MTU is 1400B, it can't be sent as it is in a single packet. So the AXIS camera splits I-VOPs and all other BIG frames into FRAGMENTS, which it sends over RTP as packets whose size doesn't exceed the MTU. The main idea is this (example):

Split 10000B into MTU sized fragments (for 1400B MTU you get 7x[1400B] and 1x[200B] fragments, ignoring header overhead)
Send each one with RTP MARKER BIT set to 0
Send last fragment with RTP MARKER BIT set to 1 to mark the last fragment
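As a rough illustration of the sender side (what the camera does internally is an assumption here, and `fragment` is a hypothetical helper), splitting a 10000-byte frame with a 1400-byte payload limit gives 7 full fragments plus one 200-byte remainder, and only the last one carries the marker:

```python
def fragment(frame: bytes, max_payload: int):
    """Yield (marker, chunk) pairs; marker is True only for the last chunk."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    for start in range(0, len(frame), max_payload):
        chunk = frame[start:start + max_payload]
        # The chunk that reaches the end of the frame gets the marker bit.
        marker = start + max_payload >= len(frame)
        yield marker, chunk


fragments = list(fragment(bytes(10000), 1400))
print([len(chunk) for _, chunk in fragments])
# [1400, 1400, 1400, 1400, 1400, 1400, 1400, 200]
print([marker for marker, _ in fragments])
# [False, False, False, False, False, False, False, True]
```

Real packets also carry RTP, UDP and IP headers, so the usable payload per packet is somewhat smaller than the MTU.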

Now, when you are receiving this, you kinda get the idea, but you need to restore the original 10KB FRAME in order for the decoder to decode it. The way you are doing it, you are only decoding the first MTU bytes of a much larger frame, and all the other fragments that you send to the decoder are discarded. That's why you get the bad picture...
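On the receiving side, the marker bit and the payload first have to be pulled out of each RTP packet. A minimal sketch, assuming the standard fixed RTP header layout (version 2, 12 bytes, optional CSRC list, extension and padding); `parse_rtp` is a name chosen for this example:

```python
import struct


def parse_rtp(packet: bytes):
    """Return (marker, payload_type, sequence, timestamp, payload)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    marker = bool(b1 & 0x80)       # top bit of the second byte
    payload_type = b1 & 0x7F       # e.g. 96 for the dynamic type here
    offset = 12 + 4 * csrc_count   # skip the CSRC identifiers
    if has_extension:
        # Extension header: 16-bit id, then 16-bit length in 32-bit words.
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if has_padding:
        end -= packet[-1]          # last byte holds the padding length
    return marker, payload_type, seq, timestamp, packet[offset:end]


# Hand-built test packet: marker set, payload type 96, sequence number 7.
header = struct.pack("!BBHII", 0x80, 0x80 | 96, 7, 1000, 0x1234)
example = header + b"\x00\x00\x01\xb6\x10"
print(parse_rtp(example)[:3])  # (True, 96, 7)
```

The sequence number and timestamp are returned as well because a real receiver would use them to detect lost or reordered packets.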

To restore original frame:

Receive the packet with the start code 000001B6 or 000001B0 and check its RTP MARKER bit. If the MARKER is set to 1, that packet is the whole frame, and you can decode it as it is! If it is 0, more parts follow...
Place the first fragment and all the fragments that follow into a buffer, until you get one with the MARKER BIT set to 1. When you get that last fragment, append it to the buffer as well.
Your buffer now contains one whole frame that you can send to decoder!
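The steps above can be sketched like this. It is a simplified illustration, not production code: `FrameAssembler` is a hypothetical name, the input is assumed to be (marker, payload) pairs already extracted from the RTP packets, and packet loss or reordering is not handled.

```python
class FrameAssembler:
    """Collect RTP payloads until the marker bit closes the frame."""

    def __init__(self):
        self._parts = []

    def push(self, marker: bool, payload: bytes):
        """Add one payload; return the whole frame when marker is set."""
        self._parts.append(payload)
        if not marker:
            return None            # more fragments follow
        frame = b"".join(self._parts)
        self._parts = []           # start fresh for the next frame
        return frame


# Two frames: the first arrives in three fragments, the second in one.
packets = [
    (False, b"\x00\x00\x01\xb6\x10AAAA"),
    (False, b"BBBB"),
    (True, b"CC"),
    (True, b"\x00\x00\x01\xb6\x50DD"),
]
assembler = FrameAssembler()
frames = []
for marker, payload in packets:
    frame = assembler.push(marker, payload)
    if frame is not None:
        frames.append(frame)       # this is what goes to the decoder

print(len(frames))  # 2
```

Each entry in `frames` is one complete VOP, which is the unit to hand to avcodec_decode_video. A more careful receiver would also check RTP sequence numbers and drop a frame if a fragment is missing.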

There, hope I helped... :)
