I'm using cameras with an H.264 stream over RTSP, decoded with avcodec. For most of the cameras, each packet (NAL unit) received represents a complete frame (I-frame or regular frame), and each time I decode it I obtain a frame. But for another camera, a frame is split into many NAL units of constant size, and when I decode each packet individually I don't get a frame for every packet.
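For context, I open the stock libavcodec H.264 decoder roughly like this (simplified sketch, error handling omitted; open_h264_decoder is just a name I use here):

```c
/* Sketch: how the H.264 decoder is opened (error handling omitted). */
#include <libavcodec/avcodec.h>

static AVCodecContext *open_h264_decoder(void)
{
    const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodecContext *ctx  = avcodec_alloc_context3(codec);

    avcodec_open2(ctx, codec, NULL);   /* returns < 0 on failure */
    return ctx;
}
```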
I saw that there are start and end flags in the NAL units. The end flag is never set, except for PPS and SPS. Nevertheless, I can detect the start flag and consider that a frame has ended when a new one begins.
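My frame-start detection is roughly the sketch below. It assumes each received NAL unit starts with its one-byte header (no start code prefix) and that, for coded slices, the 128 flag shown in the logs is the most significant bit of the first payload byte, i.e. first_mb_in_slice == 0, which should mean the slice begins a new picture (the helper name is mine):

```c
/* Sketch: detect whether a NAL unit starts a new frame.
 * Assumptions: nal[0] is the NAL header (no start code prefix), and for
 * coded slices the MSB of nal[1] (the 128 value in the logs) is set when
 * first_mb_in_slice == 0, i.e. the slice is the first one of a picture. */
#include <stdint.h>
#include <stddef.h>

static int is_new_frame_start(const uint8_t *nal, size_t size)
{
    if (size < 2)
        return 0;

    int nal_type  = nal[0] & 0x1F;          /* 1 = slice, 5 = IDR, 7 = SPS, 8 = PPS */
    int start_bit = (nal[1] & 0x80) != 0;   /* the "128" flag from the logs         */

    if (nal_type == 7 || nal_type == 8)     /* SPS/PPS announce a new frame         */
        return 1;
    if (nal_type == 1 || nal_type == 5)     /* coded slice (non-IDR / IDR)          */
        return start_bit;
    return 0;
}
```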
I would like to buffer all the NAL units belonging to a single frame before sending them to the decoder (for the recording feature and to minimize frame indexing), roughly like the sketch below.
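A simplified sketch of the buffering I have in mind (Annex B start codes, send/receive API; the fixed-size buffer, decode_access_unit and on_nal_unit are placeholders, and is_new_frame_start is the detection from above):

```c
/* Sketch: collect the NAL units of one frame (access unit) and send the
 * whole buffer to libavcodec only when the next frame begins.
 * No bounds checking, no error handling. */
#include <libavcodec/avcodec.h>
#include <string.h>
#include <stdint.h>
#include <stddef.h>

static int is_new_frame_start(const uint8_t *nal, size_t size); /* detection sketch above */

static uint8_t au_buf[(1 << 20) + AV_INPUT_BUFFER_PADDING_SIZE];
static size_t  au_size = 0;
static int     au_has_slice = 0;   /* buffer already holds coded slice data */

static void decode_access_unit(AVCodecContext *ctx, AVFrame *frame)
{
    AVPacket *pkt = av_packet_alloc();
    pkt->data = au_buf;
    pkt->size = (int)au_size;

    if (avcodec_send_packet(ctx, pkt) == 0) {
        while (avcodec_receive_frame(ctx, frame) == 0) {
            /* one complete picture decoded: record / display it here */
        }
    }
    av_packet_free(&pkt);
    au_size = 0;
    au_has_slice = 0;
}

static void on_nal_unit(AVCodecContext *ctx, AVFrame *frame,
                        const uint8_t *nal, size_t size)
{
    static const uint8_t start_code[4] = { 0, 0, 0, 1 };
    int nal_type = nal[0] & 0x1F;

    /* A new frame begins and the buffer already holds slice data:
     * the buffered NAL units form the previous frame, so decode them.
     * (SPS/PPS stay attached to the frame that follows them.) */
    if (au_has_slice && is_new_frame_start(nal, size))
        decode_access_unit(ctx, frame);

    memcpy(au_buf + au_size, start_code, sizeof(start_code));
    memcpy(au_buf + au_size + sizeof(start_code), nal, size);
    au_size += sizeof(start_code) + size;

    if (nal_type == 1 || nal_type == 5)
        au_has_slice = 1;
}
```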
Here is an example of the NAL units I receive (the start flag is the value 128 in the first NAL[1] byte):
NALU: 10 bytes: SPS, NAL[1]={0,64,0,2} // Start Frame 1
NALU: 5 bytes: PPS, NAL[1]={128,64,0,14}
NALU: 551 bytes: I-Frame, NAL[1]={128,0,0,8}
NALU: 531 bytes: I-Frame, NAL[1]={0,0,0,9}
NALU: 532 bytes: I-Frame, NAL[1]={0,0,0,4}
NALU: 517 bytes: I-Frame, NAL[1]={0,0,0,7}
NALU: 533 bytes: I-Frame, NAL[1]={0,0,0,3}
NALU: 621 bytes: I-Frame, NAL[1]={0,0,0,3}
NALU: 586 bytes: I-Frame, NAL[1]={0,0,0,1}
NALU: 520 bytes: I-Frame, NAL[1]={0,0,0,1}
NALU: 507 bytes: I-Frame, NAL[1]={0,0,0,1}
NALU: 508 bytes: I-Frame, NAL[1]={0,0,0,1}
NALU: 531 bytes: I-Frame, NAL[1]={0,0,0,0}
NALU: 558 bytes: I-Frame, NAL[1]={0,0,0,0}
NALU: 49 bytes: I-Frame, NAL[1]={0,0,0,0} // Start Frame 2 + END Frame 1
NALU: 253 bytes: Frame, NAL[1]={128,0,0,26} // Start Frame 3 + END Frame 2
// Frame 2 start so we can record/decode Frame 1
NALU: 510 bytes: Frame, NAL[1]={128,0,0,26}
// Frame 3 start so we can record/decode Frame 2
NALU: 506 bytes: Frame, NAL[1]={0,0,0,1}
NALU: 267 bytes: Frame, NAL[1]={0,0,0,0} // Start Frame 4 + END Frame 3
NALU: 535 bytes: Frame, NAL[1]={128,0,0,26}
// Frame 4 start so we can record/decode Frame 3
NALU: 527 bytes: Frame, NAL[1]={0,0,0,4}
NALU: 509 bytes: Frame, NAL[1]={0,0,0,3}
NALU: 508 bytes: Frame, NAL[1]={0,0,0,1}
NALU: 519 bytes: Frame, NAL[1]={0,0,0,0}
NALU: 327 bytes: Frame, NAL[1]={0,0,0,0} // END Frame 4
...
However, I run into trouble with some streams. For streams where each NAL unit represents a complete frame, if I decode a frame only when the next one starts, the RTSP stream seems to drop some I-frames. I think it is a synchronization problem, possibly caused by the decoding time, because the problem does not occur if I decode each frame directly when it is received.
Here is the detail when I decode directly (everything works correctly):
NALU: 24 bytes: SPS, NAL[1]={0,64,0,13} // Start Frame 1
NALU: 4 bytes: PPS, NAL[1]={128,64,32,14}
NALU: 176124 bytes: Frame, NAL[1]={128,0,0,8}
// Decode Frame 1 OK
NALU: 24 bytes: SPS, NAL[1]={0,64,0,13} // Start Frame 2
NALU: 4 bytes: PPS, NAL[1]={128,64,32,14}
NALU: 175605 bytes: I-Frame, NAL[1]={128,0,0,8}
// Decode Frame 2 OK
NALU: 38777 bytes: Frame, NAL[1]={128,0,0,26} // Start Frame 3
// Decode Frame 3 OK
NALU: 32188 bytes: Frame, NAL[1]={128,0,0,26} // Start Frame 4
// Decode Frame 4 OK
NALU: 24 bytes: SPS, NAL[1]={0,64,0,13} // Start Frame 5
NALU: 4 bytes: PPS, NAL[1]={128,64,32,14}
NALU: 175975 bytes: I-Frame, NAL[1]={128,0,0,8}
// Decode Frame 5 OK
NALU: 41681 bytes: Frame, NAL[1]={128,0,0,26} // Start Frame 6
// Decode Frame 6 OK
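To be clear, "decode directly" in the log above means each NAL unit is wrapped in its own packet and sent to the decoder as soon as it is received, roughly like this (sketch; the Annex B prefix is assumed to be already in place, error handling omitted, function name is mine):

```c
/* Sketch of the "decode directly" path that works: each NAL unit is
 * sent to the decoder immediately, in its own packet. */
#include <libavcodec/avcodec.h>

static void decode_nal_directly(AVCodecContext *ctx, AVFrame *frame,
                                uint8_t *annexb_nal, int size)
{
    AVPacket *pkt = av_packet_alloc();
    pkt->data = annexb_nal;   /* NAL unit already prefixed with 00 00 00 01 */
    pkt->size = size;

    if (avcodec_send_packet(ctx, pkt) == 0) {
        /* a picture comes out as soon as the decoder has a full frame */
        while (avcodec_receive_frame(ctx, frame) == 0) {
            /* "Decode Frame N OK" in the log above */
        }
    }
    av_packet_free(&pkt);
}
```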
Here is the detail when I decode only after the next frame starts (some frames are not decoded):
NALU: 24 bytes: NAL[0]={0,3,7}, NAL[1]={0,64,0,13} // Start frame 1
NALU: 4 bytes: NAL[0]={0,3,8}, NAL[1]={128,64,32,14}
NALU: 177827 bytes: NAL[0]={0,3,1}, NAL[1]={128,0,0,8}
NALU: 24 bytes: NAL[0]={0,3,7}, NAL[1]={0,64,0,13} // Start frame 2 + End frame 1
// Decode Frame 1 OK
NALU: 4 bytes: NAL[0]={0,3,8}, NAL[1]={128,64,32,14}
NALU: 43304 bytes: NAL[0]={0,3,1}, NAL[1]={128,0,0,26}
NALU: 39115 bytes: NAL[0]={0,3,1}, NAL[1]={128,0,0,26} // Start frame 3 + End frame 2
// Decode Frame 2 OK
NALU: 24 bytes: NAL[0]={0,3,7}, NAL[1]={0,64,0,13} // Start frame 4 + End frame 3
// Decode Frame 3 OK
NALU: 4 bytes: NAL[0]={0,3,8}, NAL[1]={128,64,32,14}
NALU: 49200 bytes: NAL[0]={0,3,1}, NAL[1]={128,0,0,26}
NALU: 41002 bytes: NAL[0]={0,3,1}, NAL[1]={128,0,0,26} // Start frame 5 + End frame 4
// Decode Frame 4 failed
NALU: 39581 bytes: NAL[0]={0,3,1}, NAL[1]={128,0,0,26}
// Decode Frame 5 failed
It looks as if some frames (I-frames) are dropped by the RTSP stream.
So my questions are:
- Do you think RTSP drops some frames?
- Does the H.264 decoder expect frames to arrive within a certain delay (respecting a timecode or something like that) to be decoded properly?
- How can I detect that a NAL unit is the last one of a picture, instead of waiting for the start of the next one?
Thank you for your help