I'm currently writing a client for an IP camera that provides an H.264 stream, and I'm using FFmpeg 2.1.1 to decode that stream.
Here is the relevant code from the application:
Initialization:
private unsafe void InitFFmpeg()
{
    FFmpeg.avcodec_register_all();
    var codec = FFmpeg.avcodec_find_decoder(AVCodecID.AV_CODEC_ID_H264);
    avcodec = (IntPtr)codec;
    var ctx = FFmpeg.avcodec_alloc_context3(avcodec);
    avcontext = (IntPtr)ctx;
    ctx->codec = avcodec;
    ctx->pix_fmt = AVPixelFormat.PIX_FMT_YUV420P;
    ctx->flags2 |= 0x00008000; // CODEC_FLAG2_CHUNKS
    var options = IntPtr.Zero;
    int result = FFmpeg.avcodec_open2(avcontext, avcodec, ref options);
    avframe = FFmpeg.av_frame_alloc();
    avparser = FFmpeg.av_parser_init(AVCodecID.AV_CODEC_ID_H264);
    FFmpeg.av_init_packet(ref avpacket);
    inBuffer = Marshal.AllocHGlobal(300 * 1024);
}
Decoding:
private void Decode(byte[] data, int size)
{
    IntPtr pOut = IntPtr.Zero;
    int outLen = 0;
    Marshal.Copy(data, 0, inBuffer, size);
    int gotPicture = 0;
    var rs = FFmpeg.av_parser_parse2(avparser, avcontext, ref pOut, ref outLen,
                                     inBuffer, size, 0, 0, 0);
    if (outLen <= 0 || pOut.ToInt32() <= 0)
    {
        // Not enough data to construct a frame; return and wait for the next NAL unit.
        return;
    }
    avpacket.data = pOut;
    avpacket.size = outLen;
    avpacket.flags |= PacketFlags.Key;
    var len = FFmpeg.avcodec_decode_video2(avcontext, avframe, ref gotPicture, ref avpacket);
    Console.WriteLine("avcodec_decode_video2 returned " + len);
    if (gotPicture != 0)
    {
        // ...YUV to RGB conversion...
    }
}
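For reference, FFmpeg's own decoding examples drive av_parser_parse2 in a loop, advancing through the input buffer by the number of bytes the parser reports as consumed, because a single call may not consume the whole buffer and may emit more than one packet per buffer. The sketch below only illustrates that consumption-loop contract; ToyParser, decode_stream, and the fixed 8-byte frame size are stand-ins invented for the illustration, not part of FFmpeg.

```python
class ToyParser:
    """Stand-in for an FFmpeg-style parser: parse() consumes *some* of the
    input and occasionally emits a complete frame."""

    FRAME_SIZE = 8  # pretend every frame is exactly 8 bytes

    def __init__(self):
        self.buffer = bytearray()

    def parse(self, data):
        """Consume input; return (bytes_consumed, frame_or_None)."""
        needed = self.FRAME_SIZE - len(self.buffer)
        chunk = data[:needed]
        self.buffer.extend(chunk)
        if len(self.buffer) == self.FRAME_SIZE:
            frame, self.buffer = bytes(self.buffer), bytearray()
            return len(chunk), frame
        return len(chunk), None


def decode_stream(parser, data):
    """Feed data to the parser until every byte is consumed, collecting each
    complete frame as it comes out -- one call may yield several frames."""
    frames = []
    while data:                      # keep going until all input is consumed
        consumed, frame = parser.parse(data)
        data = data[consumed:]       # advance by however much was consumed
        if frame is not None:
            frames.append(frame)     # a real decoder would decode it here
    return frames


frames = decode_stream(ToyParser(), bytes(range(20)))
# 20 input bytes -> two complete 8-byte frames; 4 bytes remain buffered
```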
With the code above, I get output like this:
NAL unit 1: resolution=1280x720, key-frame=True, size=26.
NAL unit 2: resolution=1280x720, key-frame=False, size=8.
NAL unit 3: resolution=1280x720, key-frame=False, size=97222.
NAL unit 4: resolution=1280x720, key-frame=False, size=14129.
avcodec_decode_video2 returned 1
NAL unit 5: resolution=1280x720, key-frame=False, size=12522.
NAL unit 6: resolution=1280x720, key-frame=False, size=12352.
avcodec_decode_video2 returned 1
NAL unit 7: resolution=1280x720, key-frame=False, size=12291.
NAL unit 8: resolution=1280x720, key-frame=False, size=12182.
From the output I can see that the parser recognizes the NAL units sent by the camera and assembles frames from them.
NAL units 1 to 4 make up a key frame (including its SPS/PPS), and the following two NAL units form a normal frame.
avcodec_decode_video2 doesn't produce any error; it just keeps returning 1, and gotPicture is always 0.
If I clear AVCodecContext.flags2, the decoder starts complaining that the packet I provide contains no frame:
NAL unit 100: resolution=1280x720, frame-rate=0, key-frame=True, size=26.
NAL unit 101: resolution=1280x720, frame-rate=0, key-frame=False, size=8.
NAL unit 102: resolution=1280x720, frame-rate=0, key-frame=False, size=96927.
NAL unit 103: resolution=1280x720, frame-rate=0, key-frame=False, size=17149.
[h264 @ 01423440] no frame!
avcodec_decode_video2 returned -1094995529
NAL unit 104: resolution=1280x720, frame-rate=0, key-frame=False, size=12636.
NAL unit 105: resolution=1280x720, frame-rate=0, key-frame=False, size=12338.
[h264 @ 01423440] no frame!
If I write the raw stream to a file, I can use FFmpeg to mux it into an MP4 container, and the resulting file plays fine in any player.
The raw data I receive looks like this:
00 00 00 01 67 42 00 28 E9 00 A0 0B 75 C4 80 03 6E E8 00 CD FE 60 0D 88 10 94
00 00 00 01 68 CE 31 52
00 00 00 01 65 88 81 00 06 66 36 25 11 21 2C 04 3B 81 E1 80 00 85 4B 23 9F 71...
...
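For what it's worth, those leading bytes can be decoded by hand: the low five bits of the byte after each 00 00 00 01 start code give the NAL unit type, so 67 is an SPS (type 7), 68 a PPS (type 8), and 65 an IDR slice (type 5). The hypothetical helper below (written for this question, not part of any library) splits the bytes shown above on their Annex-B start codes; the third unit is truncated in the dump, so only its visible bytes are used.

```python
import re

# Split an H.264 Annex-B byte stream on its 3- or 4-byte start codes and
# report each NAL unit's type (the low 5 bits of the first payload byte).
START_CODE = re.compile(b"\x00\x00\x00?\x01")

def split_nal_units(stream: bytes):
    """Yield (nal_type, payload) for every NAL unit in `stream`."""
    marks = list(START_CODE.finditer(stream))
    for i, m in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(stream)
        nal = stream[m.end():end]
        yield nal[0] & 0x1F, nal

# The bytes from the dump above (the third unit is truncated in the capture):
data = bytes.fromhex(
    "00000001 6742 0028 E900 A00B 75C4 8003 6EE8 00CD FE60 0D88 1094"
    "00000001 68CE 3152"
    "00000001 6588 8100 0666 3625 1121 2C04 3B81 E180 0085 4B23 9F71"
)
for nal_type, payload in split_nal_units(data):
    print(nal_type, len(payload))  # 7 = SPS, 8 = PPS, 5 = IDR slice
```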