H.264 conversion with FFmpeg (from a RTP stream)

Question

Environment:

I have an IP Camera, which is capable of streaming it's data over RTP in a H.264 encoded format. This raw stream is recorded from the ethernet. With that data I have to work.

Goal:

In the end I want to have a *.mp4 file, which I can play with common Media Players (like VLC or Windows MP).

What have I done so far:

I take that raw stream data I have and parse it. Since the data has been transmitted via RTP I need to take care of the NAL Bytes, SPS and PPS.

1. Write a raw file

First I determine the type of each frame received over Ethernet. To do so, I parse the first two bytes of every RTP Payload, so I can get the 8 NAL Unit Bit, the Fragment Type Bits and the Start, Reserved and End Bit. In the payload, they're arranged like this:

Byte 1: [          3 NAL Unit Bits          | 5 Fragment Type Bits]
Byte 2: [Start Bit | Reserved Bit | End Bit | 5 NAL Unit Bits]

From this I can determine:

Start and End of a Video Frame -> Start Bit and End Bit
Type of the Payload -> 5 Fragment Type Bits
NAL Unit Byte

The Fragment types which are necessary in my case are:

Fragment Type  7 = SPS
Fragment Type  8 = PPS
Fragment Type 28 = Video Fragment

The NAL Byte is created by putting the NAL Unit Bits from Byte 1 and 2 together.

Now depending on the fragmentation type I do the following:

SPS/PPS:

Write the NAL Prefix (0x00 0x00 0x01) and then the SPS or PPS data

Fragmentation with Start Bit

Write NAL Prefix
Write NAL Unit Byte
Write remaining raw data

Fragmentation without Start Bit

Write raw data

This means my raw file looks something like this:

[NAL Prefix][SPS][NAL Prefix][PPS][NAL Prefix][NAL Unit Byte][Raw Video Data][Raw Video Data]....[NAL Prefix][NAL Unit Byte][Raw Video Data]...

For every PPS and SPS I find in my stream data, I just write a NAL Prefix ( 0x00 0x00 0x01 ) and then the SPS/PPS itself.

Now I can't play this data with some media player, which leads me to :

2. Convert the file

Since I wanted to avoid working much with codecs I just went to use an existing application -> FFmpeg. This I am calling with those parameters:

ffmpeg.exe -f h264 -i <RawInputFile> -vcodec copy -r 25 <OutPutFilename>.mp4

-f h264: This should tell ffmpeg I have a h264 coded stream

-vcodec copy: Quote from the manpage:

Force video codec to codec. Use the "copy" special value to tell that the raw codec data must be copied as is.

-r 25: Sets the framerate to 25 FPS.

When I call ffmpeg with those parameters I get an .mp4 File, which I can play with VLC and Windows MP, so it actually works. But the file now looks a bit different from my raw file.

This leads me to my question:

What did I actually do?

My problem is not that it is not working. I just want/need to know what I have actually done with calling ffmpeg. I had a raw H264 file which I could not play. After using FFmpeg I can play it.

There are the following differences between the original raw file (which I have written) and the one written by FFmpeg:

Header: The FFmpeg File has like about 0x30 Bytes of Header
Footer: The FFmpeg File also has a footer
Changed Prefix and 2 new Bytes:

While a new Video Frame from the Raw File started like [NAL Prefix][NAL Unit Byte][Raw Video Data] in the new file it looks like this:

[0x00 0x00][2 "Random" Bytes][NAL Unit Byte][Raw Video Data].....[0x00 0x00[2 other "Random" Bytes][NAL Unit Byte][Raw Video Data]...

I understand that the Video Stream needs a container format (correct me if I am wrong but I assume that the new header and footer are responsible for that). But why does it change actually some Bytes in the raw data? It can't be some decoding since the stream itself should get decoded by the player and not ffmpeg.

As you can see I don't need a new solution for my problem as far more an explanation (so I can explain it by myself). What does ffmpeg actually do? And why does it change some bytes within the video data?

Were you able to achieve this? If yes, are you willing to share the solution? Thanks! — ioan ghip
I am also looking for similar solution. Could you solve this, would you like to share. — Austin
I know it's super old question but it seems you have wrong order when analyzing Byte 2 of payload. It should be [Start Bit | End Bit | Reserved Bit | 5 NAL Unit Bits] - So you have re-ordered Reserved Bit with End Bit — Adam Szmyd

micha137 micha137 · Accepted Answer · 2013-08-28T11:55:46

Besides adding the MP4 container, ffmpeg converted your H.264 Annex B byte stream (with NAL prefixes) to a length prefixed format.

Your [0x00 0x00][2 "Random" Bytes] is a 32 bit integer, giving the length of the following NAL unit in bytes.

H.264 conversion with FFmpeg (from a RTP stream)

3 Answers