I am writing a Smooth Streaming client application. On the server side (IIS 7 with the Media Services extensions), I have a set of ISMV and ISMA files encoded with Expression Encoder 4 Pro using the "H.264 IIS Smooth Streaming iPhone WiFi" preset. In short, the preset uses the H.264 Baseline profile for video and AAC-LC for audio.
The client side is where I'm having problems, specifically with the audio chunks. I have been able to make sense of the H.264 video stream (it is essentially a sequence of raw NAL units, each prefixed by its length instead of the NAL unit "start code" 0, 0, 0, 1), but I still haven't been able to work out what is inside the AAC-LC audio stream, i.e. what is carried in the "mdat" (Media Data Box) atom. It is most definitely not an MP4 container, but what is it then?
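For reference, this is roughly how the video payload can be walked. It is only a minimal sketch: it assumes a 4-byte big-endian length prefix (the prefix size is an assumption here and may differ per stream), and `count_nal_units` is an illustrative name, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: walks a buffer of length-prefixed NAL units.
 * ASSUMPTION: each NAL unit is preceded by a 4-byte big-endian length.
 * Returns the number of NAL units found, or -1 if the buffer is malformed. */
int count_nal_units(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int count = 0;

    while (pos < size) {
        if (size - pos < 4)
            return -1;                      /* truncated length prefix */

        uint32_t len = ((uint32_t)buf[pos]     << 24) |
                       ((uint32_t)buf[pos + 1] << 16) |
                       ((uint32_t)buf[pos + 2] <<  8) |
                        (uint32_t)buf[pos + 3];
        pos += 4;

        if (len == 0 || len > size - pos)
            return -1;                      /* length runs past the buffer */

        /* The low 5 bits of the first NAL byte are nal_unit_type. */
        printf("NAL %d: type %d, %lu bytes\n",
               count, buf[pos] & 0x1F, (unsigned long)len);

        pos += len;
        count++;
    }
    return count;
}
```

Calling `count_nal_units(payload, payload_size)` on the mdat payload of a video fragment lists each NAL unit; a return value of -1 suggests the length-prefix assumption does not hold for that buffer.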
Below are the first 128 bytes (number chosen arbitrarily) of one AAC-LC fragment (mdat payload only) obtained from the server, in case anyone can identify the format from them.
unsigned char data[128] = {
0x21, 0x09, 0x0A, 0xBF, 0xBF, 0xFF, 0xFF, 0xD5, 0xB1, 0x8D, 0xC4, 0xA1,
0x18, 0x0D, 0x25, 0xC9, 0x2E, 0x49, 0x2E, 0x10, 0x88, 0x91, 0x10, 0x01,
0x13, 0x23, 0x2C, 0x36, 0x25, 0x60, 0x6B, 0x94, 0x8C, 0x74, 0xD7, 0x4A,
0x95, 0xD3, 0x03, 0x91, 0x5B, 0x76, 0xDE, 0x27, 0xC5, 0xB2, 0x4C, 0xCF,
0xEB, 0x3E, 0xDD, 0xFF, 0x22, 0xAF, 0xC3, 0xF8, 0x60, 0x36, 0x49, 0xBC,
0xAE, 0x4D, 0x10, 0x31, 0xC6, 0x28, 0x2A, 0xEB, 0xCA, 0x94, 0x51, 0xD8,
0x61, 0x1B, 0xC6, 0x2A, 0x91, 0x71, 0xE4, 0x8C, 0xF8, 0x19, 0x2C, 0xDE,
0x71, 0xBB, 0xE3, 0xBD, 0x36, 0xB4, 0x45, 0x37, 0x02, 0x61, 0x48, 0x8E,
0x19, 0x80, 0xD5, 0x24, 0x97, 0x24, 0x92, 0x44, 0x08, 0x89, 0x12, 0x00,
0xB3, 0xF8, 0x1E, 0xE2, 0xBD, 0xCD, 0x4E, 0xF7, 0xA9, 0xE2, 0x0E, 0xD8,
0xEA, 0xFA, 0xCF, 0xDB, 0x4E, 0x69, 0x6F, 0xEE
};
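One quick check on the dump above is whether it begins with an ADTS header, which always starts with a 12-bit syncword of all ones (0xFFF). The sketch below tests for that and also extracts the top three bits of the first byte. If the payload were a raw AAC data block, which is only a guess at this point, those bits would be the syntactic element ID (0 = single channel element, 1 = channel pair element). The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts with the 12-bit ADTS syncword (0xFFF), else 0. */
int has_adts_syncword(const uint8_t *buf, size_t size)
{
    return size >= 2 && buf[0] == 0xFF && (buf[1] & 0xF0) == 0xF0;
}

/* Returns the top 3 bits of a byte.
 * ASSUMPTION: only meaningful as an AAC syntactic element ID if the
 * payload really is a raw AAC data block, which is not confirmed. */
unsigned top_three_bits(uint8_t b)
{
    return (unsigned)(b >> 5);
}
```

For the first two bytes of the dump (0x21, 0x09), `has_adts_syncword` returns 0, so the fragment does not start with an ADTS header, and `top_three_bits(0x21)` returns 1.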