I'm trying to decode an AAC audio stream in an ADTS container, which is streamed from an external hardware H264 encoder.
I've parsed the ADTS header, and it tells me I have a 2-channel, 44100 Hz, AAC Main profile frame. I set up the extradata bytes for the ffmpeg decoder and decode the frame (apparently successfully) as follows:
(pseudo C++ code)
Set up the decoder:
context->codec = avcodec_find_decoder(codec_id);
context->av_codec_context = avcodec_alloc_context3(context->codec);
avcodec_open2(context->av_codec_context, context->codec, nullptr);
av_init_packet(&context->av_raw_packet);
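Set up the extradata bytes:
// AudioSpecificConfig: AOT_MAIN (object type 1), 44.1 kHz (frequency index 4), stereo (channel configuration 2)
// 00001 0100 0010 000 -> 00001010 00010000
// extradata = 0x0A, 0x10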
memcpy(context->av_codec_context->extradata, extradata, extradataLength);
avcodec_open2(context->av_codec_context, context->codec, nullptr);
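As a cross-check on the two extradata bytes above, here is a minimal sketch of how the 2-byte AudioSpecificConfig is packed (5 bits object type, 4 bits sampling frequency index, 4 bits channel configuration, 3 zero bits). The function names `makeAudioSpecificConfig` and `objectTypeFromAdtsProfile` are hypothetical helpers, not part of ffmpeg, and the sketch assumes the short form only (no escape values).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs the short 2-byte AudioSpecificConfig.
// Layout: 5 bits object type | 4 bits frequency index | 4 bits channel config | 3 zero bits.
std::array<std::uint8_t, 2> makeAudioSpecificConfig(unsigned objectType,
                                                    unsigned freqIndex,
                                                    unsigned channelConfig)
{
    assert(objectType >= 1 && objectType <= 30); // 31 escapes to a longer form, not handled here
    assert(freqIndex <= 12);                     // 15 means an explicit frequency, not handled here
    assert(channelConfig <= 7);
    const unsigned bits = (objectType << 11) | (freqIndex << 7) | (channelConfig << 3);
    return { static_cast<std::uint8_t>(bits >> 8),
             static_cast<std::uint8_t>(bits & 0xFF) };
}

// Hypothetical helper: the 2-bit ADTS profile field stores the object type minus 1.
unsigned objectTypeFromAdtsProfile(unsigned adtsProfile)
{
    return adtsProfile + 1;
}
```

Calling `makeAudioSpecificConfig(1, 4, 2)` (Main, 44100 Hz, stereo) reproduces the bytes used above, which suggests the extradata values themselves are consistent with the ADTS header.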
Then decode the frame:
// decode frame
const int len = avcodec_decode_audio4(context->av_codec_context, context->frame, &got_frame, &context->av_raw_packet);
*sampleRate = context->av_codec_context->sample_rate;
*sampleFormat = context->av_codec_context->sample_fmt;
*bitsPerSample = av_get_bytes_per_sample(context->av_codec_context->sample_fmt) * 8;
*channels = context->av_codec_context->channels;
*channelLayout = context->av_codec_context->channel_layout;
// get frame
*outDataSize = av_samples_get_buffer_size(nullptr, context->av_codec_context->channels, context->frame->nb_samples, context->av_codec_context->sample_fmt, 1);
The decoded frame:
// array of 8192 bytes, context info is as expected:
context->av_codec_context->channels = 2
context->av_codec_context->channel_layout = 3 (AV_CH_LAYOUT_STEREO)
context->frame->format = 8 (AV_SAMPLE_FMT_FLTP) // float, planar
context->frame->sample_rate = 44100
Now, as I understand it, in the raw 32-bit format each sample takes 4 bytes, and the channels are interleaved (so the channel alternates every 4 bytes). That leaves me with 1024 samples per channel (8192 bytes / 4 bytes per sample / 2 channels).
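One thing worth checking against the interleaving assumption: AV_SAMPLE_FMT_FLTP is a planar format, so each channel occupies its own contiguous buffer rather than alternating sample by sample. Below is a minimal sketch of converting planar float data to interleaved order, assuming the per-channel buffers are available as separate float arrays (in libavcodec these would typically be `frame->extended_data[ch]`). `interleavePlanar` is a hypothetical helper, not an ffmpeg function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts planar float audio (one buffer per channel)
// into interleaved order (L R L R ...), the layout a raw multi-channel
// import typically assumes.
std::vector<float> interleavePlanar(const float* const* planes,
                                    std::size_t channels,
                                    std::size_t samplesPerChannel)
{
    assert(planes != nullptr);
    std::vector<float> out(channels * samplesPerChannel);
    for (std::size_t s = 0; s < samplesPerChannel; ++s) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            // Sample s of channel ch lands at position s * channels + ch.
            out[s * channels + ch] = planes[ch][s];
        }
    }
    return out;
}
```

For a stereo frame with 1024 samples per channel, this would produce 2048 floats (8192 bytes) in L R L R order.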
I've tried exporting multiple frames of this data to a file and importing it into Audacity as raw data (32-bit float, 2 channels, 44100 Hz, little-endian) as a sanity check. Instead of music, all I get is noise, and the detected length of the audio is far longer than I expected (I dumped 5 seconds to the file, but Audacity reports 22.5 seconds). I've tried a variety of import format settings. What am I likely doing wrong here?
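To make the length mismatch easier to reason about, here is a small sketch that computes how long a raw dump should play for a given interpretation of the bytes. `rawDurationSeconds` is a hypothetical helper, and the sketch assumes the file contains nothing but sample data (no header).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: duration in seconds of a headerless raw PCM dump,
// given how the bytes are interpreted on import.
double rawDurationSeconds(std::size_t totalBytes,
                          std::size_t bytesPerSample,
                          std::size_t channels,
                          double sampleRate)
{
    assert(bytesPerSample > 0 && channels > 0 && sampleRate > 0.0);
    // One audio frame holds one sample for every channel.
    const std::size_t frames = totalBytes / (bytesPerSample * channels);
    return static_cast<double>(frames) / sampleRate;
}
```

Under these import settings, one 8192-byte decoded frame corresponds to 1024 / 44100 seconds (about 23 ms), and 5 seconds of audio should be 1,764,000 bytes. Comparing that figure with the actual file size on disk would show whether too much data is being written per frame or whether the import settings are the problem.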
I'm a little new to working with audio, so I may be misunderstanding something.
Edit: I tried panning the audio to the right channel, and it's reflected in the data. There also appears to be a pattern repeating exactly 1024 samples apart, which suggests to me a programming error where a buffer is not being overwritten after the first sample.
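The repetition could be confirmed programmatically rather than by eye. The sketch below checks whether a dumped buffer is just one block repeated at a fixed period. `repeatsWithPeriod` is a hypothetical helper, and it assumes the dump has already been loaded into a vector of floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when every sample equals the one
// `period` samples earlier, i.e. the buffer is a single block of
// `period` samples repeated over and over.
bool repeatsWithPeriod(const std::vector<float>& samples, std::size_t period)
{
    assert(period > 0);
    if (samples.size() <= period) {
        return false; // not enough data to compare against an earlier block
    }
    for (std::size_t i = period; i < samples.size(); ++i) {
        if (samples[i] != samples[i - period]) {
            return false;
        }
    }
    return true;
}
```

If `repeatsWithPeriod(dump, 1024)` returned true for a dump spanning several frames, that would support the theory that the same decoded block is being written each time instead of fresh output.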