I'm trying to decode an AAC audio stream in an ADTS container, which is streamed from an external hardware H264 encoder.

I've parsed the ADTS header, which tells me I have a 2-channel, 44100 Hz, AAC Main profile frame. I set up the extradata bytes for the ffmpeg decoder and decode the frame (apparently successfully) as follows:

(pseudo c++ code)

set up the decoder:

context->codec = avcodec_find_decoder(codec_id);
context->av_codec_context = avcodec_alloc_context3(context->codec);
// avcodec_open2() is called once, after the extradata has been set
av_init_packet(&context->av_raw_packet);

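set up the extradata bytes:

// AudioSpecificConfig: AOT_MAIN (1), 44.1 kHz (index 4), stereo (2)
// 00001 0100 0010 000 -> 00001010 00010000
// extradata = 0x0A, 0x10
// the decoder expects an av_malloc'ed buffer with zeroed padding after the data
context->av_codec_context->extradata = (uint8_t*)av_mallocz(extradataLength + AV_INPUT_BUFFER_PADDING_SIZE);
memcpy(context->av_codec_context->extradata, extradata, extradataLength);
context->av_codec_context->extradata_size = extradataLength;
avcodec_open2(context->av_codec_context, context->codec, nullptr);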
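
For reference, the two extradata bytes above can be derived rather than hand-written. The following is a minimal sketch, assuming the simple 2-byte form of the AudioSpecificConfig (5 bits audio object type, 4 bits sampling frequency index, 4 bits channel configuration, 3 zero bits). The helper name makeAudioSpecificConfig is made up for illustration, and the sketch does not cover object types of 31 and above or the explicit-frequency escape (index 15):

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: packs the short form of an AudioSpecificConfig.
// Layout (16 bits): AAAAA FFFF CCCC 000
//   A = audio object type (1 = AAC Main, 2 = AAC LC)
//   F = sampling frequency index (3 = 48000 Hz, 4 = 44100 Hz)
//   C = channel configuration (2 = stereo)
inline std::array<std::uint8_t, 2> makeAudioSpecificConfig(unsigned audioObjectType,
                                                           unsigned samplingFrequencyIndex,
                                                           unsigned channelConfiguration) {
    const unsigned bits = ((audioObjectType & 0x1Fu) << 11) |
                          ((samplingFrequencyIndex & 0x0Fu) << 7) |
                          ((channelConfiguration & 0x0Fu) << 3);
    return {static_cast<std::uint8_t>(bits >> 8),
            static_cast<std::uint8_t>(bits & 0xFFu)};
}
```

Calling makeAudioSpecificConfig(1, 4, 2) gives 0x0A, 0x10, which matches the bytes in the comment above.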

then decode the frame:

// decode frame
const int len = avcodec_decode_audio4(context->av_codec_context, context->frame, &got_frame, &context->av_raw_packet);
*sampleRate = context->av_codec_context->sample_rate;
*sampleFormat = context->av_codec_context->sample_fmt;
*bitsPerSample = av_get_bytes_per_sample(context->av_codec_context->sample_fmt) * 8;
*channels = context->av_codec_context->channels;
*channelLayout = context->av_codec_context->channel_layout;
// get frame
*outDataSize = av_samples_get_buffer_size(nullptr, context->av_codec_context->channels, context->frame->nb_samples, context->av_codec_context->sample_fmt, 1);

The decoded frame:

// array of 8192 bytes, context info is as expected:
context->av_codec_context->channels = 2
context->av_codec_context->channel_layout = 3 (AV_CH_LAYOUT_STEREO)
context->frame->format = 8 (AV_SAMPLE_FMT_FLTP) // float, planar
context->frame->sample_rate = 44100

Now, as I understand it, each sample in the raw 32-bit format takes 4 bytes, and the channels are interleaved (so the channel alternates every 4 bytes). That leaves me with 1024 samples for each channel (8192 bytes / 4 bytes per sample / 2 channels).
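
One thing worth checking against that assumption: AV_SAMPLE_FMT_FLTP is a planar format, so FFmpeg keeps each channel in its own plane rather than alternating samples. A raw import that expects interleaved floats would need the planes merged first. Here is a minimal sketch of that merge; the function name interleavePlanar is made up, and it assumes the caller already has one float pointer per channel (in FFmpeg these would come from the frame's per-channel data planes):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: merges planar float audio (one array per channel)
// into a single interleaved buffer: L0 R0 L1 R1 ...
inline std::vector<float> interleavePlanar(const float* const* planes,
                                           int channels,
                                           int samplesPerChannel) {
    std::vector<float> out;
    if (planes == nullptr || channels <= 0 || samplesPerChannel <= 0) {
        return out;  // nothing to convert
    }
    out.resize(static_cast<std::size_t>(channels) *
               static_cast<std::size_t>(samplesPerChannel));
    for (int s = 0; s < samplesPerChannel; ++s) {
        for (int c = 0; c < channels; ++c) {
            out[static_cast<std::size_t>(s) * channels + c] = planes[c][s];
        }
    }
    return out;
}
```

For a stereo frame of 1024 samples per channel this yields 2048 floats (8192 bytes), which is the layout a "32-bit float, 2 channel" raw import expects.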

I've tried exporting multiple frames of this data to a file and importing it as a raw file (32-bit float, 2 channels, 44100 Hz, little endian) in Audacity as a sanity check. Instead of music, all I get is noise, and the detected length of the audio is far longer than I expected (5 seconds dumped to file, but Audacity says 22.5 seconds). I've tried a variety of import format settings. What am I likely doing wrong here?

I'm a little new to working with audio, so I may be misunderstanding something.

Edit: I tried panning the audio to the right channel, and it's reflected in the data. It also looks like a repeating pattern exactly 1024 samples apart, which indicates to me a programming error with a buffer not getting overwritten after the first sample.

12 frames

I almost wonder if my ADTS payload for the frame is somehow incorrect, even though ffmpeg isn’t complaining on decode. I’m fairly confident I understand the raw data formats well enough to know it should be working. - Michael Brown
Tried panning the input either left or right, and the "noise" shows up in the correct channel with the other channel being empty. So it's clear a signal is driving this, but the pattern of bytes seems to repeat itself in the waveform analysis, almost like the first sample is just repeating over and over, generating a noisy tone. - Michael Brown

1 Answer

This was nothing more than a difficult bug to find. Zooming in on the audio in Audacity revealed a repeating pattern 1024 samples wide.

A buffer was in fact not being updated properly and I was processing the same audio frame over and over:

for(var offset = 0; offset < packet.Length; offset++) {
  var frame = ReadAdtsFrame();
  // offset += frame.Length; 
  // ^ essentially this was missing, so the frame buffer was always the first frame
}
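
For comparison, here is a rough C++ sketch of a loop that does advance through the packet. It is not my actual code: the names FrameSpan and splitAdtsFrames are made up, and it assumes the buffer starts on an ADTS syncword and holds whole frames. The 13-bit frame length is read from the ADTS header (bytes 3 to 5) and includes the header itself:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types/names for illustration only.
struct FrameSpan {
    std::size_t offset;  // start of the ADTS frame within the buffer
    std::size_t length;  // total frame length, header included
};

inline std::vector<FrameSpan> splitAdtsFrames(const std::uint8_t* data, std::size_t size) {
    std::vector<FrameSpan> frames;
    if (data == nullptr) {
        return frames;
    }
    std::size_t offset = 0;
    while (size - offset >= 7) {  // 7 bytes = ADTS header without CRC
        const std::uint8_t* p = data + offset;
        // 12-bit syncword 0xFFF, layer bits must be 0
        if (p[0] != 0xFF || (p[1] & 0xF6) != 0xF0) {
            break;
        }
        const std::size_t length = (static_cast<std::size_t>(p[3] & 0x03) << 11) |
                                   (static_cast<std::size_t>(p[4]) << 3) |
                                   (static_cast<std::size_t>(p[5]) >> 5);
        if (length < 7 || length > size - offset) {
            break;  // corrupt or truncated frame
        }
        frames.push_back({offset, length});
        offset += length;  // the step that was missing in my loop
    }
    return frames;
}
```

Each returned span can then be handed to the decoder as its own packet, so every call sees a new frame instead of the first one again.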

I will leave this here to display my shame to the world and as a reminder that most often it's your own bugs that get you in the end.