need help configuring ffmpeg to decode raw AAC with android ndk

Question

I've got an Android app that receives raw AAC bytes from an external device, and I want to decode that data, but I can't get the decoder to work. FFmpeg itself decodes an MP4 file that contains the same audio data without trouble (verified with isoviewer). I recently got this FFmpeg library on Android to decode video frames from the same external device, but audio won't work.

Here is the ffmpeg output for the file with the same data:

$ ffmpeg -i Video_2000-01-01_0411.mp4
ffmpeg version 2.6.1 Copyright (c) 2000-2015 the FFmpeg developers
  built with Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/2.6.1 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libmp3lame --enable-libvo-aacenc --enable-libxvid --enable-vda
  libavutil      54. 20.100 / 54. 20.100
  libavcodec     56. 26.100 / 56. 26.100
  libavformat    56. 25.101 / 56. 25.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 11.102 /  5. 11.102
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'AXON_Flex_Video_2000-01-01_0411.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isom3gp43gp5
  Duration: 00:00:15.73, start: 0.000000, bitrate: 1134 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 8000 Hz, mono, fltp, 40 kb/s (default)
    Metadata:
      handler_name    : soun
    Stream #0:1(eng): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 640x480 [SAR 1:1 DAR 4:3], 1087 kb/s, 29.32 fps, 26.58 tbr, 90k tbn, 1k tbc (default)
    Metadata:
      handler_name    : vide

Here is my NDK code for setting up and decoding the audio:

jint ffmpeg_init(JNIEnv * env, jobject this) {
    audioCodec = avcodec_find_decoder(AV_CODEC_ID_AAC);
    if (!audioCodec) {
        LOGE("audio codec %d not found", AV_CODEC_ID_AAC);
        return -1;
    }

    audioContext = avcodec_alloc_context3(audioCodec);
    if (!audioContext) {
        LOGE("Could not allocate codec context");
        return -1;
    }

    int openRet = avcodec_open2(audioContext, audioCodec, NULL);
    if (openRet < 0) {
        LOGE("Could not open codec, error:%d", openRet);
        return -1;
    }

    audioContext->sample_rate = 8000;
    audioContext->channel_layout = AV_CH_LAYOUT_MONO;
    audioContext->profile = FF_PROFILE_AAC_LOW;
    audioContext->bit_rate = 48 * 1024;
    audioContext->sample_fmt = AV_SAMPLE_FMT_FLTP;

  //  unsigned char extradata[] = {0x15, 0x88}; 
  //  audioContext->extradata = extradata;
  //  audioContext->extradata_size = sizeof(extradata);
    audioFrame = av_frame_alloc();
    if (!audioFrame) {
        LOGE("Could not create audio frame");
        return -1;
    }
    return 0;
}


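jint ffmpeg_decodeAudio(JNIEnv *env, jobject this, jbyteArray aacData, jbyteArray output, int offset, int len) {

    LOGI("ffmpeg_decodeAudio()");
    AVPacket avpkt = {0};
    av_init_packet(&avpkt);
    LOGI("av_init_packet()");
    int error, got_frame;
    uint8_t* buffer = (uint8_t *) (*env)->GetByteArrayElements(env, aacData, NULL);
    // av_packet_from_data() assumes FF_INPUT_BUFFER_PADDING_SIZE extra bytes after the data
    uint8_t* copy = av_malloc(len + FF_INPUT_BUFFER_PADDING_SIZE);
    if (copy) {
        memcpy(copy, &buffer[offset], len);
        memset(copy + len, 0, FF_INPUT_BUFFER_PADDING_SIZE);
    }
    // The bytes have been copied, so release the Java array without writing anything back
    (*env)->ReleaseByteArrayElements(env, aacData, (jbyte *) buffer, JNI_ABORT);
    if (!copy) {
        return AVERROR(ENOMEM);
    }
    if ((error = av_packet_from_data(&avpkt, copy, len)) < 0) {
        av_free(copy);  // ownership only passes to the packet on success
        return error;
    }

    if ((error = avcodec_decode_audio4(audioContext, audioFrame, &got_frame, &avpkt)) < 0) {
        ffmpeg_log_error(error);
        av_free_packet(&avpkt);
        return error;
    }
    if (got_frame) {
        // linesize[0] may include alignment padding; copy only the decoded samples of plane 0
        int dataSize = audioFrame->nb_samples * av_get_bytes_per_sample((enum AVSampleFormat) audioFrame->format);
        LOGI("Copying %d bytes of audioFrame->extended_data[0] to output jbytearray", dataSize);
        (*env)->SetByteArrayRegion(env, output, 0, dataSize, (const jbyte *) audioFrame->extended_data[0]);
    }

    av_free_packet(&avpkt);
    return 0;
}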

As you can see, I've got an init function that finds the decoder, allocates the context and opens the codec; all of that works without error. However, when I call avcodec_decode_audio4 I get this error:

FFMPEG error: -1094995529, Invalid data found when processing input
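For reference, -1094995529 is `AVERROR_INVALIDDATA`. FFmpeg defines it with the `FFERRTAG` macro as the negated four-character tag `'I','N','D','A'`, packed little-endian. The small sketch below (`averror_tag` is a hypothetical helper, not an FFmpeg function) unpacks such a number back into its tag, which can help when only the raw integer shows up in a log. It only makes sense for tag-style codes; `AVERROR(errno)` codes are plain negated errno values, and `av_strerror()` remains the normal way to get the message text.

```c
#include <stdint.h>

/* Reverses FFmpeg's FFERRTAG packing: the error is -(a | b<<8 | c<<16 | d<<24),
 * so negating it and reading the bytes low to high gives the four characters. */
static void averror_tag(int err, char tag[5]) {
    uint32_t v = (uint32_t)(-(int64_t)err);
    for (int i = 0; i < 4; i++) {
        tag[i] = (char)((v >> (8 * i)) & 0xFF);
    }
    tag[4] = '\0';
}
```

Called with -1094995529, it fills the buffer with `"INDA"`.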

I've tried all sorts of combinations of AVCodecContext properties. I'm not sure which ones the decoder needs to do its job, but from reading online I should only need to set the channel layout and the sample rate (which I've also tried on their own). I've also tried setting the extradata/extradata_size fields to the bytes that should match the video file's settings, per http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio, but no luck.
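The two commented-out extradata bytes in `ffmpeg_init` (`0x15, 0x88`) are the MPEG-4 AudioSpecificConfig described on that wiki page: a 5-bit audio object type (2 = AAC LC), a 4-bit sampling frequency index (11 = 8000 Hz), a 4-bit channel configuration (1 = mono), then three GASpecificConfig flag bits that are all zero here. The sketch below packs and unpacks those fields. It only handles the short two-byte form (no escaped object types, no explicit 24-bit sample rate for index 15), which is an assumption that holds for this stream; `AscFields`, `asc_pack` and `asc_unpack` are hypothetical names.

```c
#include <stdint.h>

/* AudioSpecificConfig fields for the short form used by plain AAC LC. */
typedef struct {
    unsigned object_type;  /* 5 bits: 2 = AAC LC */
    unsigned freq_index;   /* 4 bits: 11 = 8000 Hz (15 = explicit rate, not handled) */
    unsigned channel_cfg;  /* 4 bits: 1 = mono */
} AscFields;

/* Packs the fields into 2 bytes; the trailing GASpecificConfig bits
 * (frameLengthFlag, dependsOnCoreCoder, extensionFlag) are written as 0. */
static void asc_pack(const AscFields *f, uint8_t out[2]) {
    uint16_t bits = (uint16_t)(((f->object_type & 0x1Fu) << 11) |
                               ((f->freq_index & 0x0Fu) << 7) |
                               ((f->channel_cfg & 0x0Fu) << 3));
    out[0] = (uint8_t)(bits >> 8);
    out[1] = (uint8_t)(bits & 0xFF);
}

/* Reads the same three fields back out of a 2-byte config. */
static AscFields asc_unpack(const uint8_t in[2]) {
    AscFields f;
    f.object_type = in[0] >> 3;
    f.freq_index  = ((in[0] & 0x07u) << 1) | (in[1] >> 7);
    f.channel_cfg = (in[1] >> 3) & 0x0Fu;
    return f;
}
```

Packing `{2, 11, 1}` gives `0x15 0x88`. If such bytes are handed to `AVCodecContext.extradata`, FFmpeg expects the buffer to come from `av_malloc()` with `FF_INPUT_BUFFER_PADDING_SIZE` extra zeroed bytes (renamed `AV_INPUT_BUFFER_PADDING_SIZE` in later releases), and it has to be set before `avcodec_open2()`; a stack array like the commented-out one would later be passed to `av_free()` by `avcodec_free_context()`.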

The device we're getting packets from starts out sending AAC packets with no sound in them (but they are valid packets), so I've tried sending just those, since they should definitely decode correctly.

Here is an example of one of the initial silent audio packets:

 010c9eb43f21f90fc87e46fff10a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5dffe214b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4bbd1c429696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696978

Note that the data shown above is just a hex encoding of the data I'm putting into the AVPacket; it was sent from the external device to the Android application. My application doesn't have direct access to the file, so I need to decode the raw frames/samples as I receive them. When I look at the audio track data in isoviewer, the audio track's first sample is identical to the data I got from the device that recorded that file, so the external device is just sending me each sample's raw data. I believe this data can be located in the file's mdat box using the chunk offsets from the stco box and the sample sizes from the stsz box.
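Locating a sample inside `mdat` actually takes three tables: `stco` gives each chunk's file offset, `stsc` says how many samples each chunk holds, and `stsz` gives each sample's size. Samples within a chunk are stored back to back, so a sample's offset is its chunk's offset plus the sizes of the samples before it in that chunk. The sketch below assumes the simplest case of a single `stsc` entry (every chunk holds the same number of samples); real files often have several entries. `sample_offsets` is a hypothetical helper, and the tables are assumed to be already parsed into arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills offsets[i] with the file offset of sample i, assuming every chunk holds
 * samples_per_chunk samples (a single stsc entry). Returns the number of
 * samples placed, which is less than nb_samples if the chunks run out. */
static size_t sample_offsets(const uint64_t *chunk_offsets, size_t nb_chunks,
                             uint32_t samples_per_chunk,
                             const uint32_t *sample_sizes, size_t nb_samples,
                             uint64_t *offsets) {
    size_t s = 0;
    for (size_t c = 0; c < nb_chunks && s < nb_samples; c++) {
        uint64_t pos = chunk_offsets[c];  /* first sample of the chunk starts here */
        for (uint32_t k = 0; k < samples_per_chunk && s < nb_samples; k++, s++) {
            offsets[s] = pos;
            pos += sample_sizes[s];       /* the next sample follows immediately */
        }
    }
    return s;
}
```

Offsets are kept as `uint64_t` so the same code works whether they came from `stco` (32-bit) or `co64` (64-bit).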

Also, isoviewer shows the esds box as having the following:

ESDescriptor{esId=0, streamDependenceFlag=0, URLFlag=0, oCRstreamFlag=0, streamPriority=0, URLLength=0, URLString='null', remoteODFlag=0, dependsOnEsId=0, oCREsId=0, decoderConfigDescriptor=DecoderConfigDescriptor{objectTypeIndication=64, streamType=5, upStream=0, bufferSizeDB=513, maxBitRate=32000, avgBitRate=32000, decoderSpecificInfo=null, audioSpecificInfo=AudioSpecificConfig{configBytes=1588, audioObjectType=2 (AAC LC), samplingFrequencyIndex=11 (8000), samplingFrequency=0, channelConfiguration=1, syncExtensionType=0, frameLengthFlag=0, dependsOnCoreCoder=0, coreCoderDelay=0, extensionFlag=0, layerNr=0, numOfSubFrame=0, layer_length=0, aacSectionDataResilienceFlag=false, aacScalefactorDataResilienceFlag=false, aacSpectralDataResilienceFlag=false, extensionFlag3=0}, configDescriptorDeadBytes=, profileLevelIndicationDescriptors=[[]]}, slConfigDescriptor=SLConfigDescriptor{predefined=2}}

And the binary is this:

00 00 00 30 65 73 64 73 00 00 00 00 03 80 80 80
1f 00 00 00 04 80 80 80 14 40 15 00 02 01 00 00
7d 00 00 00 7d 00 05 80 80 80 02 15 88 06 01 02
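Of this whole box, only the DecoderSpecificInfo payload (descriptor tag `0x05`) is the AudioSpecificConfig that an AAC decoder takes as extradata; here it is the two bytes `15 88` near the end of the dump. FFmpeg's MP4 demuxer copies that same payload into `extradata` when it reads an esds box. The sketch below walks the descriptors by hand. It assumes the buffer starts with the 4-byte size and `esds` type followed by the 4-byte version/flags, that the ES_Descriptor comes first (as in this file), and it skips the 13 fixed bytes of the DecoderConfigDescriptor; `read_len` and `esds_find_asc` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads an MPEG-4 descriptor length: up to 4 bytes, 7 bits each,
 * with the high bit meaning "another length byte follows". */
static int read_len(const uint8_t *p, size_t n, size_t *pos, uint32_t *len) {
    *len = 0;
    for (int i = 0; i < 4; i++) {
        if (*pos >= n) return -1;
        uint8_t b = p[(*pos)++];
        *len = (*len << 7) | (b & 0x7F);
        if (!(b & 0x80)) return 0;
    }
    return 0;
}

/* Returns a pointer to the DecoderSpecificInfo payload inside an esds box
 * and stores its length in *out_len, or returns NULL if the layout differs. */
static const uint8_t *esds_find_asc(const uint8_t *box, size_t n, uint32_t *out_len) {
    size_t pos = 12;                      /* size, 'esds', version/flags */
    uint32_t len;
    if (pos >= n || box[pos++] != 0x03 || read_len(box, n, &pos, &len)) return NULL;
    if (pos + 3 > n) return NULL;
    uint8_t flags = box[pos + 2];
    pos += 3;                             /* ES_ID (2 bytes) + flags byte */
    if (flags & 0x80) pos += 2;           /* dependsOn_ES_ID */
    if (flags & 0x40) {                   /* URL: length byte + string */
        if (pos >= n) return NULL;
        pos += 1 + box[pos];
    }
    if (flags & 0x20) pos += 2;           /* OCR_ES_Id */
    if (pos >= n || box[pos++] != 0x04 || read_len(box, n, &pos, &len)) return NULL;
    pos += 13;                            /* objectType, streamType, bufferSizeDB, max/avg bitrate */
    if (pos >= n || box[pos++] != 0x05 || read_len(box, n, &pos, &len)) return NULL;
    if (pos + len > n) return NULL;
    *out_len = len;
    return box + pos;
}
```

Run over the 48 bytes above, it returns a 2-byte payload of `15 88`, matching isoviewer's `configBytes=1588`.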

What type of data do you input to these functions? 010c... doesn't really help. Is it packetized AAC data from a parsed m4a file? Or raw AAC file chunks? Or m4a file chunks? Or something else? — Ronald S. Bultje
I've updated (last paragraph) to explain what the data represents. — Matt Wolfe
@RonaldS.Bultje I just saw your comment from here stackoverflow.com/questions/31726738/ffmpeg-native-aac-decoder/… and that makes me wonder if I can just take the esds box data and set that to the extradata field. Do I set the entire box data or just certain parts? I should be able to just hardcode that data. — Matt Wolfe
See ffmpeg.org/doxygen/trunk/mov_8c_source.html#l00652 and ffmpeg.org/doxygen/trunk/isom_8c_source.html#l00451 for the expected layout of esds extradata. — Ronald S. Bultje
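One quick way to tell the "raw AAC file chunks" case apart from samples demuxed out of an MP4: a raw .aac file is usually ADTS, where every frame starts with the 12-bit sync word `0xFFF` and carries its own profile, sample rate and channel settings, so the decoder can configure itself from the stream. Samples taken from an MP4 `mdat` have no such header; that configuration lives only in the esds box, so the decoder has to get it from extradata or from fields set on the context. The packet in the question begins `01 0c`, so it is not ADTS. The check below is a heuristic sketch (raw data can contain `0xFFF` by chance) that looks only at the sync word and the 2-bit layer field, which is always `00` in ADTS; `looks_like_adts` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts with an ADTS header (12-bit sync word 0xFFF
 * followed by the ID bit and layer bits 00), 0 otherwise. */
static int looks_like_adts(const uint8_t *buf, size_t len) {
    if (len < 7) return 0;  /* an ADTS header is at least 7 bytes long */
    return buf[0] == 0xFF && (buf[1] & 0xF6) == 0xF0;
}
```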

Matt Wolfe · Accepted Answer · 2015-08-20T02:04:30

I found the main problem with the code above: the decoder is initialized when you call avcodec_open2, so the context fields have to be set before opening it, like this:

jint ffmpeg_init(JNIEnv * env, jobject this) {
    //....

    audioContext = avcodec_alloc_context3(audioCodec);

    // Set these before avcodec_open2(), which is where the decoder reads them
    audioContext->sample_rate = 8000;
    audioContext->channel_layout = AV_CH_LAYOUT_MONO;
    audioContext->channels = 1;
    int openRet = avcodec_open2(audioContext, audioCodec, NULL);
    if (openRet < 0) {
        LOGE("Could not open codec, error:%d", openRet);
        return -1;
    }
    //....
}

The decoder is now decoding the audio without error.
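Once frames decode, they come out as `AV_SAMPLE_FMT_FLTP` (shown as `fltp` in the ffmpeg output above): 32-bit floats nominally in [-1, 1], one plane per channel, with `audioFrame->nb_samples` valid samples per plane (`linesize[0]` is the allocated plane size and can be larger). If the Java side plays the result through an `AudioTrack` configured for 16-bit PCM, the floats need converting first. FFmpeg's libswresample (`swr_convert()`) can do this; the sketch below is a dependency-free alternative for a single plane, assuming mono output. `fltp_to_s16` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts one plane of float samples (nominal range [-1.0, 1.0]) to signed
 * 16-bit PCM, clamping out-of-range values instead of letting them wrap. */
static void fltp_to_s16(const float *in, int16_t *out, size_t nb_samples) {
    for (size_t i = 0; i < nb_samples; i++) {
        float s = in[i];
        if (s > 1.0f) s = 1.0f;
        if (s < -1.0f) s = -1.0f;
        out[i] = (int16_t)(s * 32767.0f);
    }
}
```

In the JNI function this would be called as `fltp_to_s16((const float *) audioFrame->extended_data[0], pcm, audioFrame->nb_samples)`, where `pcm` is a caller-provided `int16_t` buffer, and then `nb_samples * 2` bytes would be copied into the output array. For more than one channel, each `extended_data[ch]` plane would be converted and interleaved the same way.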
