I have an Android app that receives raw AAC bytes from an external device. I want to decode that data, but I can't get the decoder to work, even though ffmpeg has no trouble decoding an MP4 file that contains the same audio data (verified with isoviewer). I was recently able to get this ffmpeg library on Android to decode video frames from the same external device, but audio doesn't work.
Here is the ffmpeg output for the file with the same data:
$ ffmpeg -i Video_2000-01-01_0411.mp4
ffmpeg version 2.6.1 Copyright (c) 2000-2015 the FFmpeg developers
built with Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
configuration: --prefix=/usr/local/Cellar/ffmpeg/2.6.1 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libmp3lame --enable-libvo-aacenc --enable-libxvid --enable-vda
libavutil 54. 20.100 / 54. 20.100
libavcodec 56. 26.100 / 56. 26.100
libavformat 56. 25.101 / 56. 25.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 11.102 / 5. 11.102
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'AXON_Flex_Video_2000-01-01_0411.mp4':
Metadata:
major_brand : mp42
minor_version : 1
compatible_brands: isom3gp43gp5
Duration: 00:00:15.73, start: 0.000000, bitrate: 1134 kb/s
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 8000 Hz, mono, fltp, 40 kb/s (default)
Metadata:
handler_name : soun
Stream #0:1(eng): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 640x480 [SAR 1:1 DAR 4:3], 1087 kb/s, 29.32 fps, 26.58 tbr, 90k tbn, 1k tbc (default)
Metadata:
handler_name : vide
Here is my NDK code for setting up and decoding the audio:
jint ffmpeg_init(JNIEnv * env, jobject this) {
audioCodec = avcodec_find_decoder(AV_CODEC_ID_AAC);
if (!audioCodec) {
LOGE("audio codec %d not found", AV_CODEC_ID_AAC);
return -1;
}
audioContext = avcodec_alloc_context3(audioCodec);
if (!audioContext) {
LOGE("Could not allocate codec context");
return -1;
}
int openRet = avcodec_open2(audioContext, audioCodec, NULL);
if (openRet < 0) {
LOGE("Could not open codec, error:%d", openRet);
return -1;
}
audioContext->sample_rate = 8000;
audioContext->channel_layout = AV_CH_LAYOUT_MONO;
audioContext->profile = FF_PROFILE_AAC_LOW;
audioContext->bit_rate = 48 * 1024;
audioContext->sample_fmt = AV_SAMPLE_FMT_FLTP;
// unsigned char extradata[] = {0x15, 0x88};
// audioContext->extradata = extradata;
// audioContext->extradata_size = sizeof(extradata);
audioFrame = av_frame_alloc();
if (!audioFrame) {
LOGE("Could not create audio frame");
return -1;
}
return 0;
}
jint ffmpeg_decodeAudio(JNIEnv *env, jobject this, jbyteArray aacData, jbyteArray output, int offset, int len) {
LOGI("ffmpeg_decodeAudio()");
char errbuf[128];
AVPacket avpkt = {0};
av_init_packet(&avpkt);
LOGI("av_init_packet()");
int error, got_frame;
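uint8_t* buffer = (uint8_t *) (*env)->GetByteArrayElements(env, aacData,0);
uint8_t* copy = av_malloc(len);
memcpy(copy, &buffer[offset], len);
// The bytes have been copied, so release the Java array without writing anything back.
(*env)->ReleaseByteArrayElements(env, aacData, (jbyte *) buffer, JNI_ABORT);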
av_packet_from_data(&avpkt, copy, len);
if ((error = avcodec_decode_audio4(audioContext, audioFrame, &got_frame, &avpkt)) < 0) {
ffmpeg_log_error(error);
av_free_packet(&avpkt);
return error;
}
if (got_frame) {
LOGE("Copying audioFrame->extended_data to output jbytearray, linesize[0]:%d", audioFrame->linesize[0]);
(*env)->SetByteArrayRegion(env, output, 0, audioFrame->linesize[0], *audioFrame->extended_data);
}
return 0;
}
As you can see, I have an init function that opens the decoder and creates the context; all of this works without error. However, when I call avcodec_decode_audio4 I get an error:
FFMPEG error: -1094995529, Invalid data found when processing input
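For reference, the numeric code can be unpacked by hand. FFmpeg builds its tag-style error codes by negating a four-character tag packed as a | b<<8 | c<<16 | d<<24, so a small helper can recover the characters. This is only a sketch, and fferr_to_tag is a made-up name, not an FFmpeg function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack a tag-style FFmpeg error code into its four ASCII characters.
 * Such codes are built as -(a | b<<8 | c<<16 | d<<24).
 * 'out' must have room for 5 bytes (4 characters plus the terminator).
 * Hypothetical helper for illustration; not part of FFmpeg. */
void fferr_to_tag(int32_t err, char out[5])
{
    assert(out != NULL);
    /* Negate in 64 bits so INT32_MIN cannot overflow. */
    uint32_t tag = (uint32_t)(-(int64_t)err);
    out[0] = (char)(tag & 0xFF);
    out[1] = (char)((tag >> 8) & 0xFF);
    out[2] = (char)((tag >> 16) & 0xFF);
    out[3] = (char)((tag >> 24) & 0xFF);
    out[4] = '\0';
}
```

For -1094995529 this yields "INDA", the tag behind AVERROR_INVALIDDATA, which agrees with the "Invalid data found when processing input" text.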
I've tried all sorts of combinations of AVCodecContext properties. I'm not sure which ones I need to set for the decoder to do its job, but from reading online I should only need to set the channel layout and the sample_rate (I've tried setting just those two on their own). I've also tried setting the extradata/extradata_size parameters to values that should match the audio settings of the video file, per: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio But no luck.
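To double-check the two extradata bytes from the commented-out lines (0x15 0x88), here is a minimal sketch that unpacks the leading fields of an MPEG-4 AudioSpecificConfig: 5 bits of audio object type, 4 bits of sampling frequency index, 4 bits of channel configuration. AscInfo and parse_asc are names I made up for illustration, and the escape values (object type 31, frequency index 15) are deliberately not handled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of an MPEG-4 AudioSpecificConfig (hypothetical struct). */
typedef struct {
    unsigned object_type;    /* 5 bits; 2 = AAC LC */
    unsigned freq_index;     /* 4 bits; index into the sample-rate table */
    unsigned sample_rate;    /* Hz, or 0 if the index is outside the table */
    unsigned channel_config; /* 4 bits; 1 = mono */
} AscInfo;

/* Returns 0 on success, -1 if fewer than 2 bytes are supplied.
 * Escape codes (object type 31, frequency index 15) are not handled. */
int parse_asc(const uint8_t *buf, size_t len, AscInfo *out)
{
    static const unsigned rates[13] = {
        96000, 88200, 64000, 48000, 44100, 32000, 24000,
        22050, 16000, 12000, 11025, 8000, 7350
    };
    assert(buf != NULL && out != NULL);
    if (len < 2)
        return -1;

    /* Treat the first two bytes as one big-endian 16-bit field. */
    unsigned bits = ((unsigned)buf[0] << 8) | buf[1];
    out->object_type    = bits >> 11;         /* top 5 bits  */
    out->freq_index     = (bits >> 7) & 0xF;  /* next 4 bits */
    out->channel_config = (bits >> 3) & 0xF;  /* next 4 bits */
    out->sample_rate    = out->freq_index < 13 ? rates[out->freq_index] : 0;
    return 0;
}
```

Feeding it {0x15, 0x88} gives object type 2 (AAC LC), frequency index 11 (8000 Hz) and channel configuration 1, which matches the "aac (LC), 8000 Hz, mono" line in the ffmpeg output, so the bytes themselves look right.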
Since the device we're getting packets from sends AAC packets that contain no sound at the beginning (but are still valid packets), I've tried sending just those, since they should definitely decode correctly.
Here is an example of the initial, silent audio packets:
010c9eb43f21f90fc87e46fff10a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5dffe214b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4b4bbd1c429696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696969696978
Note that the data shown above is just a hex encoding of the data that I'm putting in the AVPacket, and it was sent from an external device to the Android application. My application doesn't have direct access to the file, though, so I need to decode the raw frames/samples as I get them. When I look at the audio track in isoviewer, I can see that the track's first sample is the same data as what I got from the device that holds the file (so the external device is just sending me each sample's raw data). I believe this data can be located in the file's mdat box by starting at the offsets given in the stco (chunk offset) box and reading the lengths given in the stsz (sample size) box.
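As a rough illustration of that last point: samples inside one chunk are stored back to back, so each sample's file offset is the chunk offset plus the sizes of the samples before it. The sketch below assumes the samples belonging to the chunk are already known (in a real file the stsc box supplies that mapping), and chunk_sample_offsets is a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the absolute file offset of each sample in a single chunk.
 * chunk_offset: value taken from stco for this chunk.
 * sizes:        per-sample sizes taken from stsz, in storage order.
 * offsets:      output array with room for 'count' entries.
 * Hypothetical helper; which samples belong to the chunk (stsc) is
 * assumed to have been worked out by the caller. */
void chunk_sample_offsets(uint32_t chunk_offset, const uint32_t *sizes,
                          size_t count, uint64_t *offsets)
{
    assert(sizes != NULL && offsets != NULL);
    uint64_t pos = chunk_offset;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = pos;   /* sample i starts here */
        pos += sizes[i];    /* the next sample follows immediately */
    }
}
```

With an invented chunk offset of 1000 and invented sizes of 500, 300 and 200 bytes, the samples start at 1000, 1500 and 1800.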
Also, isoviewer shows the esds box as having the following:
ESDescriptor{esId=0, streamDependenceFlag=0, URLFlag=0, oCRstreamFlag=0, streamPriority=0, URLLength=0, URLString='null', remoteODFlag=0, dependsOnEsId=0, oCREsId=0, decoderConfigDescriptor=DecoderConfigDescriptor{objectTypeIndication=64, streamType=5, upStream=0, bufferSizeDB=513, maxBitRate=32000, avgBitRate=32000, decoderSpecificInfo=null, audioSpecificInfo=AudioSpecificConfig{configBytes=1588, audioObjectType=2 (AAC LC), samplingFrequencyIndex=11 (8000), samplingFrequency=0, channelConfiguration=1, syncExtensionType=0, frameLengthFlag=0, dependsOnCoreCoder=0, coreCoderDelay=0, extensionFlag=0, layerNr=0, numOfSubFrame=0, layer_length=0, aacSectionDataResilienceFlag=false, aacScalefactorDataResilienceFlag=false, aacSpectralDataResilienceFlag=false, extensionFlag3=0}, configDescriptorDeadBytes=, profileLevelIndicationDescriptors=[[]]}, slConfigDescriptor=SLConfigDescriptor{predefined=2}}
And the binary is this:
00 00 00 30 65 73 64 73 00 00 00 00 03 80 80 80
1f 00 00 00 04 80 80 80 14 40 15 00 02 01 00 00
7d 00 00 00 7d 00 05 80 80 80 02 15 88 06 01 02
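To confirm where the extradata sits in those bytes, here is a minimal sketch that walks the esds box down to the DecoderSpecificInfo descriptor (tag 0x05). It assumes the simple layout shown above: a full box including the 8-byte header, no optional ES_Descriptor fields, and descriptors in the order 0x03, 0x04, 0x05. The function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a descriptor header: one tag byte, then a size of up to four
 * bytes carrying 7 bits each (high bit set means "more bytes follow").
 * Returns the number of bytes consumed, or 0 if the input is truncated. */
size_t read_desc_header(const uint8_t *p, size_t len,
                        uint8_t *tag, uint32_t *size)
{
    size_t i = 0;
    if (len < 2)
        return 0;
    *tag = p[i++];
    *size = 0;
    for (int n = 0; n < 4; n++) {
        if (i >= len)
            return 0;
        uint8_t b = p[i++];
        *size = (*size << 7) | (b & 0x7F);
        if (!(b & 0x80))
            break;
    }
    return i;
}

/* Locate the DecoderSpecificInfo payload inside a complete esds box.
 * Returns a pointer into 'box' and sets *dsi_len, or NULL on any
 * layout this sketch does not handle. */
const uint8_t *esds_find_dsi(const uint8_t *box, size_t len, size_t *dsi_len)
{
    uint8_t tag;
    uint32_t size;
    size_t n, pos = 12; /* box size (4) + 'esds' (4) + version/flags (4) */

    assert(box != NULL && dsi_len != NULL);
    if (len < pos)
        return NULL;

    /* ES_Descriptor (tag 0x03): ES_ID (2 bytes) + flags (1 byte). */
    n = read_desc_header(box + pos, len - pos, &tag, &size);
    if (n == 0 || tag != 0x03)
        return NULL;
    pos += n;
    if (len - pos < 3 || (box[pos + 2] & 0xE0))
        return NULL;    /* truncated, or optional fields present */
    pos += 3;

    /* DecoderConfigDescriptor (tag 0x04): 13 fixed bytes
     * (objectTypeIndication, streamType, bufferSizeDB, max/avg bitrate). */
    n = read_desc_header(box + pos, len - pos, &tag, &size);
    if (n == 0 || tag != 0x04)
        return NULL;
    pos += n;
    if (len - pos < 13)
        return NULL;
    pos += 13;

    /* DecoderSpecificInfo (tag 0x05): payload is the AudioSpecificConfig. */
    n = read_desc_header(box + pos, len - pos, &tag, &size);
    if (n == 0 || tag != 0x05)
        return NULL;
    pos += n;
    if (size > len - pos)
        return NULL;
    *dsi_len = size;
    return box + pos;
}
```

Run over the 48 bytes above, it returns a 2-byte payload, 15 88, which is the same configBytes=1588 value that isoviewer reports and the same pair I tried as extradata.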