I have some automated tests that decode a few m4a files to PCM data using Android's MediaCodec and MediaExtractor. The files are generated with various encoders: fdk-aac, ffmpeg (with fdk or the default aac encoder), and iOS.
On Android 9 the test fails for the clips created with ffmpeg: decoding them produces empty PCM files. The same clips decode fine on older versions of Android.
I double checked my code and the decoding process goes as expected:
- I extract compressed data using MediaExtractor
- Enqueue it to the codec
- Dequeue the output buffer from the codec.
The issue is that by the time the last available input buffer is enqueued and the output buffer with MediaCodec.BUFFER_FLAG_END_OF_STREAM is dequeued, all output buffers are empty!
Then I noticed that the MediaFormat info extracted from the audio file with MediaExtractor.getTrackFormat(int track) contains an undocumented "encoder-delay" key.
On Android 8 and lower, that key is only present for m4a clips that carry iTunSMPB tag info. Here's a summary of the values I get for my test files:
iOS-encoded file: 2112 frames
fdkaac with iTunSMPB tag: 2048 frames
fdkaac with ISO delay info: key not present
ffmpeg: key not present
ffmpeg (fdk): key not present
On Android 9, instead, I get the following results:
iOS-encoded file: 2112 frames
fdkaac with iTunSMPB tag: 2048 frames
fdkaac with ISO delay info: 2048 frames
ffmpeg: 45158 frames
ffmpeg (fdk): 90317 frames
It looks like something has changed and MediaExtractor is now able to retrieve the encoder delay for all the files under test. This is good in theory, since the files with no "encoder-delay" info do show a delay in the decoded PCM data (this was a known issue).
But... while the value for the "fdkaac with ISO delay info" case is correct and leads to a valid PCM file with no initial padding (finally!), the values for the ffmpeg-generated files look huge and likely wrong!
I know the real encoder delay values are 1024 frames for the ffmpeg case and 2048 frames for the ffmpeg (fdk) case, and I think the huge value for the "encoder-delay" key in the extracted format is the reason why the decoded output is empty.
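A quick arithmetic check on those numbers, assuming the ffmpeg clips are sampled at 44.1 kHz (an assumption, not something verified here): the values reported on Android 9 match the real delays multiplied by 44.1, i.e. by sampleRate / 1000. That might point to a unit or timescale mix-up somewhere in the delay retrieval, but it is only a hint, not a confirmed cause. The class and method names below are purely illustrative.

```java
public class EncoderDelayScaleCheck {

    // Hypothetical helper: scales a delay (in frames) by sampleRate / 1000
    // and rounds to the nearest integer.
    public static long scaleBySampleRateOverThousand(int delayFrames, int sampleRateHz) {
        return Math.round(delayFrames * (sampleRateHz / 1000.0));
    }

    public static void main(String[] args) {
        int sampleRateHz = 44100; // assumed sample rate of the test clips

        // Real delay 1024 frames (ffmpeg native aac) -> 45158, the value reported on Android 9
        System.out.println(scaleBySampleRateOverThousand(1024, sampleRateHz));

        // Real delay 2048 frames (ffmpeg with fdk) -> 90317, the value reported on Android 9
        System.out.println(scaleBySampleRateOverThousand(2048, sampleRateHz));
    }
}
```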
In fact, if I try setting the "encoder-delay" key to 0 in the format just before passing it to MediaCodec.configure(...), I get the correct uncompressed data with the expected delay.
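A slightly less blunt variant of that workaround would be to override the key only when the reported value looks implausible, so that files with a sane delay (2048 or 2112 frames in my tests) keep it. Here is a minimal sketch in plain Java. The 8192-frame threshold and the helper name are my own assumptions and do not come from the Android documentation; the Android call site is shown only as a comment.

```java
public class EncoderDelaySanitizer {

    // Undocumented key seen in the format returned by MediaExtractor.getTrackFormat().
    public static final String KEY_ENCODER_DELAY = "encoder-delay";

    // Assumed upper bound for a believable delay, in frames (hypothetical value).
    public static final int MAX_PLAUSIBLE_DELAY_FRAMES = 8192;

    // Returns the reported delay if it looks sane, otherwise 0 so that the
    // codec does not discard the whole decoded stream.
    public static int sanitize(int reportedDelayFrames) {
        if (reportedDelayFrames < 0 || reportedDelayFrames > MAX_PLAUSIBLE_DELAY_FRAMES) {
            return 0;
        }
        return reportedDelayFrames;
    }

    // On Android the call site might look like this (not compiled here):
    //
    //   MediaFormat format = extractor.getTrackFormat(track);
    //   if (format.containsKey(KEY_ENCODER_DELAY)) {
    //       int delay = format.getInteger(KEY_ENCODER_DELAY);
    //       format.setInteger(KEY_ENCODER_DELAY, sanitize(delay));
    //   }
    //   codec.configure(format, null, null, 0);
}
```

Falling back to 0 means the decoded PCM for those files still contains the encoder's initial padding, which matches the behaviour on older Android versions.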
My guess at this point is that MediaExtractor's encoder delay retrieval has a bug, but maybe there's something I am overlooking.
Since ffmpeg is quite popular, it's quite likely that many of my app users will try importing files generated with it, and at this point I can't see a foolproof solution to the issue.
Does anyone have a suggestion / workaround?