This is a problem that's come up in my app after the introduction of the iPhone 6s and 6s+, and I'm almost positive that it is because the new model's built-in mic is stuck recording at 48kHz (you can read more about this here). To clarify, this was never a problem with previous phone models that I've tested. I'll walk through my Audio Engine implementation and the varying results at different points depending on the phone model further below.
So here's what's happening - when my code runs on previous devices I get a consistent number of audio samples in each CMSampleBuffer returned by the AVCaptureDevice, usually 1024 samples. The render callback for my audio unit graph provides an appropriate buffer with space for 1024 frames. Everything works great and sounds great.
Then Apple had to go and make this damn iPhone 6s (just kidding, it's great, this bug is just getting to my head), and now I get some very inconsistent and confusing results. The AVCaptureDevice now alternates between capturing 940 and 941 samples per buffer, and on the first call the render callback prepares a buffer with space for 940 or 941 sample frames, but on subsequent calls it immediately starts increasing the space it reserves, up to 1010, 1012, or 1024 sample frames, and then stays there. The size it finally settles on varies from session to session. To be honest, I have no idea how the render callback decides how many frames to prepare for the render, but I'm guessing it has to do with the sample rate of the Audio Unit that the render callback is on.
The format of the CMSampleBuffer comes in at a 44.1kHz sample rate no matter what the device is, so I'm guessing there's some sort of implicit sample rate conversion happening before I even receive the CMSampleBuffer from the AVCaptureDevice on the 6s. The only difference is that the preferred hardware sample rate of the 6s is 48kHz, as opposed to 44.1kHz on earlier models.
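For reference, a quick way to see what the hardware is actually doing is to compare AVAudioSession's preferred sample rate with the rate it actually reports. This is just a diagnostic sketch, separate from the capture code below:

import AVFoundation

// Diagnostic sketch: ask for 44.1kHz and see what the hardware actually runs at.
// On the 6s I'd expect sampleRate to stay at 48000 even after requesting 44100.
let session = AVAudioSession.sharedInstance()
do {
    try session.setPreferredSampleRate(44100.0)
    try session.setActive(true)
} catch {
    print("Could not configure AVAudioSession: \(error)")
}
print("Preferred sample rate: \(session.preferredSampleRate)")
print("Actual hardware sample rate: \(session.sampleRate)")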
I've read that with the 6s you do have to be ready to make space for a varying number of samples being returned, but is the kind of behavior I described above normal? If it is, how can my render cycle be tailored to handle this?
Below is the code that is processing the audio buffers if you care to look further into this:
The audio sample buffers, which are CMSampleBufferRefs, come in through the mic AVCaptureDevice and are sent to my audio processing function, which does the following to the captured CMSampleBufferRef named audioBuffer:
// Pull the sample count and an AudioBufferList out of the captured CMSampleBufferRef
CMBlockBufferRef buffer = CMSampleBufferGetDataBuffer(audioBuffer);
CMItemCount numSamplesInBuffer = CMSampleBufferGetNumSamples(audioBuffer);

AudioBufferList audioBufferList;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(audioBuffer,
                                                        NULL,
                                                        &audioBufferList,
                                                        sizeof(audioBufferList),
                                                        NULL,
                                                        NULL,
                                                        kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                                                        &buffer);

// Hand the buffer list, sample count, and retained sample buffer to the processing callback
self.audioProcessingCallback(&audioBufferList, numSamplesInBuffer, audioBuffer);
CFRelease(buffer);
This puts the audio samples into an AudioBufferList and sends it, along with the number of samples and the retained CMSampleBuffer, to the function below that I use for audio processing. TL;DR: the following code sets up some Audio Units in an Audio Graph, using the CMSampleBuffer's format to set the ASBD for input; runs the audio samples through a converter unit, a NewTimePitch unit, and then another converter unit; and then starts a render call on the output converter unit with the number of samples that I received from the CMSampleBufferRef, putting the rendered samples back into the AudioBufferList so they can be written out to the movie file. More on the Audio Unit render callback below.
movieWriter.audioProcessingCallback = {(audioBufferList, numSamplesInBuffer, CMSampleBuffer) -> () in
    var ASBDSize = UInt32(sizeof(AudioStreamBasicDescription))
    self.currentInputAudioBufferList = audioBufferList.memory
    let formatDescription = CMSampleBufferGetFormatDescription(CMSampleBuffer)
    let sampleBufferASBD = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription!)
    if (sampleBufferASBD.memory.mFormatID != kAudioFormatLinearPCM) {
        print("Bad ASBD")
    }

    if (sampleBufferASBD.memory.mChannelsPerFrame != self.currentInputASBD.mChannelsPerFrame || sampleBufferASBD.memory.mSampleRate != self.currentInputASBD.mSampleRate) {
        // Set currentInputASBD to the format of the data coming IN from the camera
        self.currentInputASBD = sampleBufferASBD.memory
        print("New IN ASBD: \(self.currentInputASBD)")

        // Set the ASBD for converter in's input to currentInputASBD
        var err = AudioUnitSetProperty(self.converterInAudioUnit,
            kAudioUnitProperty_StreamFormat,
            kAudioUnitScope_Input,
            0,
            &self.currentInputASBD,
            UInt32(sizeof(AudioStreamBasicDescription)))
        self.checkErr(err, "Set converter in's input stream format")

        // Set currentOutputASBD to the in/out format for the NewTimePitch unit
        err = AudioUnitGetProperty(self.newTimePitchAudioUnit,
            kAudioUnitProperty_StreamFormat,
            kAudioUnitScope_Input,
            0,
            &self.currentOutputASBD,
            &ASBDSize)
        self.checkErr(err, "Get NewTimePitch ASBD stream format")
        print("New OUT ASBD: \(self.currentOutputASBD)")

        // Set the ASBD for converter out's input to currentOutputASBD
        err = AudioUnitSetProperty(self.converterOutAudioUnit,
            kAudioUnitProperty_StreamFormat,
            kAudioUnitScope_Input,
            0,
            &self.currentOutputASBD,
            ASBDSize)
        self.checkErr(err, "Set converter out's input stream format")

        // Set the ASBD for converter out's output to currentInputASBD
        err = AudioUnitSetProperty(self.converterOutAudioUnit,
            kAudioUnitProperty_StreamFormat,
            kAudioUnitScope_Output,
            0,
            &self.currentInputASBD,
            ASBDSize)
        self.checkErr(err, "Set converter out's output stream format")

        // Initialize the graph
        err = AUGraphInitialize(self.auGraph)
        self.checkErr(err, "Initialize audio graph")
        self.checkAllASBD()
    }

    self.currentSampleTime += Double(numSamplesInBuffer)

    // Build a timestamp for this render based on the running sample count
    var timeStamp = AudioTimeStamp()
    memset(&timeStamp, 0, sizeof(AudioTimeStamp))
    timeStamp.mSampleTime = self.currentSampleTime
    timeStamp.mFlags = AudioTimeStampFlags.SampleTimeValid

    var flags = AudioUnitRenderActionFlags(rawValue: 0)
    // Pull the rendered samples from the end of the graph back into audioBufferList
    let err = AudioUnitRender(self.converterOutAudioUnit,
        &flags,
        &timeStamp,
        0,
        UInt32(numSamplesInBuffer),
        audioBufferList)
    self.checkErr(err, "Render Call on converterOutAU")
}
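For context, the graph construction isn't shown above. It's a three-node chain, converter in -> NewTimePitch -> converter out, built with the usual AUGraph calls. The sketch below is a trimmed approximation rather than my exact setup code, and the node variable names (converterInNode, etc.) are just for illustration; auGraph and the three audio unit properties are the same ones referenced in the callback above:

// Trimmed sketch of the graph setup (runs once, before any buffers arrive).
var converterDesc = AudioComponentDescription(
    componentType: kAudioUnitType_FormatConverter,
    componentSubType: kAudioUnitSubType_AUConverter,
    componentManufacturer: kAudioUnitManufacturer_Apple,
    componentFlags: 0,
    componentFlagsMask: 0)
var timePitchDesc = AudioComponentDescription(
    componentType: kAudioUnitType_FormatConverter,
    componentSubType: kAudioUnitSubType_NewTimePitch,
    componentManufacturer: kAudioUnitManufacturer_Apple,
    componentFlags: 0,
    componentFlagsMask: 0)

var converterInNode: AUNode = 0
var newTimePitchNode: AUNode = 0
var converterOutNode: AUNode = 0

self.checkErr(NewAUGraph(&self.auGraph), "New AUGraph")
self.checkErr(AUGraphAddNode(self.auGraph, &converterDesc, &converterInNode), "Add converter in node")
self.checkErr(AUGraphAddNode(self.auGraph, &timePitchDesc, &newTimePitchNode), "Add NewTimePitch node")
self.checkErr(AUGraphAddNode(self.auGraph, &converterDesc, &converterOutNode), "Add converter out node")

// converter in (bus 0) -> NewTimePitch (bus 0) -> converter out (bus 0)
self.checkErr(AUGraphConnectNodeInput(self.auGraph, converterInNode, 0, newTimePitchNode, 0), "Connect converter in to NewTimePitch")
self.checkErr(AUGraphConnectNodeInput(self.auGraph, newTimePitchNode, 0, converterOutNode, 0), "Connect NewTimePitch to converter out")

// Open the graph and grab the AudioUnit instances so the ASBDs can be set later
self.checkErr(AUGraphOpen(self.auGraph), "Open AUGraph")
self.checkErr(AUGraphNodeInfo(self.auGraph, converterInNode, nil, &self.converterInAudioUnit), "Get converter in unit")
self.checkErr(AUGraphNodeInfo(self.auGraph, newTimePitchNode, nil, &self.newTimePitchAudioUnit), "Get NewTimePitch unit")
self.checkErr(AUGraphNodeInfo(self.auGraph, converterOutNode, nil, &self.converterOutAudioUnit), "Get converter out unit")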
Below is the Audio Unit render callback that gets called once the AudioUnitRender call reaches the input converter unit:
func pushCurrentInputBufferIntoAudioUnit(inRefCon: UnsafeMutablePointer<Void>,
                                         ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
                                         inTimeStamp: UnsafePointer<AudioTimeStamp>,
                                         inBusNumber: UInt32,
                                         inNumberFrames: UInt32,
                                         ioData: UnsafeMutablePointer<AudioBufferList>) -> OSStatus {
    // Hand the most recently captured AudioBufferList straight to the graph
    let bufferRef = UnsafeMutablePointer<AudioBufferList>(inRefCon)
    ioData.memory = bufferRef.memory
    print(inNumberFrames)
    return noErr
}
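That function is attached to the input converter's input bus through an AURenderCallbackStruct. Again, this is a sketch rather than my exact code; it assumes pushCurrentInputBufferIntoAudioUnit is a free function (so Swift can bridge it to a C function pointer) and that inRefCon points at the currentInputAudioBufferList property that the processing callback fills in:

// Sketch: hook the render callback onto the input converter node's input bus.
// converterInNode here is the node from the graph setup sketch above.
var renderCallbackStruct = AURenderCallbackStruct(
    inputProc: pushCurrentInputBufferIntoAudioUnit,
    inputProcRefCon: &self.currentInputAudioBufferList)
self.checkErr(AUGraphSetNodeInputCallback(self.auGraph,
    converterInNode,
    0,
    &renderCallbackStruct),
    "Set render callback on converter in")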
Blah, this is a huge brain dump but I really appreciate ANY help. Please let me know if there's any additional information you need.