
This is a problem that's come up in my app since the introduction of the iPhone 6s and 6s+, and I'm almost positive it's because the new models' built-in mic is stuck recording at 48 kHz (you can read more about this here). To be clear, this was never a problem with the previous phone models I've tested. I'll walk through my audio engine implementation, and the varying results at different points depending on the phone model, further below.

So here's what's happening: when my code runs on previous devices, I get a consistent number of audio samples in each CMSampleBuffer returned by the AVCaptureDevice, usually 1024 samples. The render callback for my Audio Unit graph provides an appropriately sized buffer with space for 1024 frames. Everything works great and sounds great.

Then Apple had to go and make this damn iPhone 6s (just kidding, it's great, this bug is just getting to my head), and now I get some very inconsistent and confusing results. The AVCaptureDevice now varies between capturing 940 and 941 samples, and the render callback starts by making a buffer with space for 940 or 941 sample frames on the first call, but then immediately increases the space it reserves on subsequent calls, up to 1010, 1012, or 1024 sample frames, and stays there. The space it ends up reserving varies by session. To be honest, I have no idea how this render callback determines how many frames it prepares for the render, but I'm guessing it has to do with the sample rate of the Audio Unit the render callback is on.

The format of the CMSampleBuffer comes in at a 44.1 kHz sample rate no matter what the device is, so I'm guessing there's some sort of implicit sample rate conversion happening before I even receive the CMSampleBuffer from the AVCaptureDevice on the 6s. The only difference is that the preferred hardware sample rate of the 6s is 48 kHz, as opposed to 44.1 kHz on earlier models.

I've read that with the 6s you do have to be prepared to handle a varying number of returned samples, but is the kind of behavior I described above normal? If it is, how can my render cycle be tailored to handle it?

Below is the code that is processing the audio buffers if you care to look further into this:

The audio sample buffers, which are CMSampleBufferRefs, come in through the mic AVCaptureDevice and are sent to my audio processing function, which does the following with the captured CMSampleBufferRef named audioBuffer:

CMBlockBufferRef buffer = NULL; // will hold the retained block buffer filled in below
        
CMItemCount numSamplesInBuffer = CMSampleBufferGetNumSamples(audioBuffer);
AudioBufferList audioBufferList;
        
OSStatus err = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(audioBuffer,
                                                            NULL,
                                                            &audioBufferList,
                                                            sizeof(audioBufferList),
                                                            NULL,
                                                            NULL,
                                                            kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                                                            &buffer
                                                            );

if (err == noErr) {
    self.audioProcessingCallback(&audioBufferList, numSamplesInBuffer, audioBuffer);
    CFRelease(buffer); // release the block buffer retained by the call above
}

This puts the audio samples into an AudioBufferList and sends it, along with the number of samples and the retained CMSampleBuffer, to the function below that I use for audio processing. TL;DR: the following code sets up some Audio Units in an AUGraph, using the CMSampleBuffer's format to set the input ASBD, and runs the audio samples through a converter unit, a NewTimePitch unit, and then another converter unit. I then start a render call on the output converter unit with the number of samples I received from the CMSampleBufferRef and put the rendered samples back into the AudioBufferList, which is subsequently written out to the movie file. More on the Audio Unit render callback below.

movieWriter.audioProcessingCallback = {(audioBufferList, numSamplesInBuffer, CMSampleBuffer) -> () in

        var ASBDSize = UInt32(sizeof(AudioStreamBasicDescription))
        self.currentInputAudioBufferList = audioBufferList.memory
        
        let formatDescription = CMSampleBufferGetFormatDescription(CMSampleBuffer)
        let sampleBufferASBD = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription!)
        
        if (sampleBufferASBD.memory.mFormatID != kAudioFormatLinearPCM) {
            print("Bad ASBD")
        }
        
        if(sampleBufferASBD.memory.mChannelsPerFrame != self.currentInputASBD.mChannelsPerFrame || sampleBufferASBD.memory.mSampleRate != self.currentInputASBD.mSampleRate){
            
            
            // Set currentInputASBD to format of data coming IN from camera
            self.currentInputASBD = sampleBufferASBD.memory
            print("New IN ASBD: \(self.currentInputASBD)")
            
            // set the ASBD for converter in's input to currentInputASBD
            var err = AudioUnitSetProperty(self.converterInAudioUnit,
                kAudioUnitProperty_StreamFormat,
                kAudioUnitScope_Input,
                0,
                &self.currentInputASBD,
                UInt32(sizeof(AudioStreamBasicDescription)))
            self.checkErr(err, "Set converter in's input stream format")
            
            // Set currentOutputASBD to the in/out format for newTimePitch unit
            err = AudioUnitGetProperty(self.newTimePitchAudioUnit,
                kAudioUnitProperty_StreamFormat,
                kAudioUnitScope_Input,
                0,
                &self.currentOutputASBD,
                &ASBDSize)
            self.checkErr(err, "Get NewTimePitch ASBD stream format")
            
            print("New OUT ASBD: \(self.currentOutputASBD)")
            
            //Set the ASBD for the convert out's input to currentOutputASBD
            err = AudioUnitSetProperty(self.converterOutAudioUnit,
                kAudioUnitProperty_StreamFormat,
                kAudioUnitScope_Input,
                0,
                &self.currentOutputASBD,
                ASBDSize)
            self.checkErr(err, "Set converter out's input stream format")
            
            //Set the ASBD for the converter out's output to currentInputASBD
            err = AudioUnitSetProperty(self.converterOutAudioUnit,
                kAudioUnitProperty_StreamFormat,
                kAudioUnitScope_Output,
                0,
                &self.currentInputASBD,
                ASBDSize)
            self.checkErr(err, "Set converter out's output stream format")
            
            //Initialize the graph
            err = AUGraphInitialize(self.auGraph)
            self.checkErr(err, "Initialize audio graph")
            
            self.checkAllASBD()
            
        }
        
        self.currentSampleTime += Double(numSamplesInBuffer)
        
        var timeStamp = AudioTimeStamp() // Swift zero-initializes the struct, so no memset is needed
        timeStamp.mSampleTime = self.currentSampleTime
        timeStamp.mFlags = AudioTimeStampFlags.SampleTimeValid
        
        var flags = AudioUnitRenderActionFlags(rawValue: 0)
        
        // The earlier `err` is scoped to the format-change block above,
        // so declare a fresh one for the render call
        let renderErr = AudioUnitRender(self.converterOutAudioUnit,
            &flags,
            &timeStamp,
            0,
            UInt32(numSamplesInBuffer),
            audioBufferList)
        self.checkErr(renderErr, "Render Call on converterOutAU")
        
    }

The Audio Unit render callback that is invoked once the AudioUnitRender call reaches the input converter unit is below:

func pushCurrentInputBufferIntoAudioUnit(inRefCon : UnsafeMutablePointer<Void>, ioActionFlags : UnsafeMutablePointer<AudioUnitRenderActionFlags>, inTimeStamp : UnsafePointer<AudioTimeStamp>, inBusNumber : UInt32, inNumberFrames : UInt32, ioData : UnsafeMutablePointer<AudioBufferList>) -> OSStatus {

let bufferRef = UnsafeMutablePointer<AudioBufferList>(inRefCon)
// Hand the graph the most recently captured buffer list as-is,
// regardless of whether its size matches inNumberFrames
ioData.memory = bufferRef.memory
print(inNumberFrames)

return noErr
}

Blah, this is a huge brain dump, but I really appreciate ANY help. Please let me know if there's any additional information you need.

How did you fix this? Thanks in advance. – Pablo Martinez

1 Answer


Generally, you handle slight variations in buffer size (with a constant sample rate in and out) by putting the incoming samples into a lock-free circular FIFO, and not removing any block of samples from that FIFO until you have a full-size block, plus potentially some safety padding to cover future size jitter.
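A minimal sketch of such a FIFO in C, assuming one capture (producer) thread and one render (consumer) thread; a production version would use atomic loads/stores on `head`/`tail` to be genuinely lock-free across threads, but the indexing and "don't pop until a full block is available" logic is the point here (all names are illustrative, not from any particular library):

```c
#include <stdint.h>

// Capacity must be a power of two so indices can wrap with a mask.
// Size it to hold several maximum-size capture buffers as safety padding.
#define FIFO_CAPACITY 4096

typedef struct {
    float    data[FIFO_CAPACITY];
    uint32_t head;  // next write index, touched only by the producer
    uint32_t tail;  // next read index, touched only by the consumer
} SampleFIFO;

static uint32_t fifo_count(const SampleFIFO *f) {
    return f->head - f->tail;  // unsigned wraparound keeps this correct
}

// Producer: push whatever the capture callback delivered (940, 941, 1024...).
static int fifo_push(SampleFIFO *f, const float *src, uint32_t n) {
    if (fifo_count(f) + n > FIFO_CAPACITY) return 0;  // would overflow, drop
    for (uint32_t i = 0; i < n; i++)
        f->data[(f->head + i) & (FIFO_CAPACITY - 1)] = src[i];
    f->head += n;
    return 1;
}

// Consumer: pop only once a full render quantum has accumulated.
static int fifo_pop(SampleFIFO *f, float *dst, uint32_t n) {
    if (fifo_count(f) < n) return 0;  // underflow: caller outputs silence
    for (uint32_t i = 0; i < n; i++)
        dst[i] = f->data[(f->tail + i) & (FIFO_CAPACITY - 1)];
    f->tail += n;
    return 1;
}
```

The consumer side would sit in the render callback: if `fifo_pop` fails because not enough samples have arrived yet, output a cycle of silence rather than blocking the audio thread.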

The variation in size probably has to do with the sample rate converter ratio not being a simple multiple, the resampling filter(s) needed, and any buffering needed for the resampling process.

1024 * (44100/48000) = 940.8

So that rate conversion might explain the jitter between 940 and 941 samples: if the hardware always ships out blocks of 1024 samples at a fixed rate of 48 kHz, and you need each block resampled to 44.1 kHz for your callback as soon as possible, there's a fraction of a converted sample that eventually has to be output on only some of the output callbacks.