2 votes

I have created an app which I am using to take acoustic measurements. The app generates a log sine sweep stimulus, and when the user presses 'start', the app simultaneously plays the stimulus and records the microphone input.
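For context, the stimulus can be generated as a plain array of floats along these lines (illustrative sketch only, using the standard Farina exponential-sweep formulation; the function name, frequency range, duration and sample rate are placeholders, not the values used in my app):

import Foundation

// Illustrative only: an exponential (log) sine sweep as a [Float] array.
func makeLogSweep(from f1: Double = 20, to f2: Double = 20_000,
                  duration: Double = 5, sampleRate: Double = 44_100) -> [Float] {
    let n = Int(duration * sampleRate)
    let k = log(f2 / f1)                       // natural log of the frequency ratio
    return (0..<n).map { i -> Float in
        let t = Double(i) / sampleRate
        let phase = 2 * Double.pi * f1 * duration / k * (exp(t / duration * k) - 1)
        return Float(sin(phase))
    }
}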

All fairly standard stuff. I am using Core Audio because down the line I want to delve into more advanced functionality, and potentially use multiple interfaces, so I have to start learning somewhere.

This is for iOS, so I am creating an AUGraph with a RemoteIO audio unit for input and output. I have declared the audio formats, and they appear to be correct: no errors are reported, and the AUGraph initialises, starts, plays sound and records.
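The graph setup is along these lines (a simplified, illustrative sketch rather than my exact code; the bus numbers, the mono Float32 format and the function name are placeholders, and error checks are omitted):

import AudioToolbox

// Illustrative only: an AUGraph with a RemoteIO node and a multichannel mixer,
// with recording enabled on RemoteIO's input element (bus 1).
func buildGraph() -> (AUGraph?, AudioUnit?, AudioUnit?) {
    var graph: AUGraph?
    NewAUGraph(&graph)

    var ioDesc = AudioComponentDescription(componentType: kAudioUnitType_Output,
                                           componentSubType: kAudioUnitSubType_RemoteIO,
                                           componentManufacturer: kAudioUnitManufacturer_Apple,
                                           componentFlags: 0, componentFlagsMask: 0)
    var mixerDesc = AudioComponentDescription(componentType: kAudioUnitType_Mixer,
                                              componentSubType: kAudioUnitSubType_MultiChannelMixer,
                                              componentManufacturer: kAudioUnitManufacturer_Apple,
                                              componentFlags: 0, componentFlagsMask: 0)
    var ioNode = AUNode()
    var mixerNode = AUNode()
    AUGraphAddNode(graph!, &ioDesc, &ioNode)
    AUGraphAddNode(graph!, &mixerDesc, &mixerNode)
    AUGraphOpen(graph!)

    var ioUnit: AudioUnit?
    var mixerUnit: AudioUnit?
    AUGraphNodeInfo(graph!, ioNode, nil, &ioUnit)
    AUGraphNodeInfo(graph!, mixerNode, nil, &mixerUnit)

    // Enable recording on the input element (bus 1) of RemoteIO.
    var one: UInt32 = 1
    AudioUnitSetProperty(ioUnit!, kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input, 1, &one,
                         UInt32(MemoryLayout<UInt32>.size))

    // Mono Float32 at 44.1 kHz - a placeholder; use whatever formats were declared.
    var format = AudioStreamBasicDescription(mSampleRate: 44_100,
                                             mFormatID: kAudioFormatLinearPCM,
                                             mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked,
                                             mBytesPerPacket: 4, mFramesPerPacket: 1,
                                             mBytesPerFrame: 4, mChannelsPerFrame: 1,
                                             mBitsPerChannel: 32, mReserved: 0)
    AudioUnitSetProperty(mixerUnit!, kAudioUnitProperty_StreamFormat,
                         kAudioUnitScope_Input, 1, &format,
                         UInt32(MemoryLayout<AudioStreamBasicDescription>.size))

    // Mixer output feeds the RemoteIO output element (bus 0).
    AUGraphConnectNodeInput(graph!, mixerNode, 0, ioNode, 0)
    AUGraphInitialize(graph!)
    return (graph, ioUnit, mixerUnit)
}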

I have a render callback on the input scope of input 1 of my mixer (i.e. every time more audio is needed, the render callback is called, and it copies the next few samples from my stimulus array of floats into the buffer).

// Pass a reference to self into the render callback, and attach the
// stimulus generator to input bus 1 of the mixer.
let genContext = Unmanaged.passRetained(self).toOpaque()
var genCallbackStruct = AURenderCallbackStruct(inputProc: genCallback,
                                               inputProcRefCon: genContext)
AudioUnitSetProperty(mixerUnit!, kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input, 1, &genCallbackStruct,
                     UInt32(MemoryLayout<AURenderCallbackStruct>.size))

I then have an input callback which is called every time a buffer of microphone samples is available on the output scope of the RemoteIO input bus. This callback saves the samples to an array.

// Register an input callback on the RemoteIO unit so we are notified
// whenever a new buffer of microphone samples is available.
var inputCallbackStruct = AURenderCallbackStruct(inputProc: recordingCallback,
                                                 inputProcRefCon: context)
AudioUnitSetProperty(remoteIOUnit!, kAudioOutputUnitProperty_SetInputCallback,
                     kAudioUnitScope_Global, 0, &inputCallbackStruct,
                     UInt32(MemoryLayout<AURenderCallbackStruct>.size))

Once the stimulus reaches the last sample, the AUGraph is stopped, and then I write both the stimulus and the recorded array to separate WAV files so I can check my data. What I am finding is that there is currently a delay of about 3000 samples between the stimulus and the recorded input.
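Writing the arrays out for inspection is done roughly like this (an illustrative sketch using AVAudioFile rather than my exact code; the mono Float32 format and 44.1 kHz rate are assumptions):

import AVFoundation

// Illustrative helper: dump a [Float] buffer to a WAV file so it can be
// inspected in an audio editor.
func writeWav(_ samples: [Float], to url: URL, sampleRate: Double = 44_100) throws {
    let format = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                               sampleRate: sampleRate,
                               channels: 1,
                               interleaved: false)!
    let file = try AVAudioFile(forWriting: url, settings: format.settings)
    let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                  frameCapacity: AVAudioFrameCount(samples.count))!
    buffer.frameLength = AVAudioFrameCount(samples.count)
    for i in 0..<samples.count {
        buffer.floatChannelData![0][i] = samples[i]
    }
    try file.write(from: buffer)
}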

[Screenshot: the recorded input and the stimulus (bottom WAV) compared as waveforms]

Whilst it is hard to see the start of the waveforms (the speaker and the microphone may not reproduce or pick up the lowest frequencies of the sweep), the ends of the stimulus (bottom WAV) and the recording should roughly line up.

I realise there will be some acoustic propagation time, but at a 44100 Hz sample rate 3000 samples is about 68 ms, and Core Audio is meant to keep latency down.
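For reference, the conversion from samples to milliseconds:

// 3000 samples expressed in milliseconds at a 44.1 kHz sample rate.
let delayMs = 3_000.0 / 44_100.0 * 1_000.0   // ≈ 68 ms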

So my question is this: can anybody account for this additional latency? It seems quite high.

My input callback is as follows:

let recordingCallback: AURenderCallback = { (
    inRefCon,
    ioActionFlags,
    inTimeStamp,
    inBusNumber,
    frameCount,
    ioData ) -> OSStatus in

    let audioObject = unsafeBitCast(inRefCon, to: AudioEngine.self)

    var err: OSStatus = noErr

    // Buffer list for AudioUnitRender to fill; mData is nil so RemoteIO
    // supplies its own buffer and sets the byte size.
    var bufferList = AudioBufferList(
        mNumberBuffers: 1,
        mBuffers: AudioBuffer(
            mNumberChannels: 1,
            mDataByteSize: 512,
            mData: nil))

    // Pull the microphone samples for this slice from the RemoteIO input bus.
    if let au = audioObject.remoteIOUnit {
        err = AudioUnitRender(au,
                              ioActionFlags,
                              inTimeStamp,
                              inBusNumber,
                              frameCount,
                              &bufferList)
    }

    // Convert the signed 16-bit samples to Floats in the range -1...1.
    let data = Data(bytes: bufferList.mBuffers.mData!,
                    count: Int(bufferList.mBuffers.mDataByteSize))
    let samples = data.withUnsafeBytes {
        Array(UnsafeBufferPointer<Int16>(start: $0,
                                         count: data.count / MemoryLayout<Int16>.size))
    }
    let factor = Float(Int16.max)
    let floats = samples.map { Float($0) / factor }

    // Write the converted samples into the circular recording buffer.
    var j = audioObject.in1BufIndex
    let m = audioObject.in1BufSize
    for i in 0..<floats.count {
        audioObject.in1Buf[j] = floats[i]
        j += 1 ; if j >= m { j = 0 }
    }
    audioObject.in1BufIndex = j

    // Stop once enough callbacks have been collected to cover the whole recording.
    audioObject.inputCallbackFrameSize = Int(frameCount)
    audioObject.callbackcount += 1
    let windowSize = totalRecordSize / Int(frameCount)
    if audioObject.callbackcount == windowSize {
        audioObject.running = false
    }

    return noErr
}

So from when the engine starts, this callback should fire as soon as the first set of data has been collected from RemoteIO: 512 samples, as that is the default allocated buffer size. All it does is convert the signed integers to Floats and save them to a buffer. The value in1BufIndex tracks the next index in the array to be written; it is read and updated on each callback so that the data in the array stays contiguous.
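Distilled into a standalone form (illustrative only, using the in1Buf / in1BufIndex names from above), the circular write is simply:

// Copies new samples into the buffer, wraps at the end, and returns the next
// write position (the value stored back into in1BufIndex).
func ringWrite(_ newSamples: [Float], into buffer: inout [Float], startingAt index: Int) -> Int {
    var j = index
    for sample in newSamples {
        buffer[j] = sample
        j += 1
        if j >= buffer.count { j = 0 }
    }
    return j
}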

Currently there seem to be about 3000 samples of silence in the recorded array before the captured sweep appears. Inspecting the recorded array in the Xcode debugger, all samples have values (and yes, the first 3000 are very quiet), but somehow this doesn't add up.
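For what it's worth, the offset can be measured numerically rather than by eye with a brute-force cross-correlation between the stimulus and the recording (illustrative sketch, not code from my app; maxLag is an arbitrary search window, and an FFT-based correlation would be faster for long recordings):

// Estimate the recording's delay relative to the stimulus, in samples.
func estimateDelay(stimulus: [Float], recording: [Float], maxLag: Int = 8_000) -> Int {
    var bestLag = 0
    var bestScore = -Float.greatestFiniteMagnitude
    for lag in 0..<maxLag {
        let n = min(stimulus.count, recording.count - lag)
        if n <= 0 { break }
        var score: Float = 0
        for i in 0..<n {
            score += stimulus[i] * recording[i + lag]
        }
        if score > bestScore {
            bestScore = score
            bestLag = lag
        }
    }
    return bestLag
}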

Below is the generator callback used to play my stimulus:

let genCallback: AURenderCallback = { (
    inRefCon,
    ioActionFlags,
    inTimeStamp,
    inBusNumber,
    frameCount,
    ioData) -> OSStatus in

    let audioObject = unsafeBitCast(inRefCon, to: AudioEngine.self)

    for buffer in UnsafeMutableAudioBufferListPointer(ioData!) {
        let frames = buffer.mData!.assumingMemoryBound(to: Float.self)

        // Copy the next block of the stimulus into the output buffer, and keep
        // a copy in in2Buf so the played signal can be written out later.
        if audioObject.stimulusReadIndex < (audioObject.Stimulus.count - Int(frameCount)) {
            for i in 0..<Int(frameCount) {
                let index = audioObject.stimulusReadIndex + i
                frames[i] = Float(audioObject.Stimulus[index])
                audioObject.in2Buf[index] = Float(audioObject.Stimulus[index])
            }
            audioObject.stimulusReadIndex += Int(frameCount)
        }
    }
    return noErr
}
Comments:

How are you playing the stimulus? – dave234

I have edited the question to include this info. The bottom code block is the generator callback that reads from my stimulus (an array of sample values). Since posting this question, I have inspected the timestamp of the first callback for the stimulus and for my mic input, and the timings are very different. Offsetting the arrays by this value results in the stimulus and recorded arrays lining up. So my issue is the callbacks being called at different times. Is there any way to get these calls closer together in time, based on my code above? – samp17

1 Answer

3 votes

There may be at least 4 things contributing to the round trip latency.

512 samples, or about 11.6 ms at 44.1 kHz, is the time required to gather enough samples before RemoteIO can call your callback (see the sketch at the end of this answer for one way to request a shorter buffer).

Sound propagates at about 1 foot per millisecond, double that for a round trip.

The DAC has an output latency.

There is the time needed for the multiple ADCs (there’s more than one microphone on your iOS device) to sample and post-process the audio (for sigma-delta conversion, beam forming, equalization, etc.). The post-processing might be done in blocks, thus incurring the latency needed to gather enough samples (an undocumented number) for one block.

There’s possibly also added overhead latency in moving data (hardware DMA of some unknown block size?) between the ADC and system memory, as well as driver and OS context switching overhead.

There’s also a startup latency to power up the audio hardware subsystems (amplifiers, etc.), so it may be best to start playing and recording audio well before outputting your sound (frequency sweep).
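To illustrate the first and last points above with code (a sketch only, not from the question; the requested buffer duration, the .measurement mode and the half-second pre-roll are arbitrary choices): the audio session can be asked for a shorter I/O buffer, it reports part of the hardware latency so that portion can be subtracted from the measured offset, and the stimulus can be padded with leading silence so everything is already running before the sweep itself starts.

import AVFoundation

// Configure the session for simultaneous play/record; .measurement mode also
// disables some of the system's input processing, which suits acoustic measurement.
let session = AVAudioSession.sharedInstance()
try? session.setCategory(.playAndRecord, mode: .measurement, options: [])

// Ask for a smaller I/O buffer than the default 512 samples; the hardware may
// grant a larger value than requested.
try? session.setPreferredIOBufferDuration(0.005)          // ~256 samples at 44.1 kHz
try? session.setActive(true)

// Part of the round-trip latency is reported by the session and can be
// subtracted from the measured offset between stimulus and recording.
let sampleRate = session.sampleRate
let reported = session.inputLatency + session.outputLatency + session.ioBufferDuration
print("session reports \(reported * 1000) ms ≈ \(Int(reported * sampleRate)) samples")

// Startup latency: prepend, say, half a second of silence to the stimulus so the
// hardware is fully powered up before the sweep itself begins.
let preRoll = [Float](repeating: 0, count: Int(0.5 * sampleRate))
// stimulus = preRoll + sweep   // "stimulus" and "sweep" are hypothetical names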