I have almost no knowledge of signal processing, and currently I'm trying to implement a function in Swift that triggers an event when there is an increase in the sound pressure level (e.g. when a human screams).
I am tapping into an input node of an AVAudioEngine with a callback like this:
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
    (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    // Copy this buffer's samples into a regular Swift array.
    let arraySize = Int(buffer.frameLength)
    let samples = Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: arraySize))
    // Rough level estimate: mean of the samples, converted to dB.
    let volume = 20 * log10(samples.reduce(0) { $0 + $1 } / Float(arraySize))
    if !volume.isNaN {
        print("this is the current volume: \(volume)")
    }
}
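For context, the tap above is installed on an engine I set up roughly like this (a sketch from memory, so the session category and mode are assumptions and may differ from what my actual project uses):

let audioEngine = AVAudioEngine()
let inputNode = audioEngine.inputNode

do {
    // Configure the shared audio session for recording.
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.record, mode: .measurement)
    try session.setActive(true)
    // ... install the tap as shown above, then start the engine.
    try audioEngine.start()
} catch {
    print("failed to start audio engine: \(error)")
}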
After turning the buffer into a float array, I tried getting a rough estimate of the sound pressure level by computing the mean.
But this gives me values that fluctuate a lot, even when the iPad is just sitting in a quiet room:
this is the current volume: -123.971
this is the current volume: -119.698
this is the current volume: -147.053
this is the current volume: -119.749
this is the current volume: -118.815
this is the current volume: -123.26
this is the current volume: -118.953
this is the current volume: -117.273
this is the current volume: -116.869
this is the current volume: -110.633
this is the current volume: -130.988
this is the current volume: -119.475
this is the current volume: -116.422
this is the current volume: -158.268
this is the current volume: -118.933
There is indeed a significant increase in this value if I clap near the microphone.
So I thought I could first compute a mean of these volumes during a preparing phase, and then check whether there is a significant increase during the event-triggering phase:
if !volume.isNaN {
    if isInThePreparingPhase {
        print("this is the current volume: \(volume)")
        volumeSum += volume
        volumeCount += 1
    } else if isInTheEventTriggeringPhase {
        if volume > meanVolume {
            // triggers an event
        }
    }
}
where meanVolume is computed during the transition from the preparing phase to the event-triggering phase: meanVolume = volumeSum / Float(volumeCount)
....
However, there appears to be no significant increase when I play loud music next to the microphone. And on rare occasions, volume is greater than meanVolume even when there is no increase in the environment's loudness that is audible to the human ear.
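Part of the problem might be that the bare volume > meanVolume comparison fires whenever the volume is merely above average, which a fluctuating value will be about half the time by chance. A variant I have been considering (entirely my own sketch; VolumeSpikeDetector and the margin k are made-up names and values) also tracks the spread during the preparing phase and only triggers when the volume clears the mean by a few standard deviations:

import Foundation

/// Hypothetical detector: learn the mean and standard deviation of the
/// per-buffer volume during the preparing phase, then only treat a
/// volume as an event if it exceeds the mean by k standard deviations.
struct VolumeSpikeDetector {
    private var sum: Float = 0
    private var squaredSum: Float = 0
    private var count: Float = 0
    private(set) var meanVolume: Float = 0
    private(set) var stdDev: Float = 0

    /// Call once per buffer during the preparing phase.
    mutating func learn(volume: Float) {
        sum += volume
        squaredSum += volume * volume
        count += 1
    }

    /// Call once when transitioning to the event-triggering phase.
    mutating func finishPreparing() {
        guard count > 0 else { return }
        meanVolume = sum / count
        // Var(X) = E[X^2] - E[X]^2
        let variance = squaredSum / count - meanVolume * meanVolume
        stdDev = sqrt(max(variance, 0))
    }

    /// Call during the event-triggering phase. k = 3 is an arbitrary
    /// illustrative margin, not a tuned value.
    func isSpike(volume: Float, k: Float = 3) -> Bool {
        volume > meanVolume + k * stdDev
    }
}

That would at least explain the occasional triggers in a quiet room, but it still doesn't tell me whether the mean-based volume itself is a meaningful level estimate in the first place.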
So what is the proper way of extracting the sound pressure level from AVAudioPCMBuffer?
Wikipedia gives a formula like this:

Lp = 20 * log10(p / p0) dB

with p being the root mean square sound pressure and p0 being the reference sound pressure.
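If the float samples are linear amplitudes, the formula might translate to something like the following (a minimal sketch under that assumption; levelInDecibels is a made-up helper name, and using 1.0 as p0 gives a level relative to digital full scale rather than an absolute sound pressure level):

import AVFoundation

/// Sketch: per-buffer level following Lp = 20 * log10(p / p0),
/// where p is the root mean square of the samples and p0 = 1.0
/// (digital full scale) is an assumed reference.
func levelInDecibels(buffer: AVAudioPCMBuffer) -> Float? {
    guard let channelData = buffer.floatChannelData else { return nil }
    let count = Int(buffer.frameLength)
    guard count > 0 else { return nil }
    let samples = UnsafeBufferPointer(start: channelData[0], count: count)
    // p = sqrt(mean of the squared samples), i.e. the RMS value.
    let meanSquare = samples.reduce(0) { $0 + $1 * $1 } / Float(count)
    let rms = sqrt(meanSquare)
    guard rms > 0 else { return nil }
    return 20 * log10(rms / 1.0)
}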
But I have no idea what the float values in AVAudioPCMBuffer.floatChannelData represent. The Apple documentation only says:

The buffer's audio samples as floating point values.
How should I work with them?
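My working assumption (which I'd like confirmed) is that they are linear amplitudes normalized to [-1.0, 1.0]; a quick check along these lines inside the tap might tell me whether that's plausible:

// If the samples really are normalized linear amplitudes, their
// extremes should stay within [-1.0, 1.0] even for loud input.
// (This check is my own idea, not something from Apple's docs.)
let minSample = samples.min() ?? 0
let maxSample = samples.max() ?? 0
print("sample range: \(minSample) ... \(maxSample)")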