8
votes

I have almost no knowledge in signal-processing and currently I'm trying to implement a function in Swift that triggers an event when there is an increase in the sound pressure level (e.g. when a human screams).

I am tapping into an input node of an AVAudioEngine with a callback like this:

let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
    (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    let arraySize = Int(buffer.frameLength)
    let floatArray = Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: arraySize))

    // do something with the samples
    let volume = 20 * log10(floatArray.reduce(0) { $0 + $1 } / Float(arraySize))
    if !volume.isNaN {
        print("this is the current volume: \(volume)")
    }
}

After turning the buffer into a float array, I tried to get a rough estimate of the sound pressure level by computing the mean.

But this gives me values that fluctuate a lot, even when the iPad is just sitting in a quiet room:

this is the current volume: -123.971
this is the current volume: -119.698
this is the current volume: -147.053
this is the current volume: -119.749
this is the current volume: -118.815
this is the current volume: -123.26
this is the current volume: -118.953
this is the current volume: -117.273
this is the current volume: -116.869
this is the current volume: -110.633
this is the current volume: -130.988
this is the current volume: -119.475
this is the current volume: -116.422
this is the current volume: -158.268
this is the current volume: -118.933

There is indeed a significant increase in this value if I clap near the microphone.

So I can do something like first computing a mean of these volumes during a preparing phase, and then checking for a significant increase over that mean during the event-triggering phase:

if !volume.isNaN {
    if isInThePreparingPhase {
        print("this is the current volume: \(volume)")
        volumeSum += volume
        volumeCount += 1
    } else if isInTheEventTriggeringPhase {
        if volume > meanVolume {
            // triggers an event
        }
    }
}

where meanVolume is computed during the transition from the preparing phase to the event-triggering phase: meanVolume = volumeSum / Float(volumeCount)

....

However, there appears to be no significant increase if I play loud music beside the microphone. And on rare occasions, volume is greater than meanVolume even when there is no increase in volume audible to the human ear.

So what is the proper way of extracting the sound pressure level from AVAudioPCMBuffer?

Wikipedia gives a formula like this:

Lp = 20 · log10(p / p0) dB

with p being the root mean square sound pressure and p0 being the reference sound pressure.

But I have no idea what the float values in AVAudioPCMBuffer.floatChannelData represent. The Apple documentation only says:

The buffer's audio samples as floating point values.

How should I work with them?
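For what it's worth, here is a minimal sketch of applying that formula to the buffer's samples, assuming the floats are normalized to [-1, 1] so that digital full scale (1.0) plays the role of p0 (the rmsPower name and the -160 dB floor are made up for illustration):

```swift
import AVFoundation

// Hypothetical helper: RMS level of one buffer in dBFS
// (0 dB = full scale, assuming samples are normalized to [-1, 1]).
func rmsPower(of buffer: AVAudioPCMBuffer) -> Float {
    guard let channelData = buffer.floatChannelData?[0] else { return -160 }
    let n = Int(buffer.frameLength)
    guard n > 0 else { return -160 }

    // p = root mean square of the samples, as in the Wikipedia formula
    var sumOfSquares: Float = 0
    for i in 0..<n {
        let s = channelData[i]
        sumOfSquares += s * s
    }
    let rms = sqrt(sumOfSquares / Float(n))

    // Lp = 20 * log10(p / p0) with p0 = 1.0; clamp to avoid log10(0)
    return 20 * log10(max(rms, 1e-8))
}
```

Note the squaring before the mean: averaging the raw samples (as in the question) tends toward zero because the waveform oscillates around zero, which would explain the erratic values.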

2
Hi arch, I imagine you figured out an answer to this question? Do you have any code that you could provide? – Logan
What is floatArray here: let volume = 20 * log10(floatArray.reduce(0){ $0 + $1} / Float(arraySize)) ? – MikeMaus

2 Answers

6
votes

Thanks to the response from @teadrinker, I finally found a solution to this problem. Here is my Swift code that outputs the volume of an AVAudioPCMBuffer input:

private func getVolume(from buffer: AVAudioPCMBuffer, bufferSize: Int) -> Float {
    guard let channelData = buffer.floatChannelData?[0] else {
        return 0
    }

    let channelDataArray = Array(UnsafeBufferPointer(start:channelData, count: bufferSize))

    var outEnvelope = [Float]()
    var envelopeState: Float = 0
    let envConstantAtk: Float = 0.16
    let envConstantDec: Float = 0.003

    for sample in channelDataArray {
        let rectified = abs(sample)

        if envelopeState < rectified {
            envelopeState += envConstantAtk * (rectified - envelopeState)
        } else {
            envelopeState += envConstantDec * (rectified - envelopeState)
        }
        outEnvelope.append(envelopeState)
    }

    // 0.015 is a threshold that gates out the noise floor
    // picked up by the microphone
    if let maxVolume = outEnvelope.max(),
        maxVolume > Float(0.015) {
        return maxVolume
    } else {
        return 0.0
    }
}
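For context, this helper could be wired into an input tap something like the following sketch (the engine setup and buffer size are assumptions, not part of the original answer; getVolume is the function above):

```swift
import AVFoundation

let engine = AVAudioEngine()
let inputNode = engine.inputNode
let format = inputNode.outputFormat(forBus: 0)

inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
    // getVolume returns 0 when the envelope stays under the noise gate
    let volume = getVolume(from: buffer, bufferSize: Int(buffer.frameLength))
    if volume > 0 {
        print("envelope peak: \(volume)")
    }
}

do {
    try engine.start()
} catch {
    print("could not start audio engine: \(error)")
}
```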
5
votes

I think the first step is to get the envelope of the sound. You could use simple averaging to calculate an envelope, but you need to add a rectification step first (usually abs() or squaring, to make all samples positive).

More commonly, a simple IIR filter is used instead of averaging, with different constants for attack and decay; here is a lab. Note that these constants depend on the sampling frequency. You can use this formula to calculate them:

1 - exp(-timePerSample*2/smoothingTime)
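This formula can be turned into a small Swift helper (the 44.1 kHz sample rate and the 10 ms / 500 ms smoothing times below are assumptions for illustration):

```swift
import Foundation

/// Implements 1 - exp(-timePerSample * 2 / smoothingTime) from the answer.
func envelopeCoefficient(sampleRate: Double, smoothingTime: Double) -> Double {
    let timePerSample = 1.0 / sampleRate
    return 1.0 - exp(-timePerSample * 2.0 / smoothingTime)
}

// e.g. a fast attack and a slow decay at 44.1 kHz
let attack = envelopeCoefficient(sampleRate: 44100, smoothingTime: 0.010) // ≈ 0.0045
let decay  = envelopeCoefficient(sampleRate: 44100, smoothingTime: 0.500) // ≈ 0.00009
```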

Step 2

When you have the envelope, you can smooth it with an additional (slower) filter, and then compare the two envelopes to find a sound that is louder than the base level; here's a more complete lab.
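The two-envelope idea can be sketched in pure Swift; the constants, the 2× trigger ratio, and the 0.05 absolute floor below are illustrative assumptions, not values from this answer:

```swift
import Foundation

/// One-pole envelope follower with separate attack/decay constants.
func envelope(of samples: [Float], attack: Float, decay: Float) -> [Float] {
    var state: Float = 0
    return samples.map { sample in
        let rectified = abs(sample)                       // rectification step
        let coeff = state < rectified ? attack : decay    // rise fast, fall slowly
        state += coeff * (rectified - state)
        return state
    }
}

/// Returns sample indices where the fast envelope jumps well above the baseline.
func detectEvents(in samples: [Float]) -> [Int] {
    let fast = envelope(of: samples, attack: 0.16,  decay: 0.003)  // tracks the sound
    let slow = envelope(of: samples, attack: 0.001, decay: 0.001)  // baseline level
    var events: [Int] = []
    for i in samples.indices where fast[i] > 2 * slow[i] && fast[i] > 0.05 {
        events.append(i)
    }
    return events
}
```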

Note that detecting audio "events" can be quite tricky and hard to predict, so make sure you have plenty of debugging aids!