
I'm developing a Flutter plugin that targets only Android for now. It's a kind of synthesis tool: users can load audio files into memory, adjust pitch (not pitch shift), and play multiple sounds with minimal latency using an audio library called Oboe.

I managed to get PCM data from the audio file formats that the MediaCodec class supports, and I also succeeded in handling pitch by manipulating playback through direct access to the PCM array.

This PCM data is stored as a float array with samples ranging from -1.0 to 1.0. I now want to support a panning feature, just like internal Android classes such as SoundPool, and I'm planning to follow how SoundPool handles panning. There are two values I have to pass to SoundPool when panning: left and right. Both are floats and must range from 0.0 to 1.0.

For example, if I pass (1.0F, 0.0F), users hear the sound only in the left ear; (1.0F, 1.0F) is normal (centered). Panning wasn't a problem... until I encountered stereo sounds. I know what to do to apply panning to stereo PCM data, but I don't know how to make the panning sound natural.

If I shift all sound to the left side, the right channel must be played on the left side; conversely, if I shift all sound to the right side, the left channel must be played on the right side. I also noticed that there is something called a pan rule (or pan law), which says the sound must be a bit louder when it's shifted to a side (about +3 dB). I tried to find a way to perform a natural panning effect, but I really couldn't find an algorithm or reference for it.

Below is the structure of the float stereo PCM array. I didn't modify the array while decoding the audio files, so it should have the common interleaved layout:

[left_channel_sample_0, right_channel_sample_0, left_channel_sample_1, right_channel_sample_1,
...,
left_channel_sample_n, right_channel_sample_n]

and I have to pass this PCM array to the audio stream, as in the C++ code below:

void PlayerQueue::renderStereo(float * audioData, int32_t numFrames) {
    for(int i = 0; i < numFrames; i++) {
        if(player->isStereo) {
            //When the audio file is stereo...
            if((offset + i) * 2 + 1 < player->data.size()) {
                audioData[i * 2] += player->data.at((offset + i) * 2);
                audioData[i * 2 + 1] += player->data.at((offset + i) * 2 + 1);
            } else {
                //PCM data reached its end
                break;
            }
        } else {
            //When the audio file is mono...
            if(offset + i < player->data.size()) {
                audioData[i * 2] += player->data.at(offset + i);
                audioData[i * 2 + 1] += player->data.at(offset + i);
            } else {
                //PCM data reached its end
                break;
            }
        }

        //Clamp to [-1.0, 1.0] to prevent clipping
        if(audioData[i * 2] > 1.0F)
            audioData[i * 2] = 1.0F;
        else if(audioData[i * 2] < -1.0F)
            audioData[i * 2] = -1.0F;

        if(audioData[i * 2 + 1] > 1.0F)
            audioData[i * 2 + 1] = 1.0F;
        else if(audioData[i * 2 + 1] < -1.0F)
            audioData[i * 2 + 1] = -1.0F;
    }

    //Add numFrames to offset (counted in frames), so playback continues in the next session
    offset += numFrames;

    //data.size() counts samples; a stereo file holds 2 samples per frame
    if((player->isStereo ? offset * 2 : offset) >= player->data.size()) {
        offset = 0;
        queueEnded = true;
    }
}

I excluded the playback-manipulation calculations to simplify the code. As you can see, I have to pass the PCM data to the audioData float array manually. I add the PCM data (rather than assigning it) so that multiple sounds, including instances of the same sound, are mixed together.

  1. How can I perform a panning effect with this PCM array? It would be good to follow the mechanism of SoundPool, but anything is fine as long as I can perform the panning effect properly. (For example, the pan value could simply range from -1.0 to 1.0, with 0 meaning centered.)

  2. When applying the pan rule, what is the relationship between PCM amplitude and decibels? I know how to make a sound louder, but I don't know how to make it louder by an exact number of decibels. Is there a formula for this?

Comment from bipll: When you pan completely to one side, the sound is played through only one speaker, while a centered sound is played through both, making it effectively "twice as loud"; hence the +3 dB gain.

2 Answers


Pan rules, or pan laws, are implemented a bit differently from manufacturer to manufacturer.

One frequently used implementation is this: when a sound is panned fully to one side, that side is played at full volume, whereas the other side is attenuated completely; when the sound is played at center, both sides are attenuated by roughly 3 decibels.

To do this, you can multiply the sound source by the calculated amplitude, e.g. (untested pseudocode):

player->data.at((offset + i) * 2) * 1.0; // left signal at full volume
player->data.at((offset + i) * 2 + 1) * 0.0; // right signal fully attenuated

To get the desired amplitudes you can use the sin function for the left channel and the cos function for the right channel.

[Plot of sin(x) and cos(x) over [0, pi/2]: the two amplitude curves cross at about 0.707 at pi/4.]

Notice that when the input to sin and cos is pi/4, the amplitude is about 0.707 on both sides. This gives you the attenuation of around 3 decibels on both sides.
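
This is also where the decibel question from the post comes in: for sample amplitudes, decibels relate to a linear gain factor as dB = 20 * log10(gain), and conversely gain = 10^(dB / 20), so a gain of 0.707 is 20 * log10(0.707) ≈ -3.01 dB. A quick self-contained check in C++:

#include <cmath>
#include <cstdio>

int main() {
    //Convert a linear amplitude gain to decibels: dB = 20 * log10(gain)
    float gain = 0.707F;
    float db = 20.0F * std::log10(gain);
    std::printf("%.2f dB\n", db); //prints -3.01 dB

    //And back again: gain = 10^(dB / 20)
    float back = std::pow(10.0F, db / 20.0F);
    std::printf("gain %.3f\n", back); //prints gain 0.707
    return 0;
}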

So all that is left to do is map the range [-1, 1] to the range [0, pi/2], assuming you have a pan value in the range [-1, 1] (untested pseudocode):

pan_mapped = ((pan + 1) / 2.0) * (Math.pi / 2.0);

left_amplitude = sin(pan_mapped);
right_amplitude = cos(pan_mapped);

(With this mapping, pan = 1 ends up fully on the left and pan = -1 fully on the right; swap sin and cos if you prefer the opposite convention.)
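
Plugged into the mono branch of renderStereo() from the question, this could look like the following sketch. It is untested and assumes a hypothetical float member pan in [-1, 1] on the player (with -1 meaning full left, so sin and cos are swapped relative to the pseudocode above):

#include <cmath>

//Sketch only: constant-power panning for the mono case.
//'pan' is an assumed float member in [-1, 1]; -1 = full left, 0 = center.
void PlayerQueue::renderMonoPanned(float * audioData, int32_t numFrames) {
    const float panMapped = ((pan + 1.0F) / 2.0F) * (float) M_PI / 2.0F;
    const float leftAmp = std::cos(panMapped);  //1.0 at pan = -1, ~0.707 at pan = 0
    const float rightAmp = std::sin(panMapped); //0.0 at pan = -1, ~0.707 at pan = 0

    for(int i = 0; i < numFrames; i++) {
        if(offset + i >= player->data.size())
            break; //PCM data reached its end

        const float sample = player->data.at(offset + i);
        audioData[i * 2] += sample * leftAmp;
        audioData[i * 2 + 1] += sample * rightAmp;
    }
}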

UPDATE:

Another frequently used option (e.g. in the Pro Tools DAW) is to have a pan setting on each side, effectively treating the stereo source as 2 mono sources. This allows you to place the left source freely in the stereo field without affecting the right source.

To do this you would (untested pseudocode):

left_output  += left_source(i)  * sin(left_pan)
right_output += left_source(i)  * cos(left_pan)
left_output  += right_source(i) * sin(right_pan)
right_output += right_source(i) * cos(right_pan)

The settings of these 2 pans are up to the operator and depend on the recording and the desired effect. How you map this to a single pan control is up to you. I would just advise that when the pan is 0 (centered), the left channel is played only on the left side and the right channel only on the right side; otherwise you would interfere with the original stereo recording.

One possibility would be that the segment [-1, 0) controls the right pan, leaving the left side untouched, and vice versa for [0, 1]:

import math

hPi = math.pi / 2.0

def stereoPan(x):
    # x in [-1, 1]: negative values move the right source toward the left
    # channel; positive values move the left source toward the right channel.
    if x < 0.0:
        print("left source:")
        print(1.0) # amplitude to left channel
        print(0.0) # amplitude to right channel
        print("right source:")
        print(math.sin(abs(x) * hPi)) # amplitude to left channel
        print(math.cos(abs(x) * hPi)) # amplitude to right channel
    else:
        print("left source:")
        print(math.cos(x * hPi)) # amplitude to left channel
        print(math.sin(x * hPi)) # amplitude to right channel
        print("right source:")
        print(0.0) # amplitude to left channel
        print(1.0) # amplitude to right channel
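
Translated into the stereo branch of renderStereo() from the question, that mapping could look like the untested sketch below; pan is again an assumed float member in [-1, 1]:

#include <cmath>

//Sketch only: single-knob version of the dual-source mapping above.
//'pan' is an assumed float member in [-1, 1]; 0 leaves the stereo image untouched.
void PlayerQueue::renderStereoPanned(float * audioData, int32_t numFrames) {
    const float hPi = (float) M_PI / 2.0F;
    float leftToLeft = 1.0F, leftToRight = 0.0F;   //gains applied to the left source
    float rightToLeft = 0.0F, rightToRight = 1.0F; //gains applied to the right source

    if(pan < 0.0F) {
        //Move the right source toward the left channel
        rightToLeft = std::sin(-pan * hPi);
        rightToRight = std::cos(-pan * hPi);
    } else {
        //Move the left source toward the right channel
        leftToLeft = std::cos(pan * hPi);
        leftToRight = std::sin(pan * hPi);
    }

    for(int i = 0; i < numFrames; i++) {
        if((offset + i) * 2 + 1 >= player->data.size())
            break; //PCM data reached its end

        const float l = player->data.at((offset + i) * 2);
        const float r = player->data.at((offset + i) * 2 + 1);
        audioData[i * 2] += l * leftToLeft + r * rightToLeft;
        audioData[i * 2 + 1] += l * leftToRight + r * rightToRight;
    }
}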

The following is not meant to contradict anything in the excellent answer given by @ruff09. I'm just going to add some thoughts and theory that I think are relevant when trying to emulate panning.

I'd like to point out that simply using volume differences has a couple of drawbacks. First off, it doesn't match the real-world phenomenon. Imagine you are walking down a sidewalk and right there on the street, on your right, is a worker with a jackhammer. We could make the sound 100% volume on the right and 0% on the left. But in reality, much of what we hear from that source also arrives at the left ear, drowning out other sounds.

If you omit left-ear volume for the jackhammer to obtain maximum right pan, then even quiet sounds on the left will be audible (which is absurd), since they will not be competing with jackhammer content on the left track. If you do give the jackhammer some left-ear volume, the volume-based panning effect will swing its location back toward the center. Dilemma!

How do our ears differentiate locations in such situations? I know of two processes that can potentially be incorporated into the panning algorithm to make it more "natural." One is a filtering component: high frequencies whose wavelengths are smaller than the width of our head get attenuated at the far ear, so you could add some differential low-pass filtering to your sounds. The other is timing: in our scenario, the jackhammer sounds reach the right ear a fraction of a millisecond before they reach the left, so you could also add a bit of delay based on the panning angle. The time-based panning effect works most clearly on frequency content with wavelengths larger than our heads (so some high-pass filtering would also be a component).
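
As a rough illustration of those two building blocks, here is an untested C++ sketch; the filter coefficient and the delay length are assumptions for illustration, not measured values:

#include <cstddef>
#include <vector>

//One-pole low-pass that could be applied to the far ear to mimic head shadowing.
struct OnePoleLowPass {
    float a = 0.2F;     //smoothing coefficient in (0, 1]; smaller = stronger filtering
    float state = 0.0F;
    float process(float in) {
        state += a * (in - state);
        return state;
    }
};

//Fixed delay line for the far ear. Roughly 0.6 ms (about 29 samples at 48 kHz)
//is in the ballpark of the maximum interaural time difference.
struct DelayLine {
    std::vector<float> buf;
    std::size_t pos = 0;
    explicit DelayLine(std::size_t samples) : buf(samples, 0.0F) {} //samples >= 1
    float process(float in) {
        float out = buf[pos];
        buf[pos] = in;
        pos = (pos + 1) % buf.size();
        return out;
    }
};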

There has also been a great deal of work on how the shapes of our ears have differential filtering effects on sounds. I think that we learn to use this as we grow up by subconsciously associating different timbres with different locations (this especially pertains to elevation and front-versus-back localization).

There are big computation costs, though, so simplifications such as sticking with purely amplitude-based panning are the norm. Thus, for sounds in a 3D world, it is probably best to prefer mono source content for items that need dynamic location changes, and to use stereo content only for background music or ambient content that doesn't need dynamic panning based on player location.

I want to do some more experimenting with dynamic time-based panning combined with a bit of amplitude, to see whether this can be used effectively with stereo cues. Implementing a dynamic delay is a little tricky, but not as costly as filtering. I'm also wondering whether there might be ways to record (preprocess) a sound source to make it more amenable to real-time filter- and time-based manipulation that results in effective panning.