
This is actually more of a theoretical question, but here goes:

I'm developing an effect audio unit and it needs an equal power crossfade between dry and wet signals.

But I'm confused about the right way to map the linear fader position to the scaling factor (gain) applied to the amplitudes of the dry and wet streams.

Basically, I've seen it done with cos/sin functions or square roots, which essentially approximate logarithmic curves. But if our perception of amplitude is logarithmic to start with, shouldn't the curves mapping the fader position to an amplitude actually be exponential?

This is what I mean:

Assumptions:

  • signal[i] means the ith sample in a signal.
  • each sample is a float in [-1, 1], i.e. its amplitude (absolute value) lies in [0, 1].
  • our GUI control is an NSSlider ranging over [0, 1], so it is in principle linear.
  • fader is a variable holding the value of the NSSlider.

First observation: we perceive amplitude logarithmically. So if we have a linear fader and merely scale a signal's amplitude with signal[i] * fader, what we perceive (what we hear, regardless of the math) is something along these lines:

[Graph: perceived loudness vs. fader position for a linear fader (image not available)]

This is the so-called "crappy fader" effect: the volume jumps drastically from silence across the leftmost part of the slider, while past the middle it barely seems to get any louder.

So to do the fader "right", we instead either express it on a dB scale and, as far as the signal is concerned, compute signal[i] * 10^(fader/20), or, if we want to keep our fader units in [0, 1], compute signal[i] * (.001*10^(3*fader)), which maps the slider range onto gains from 0.001 (-60 dB) to 1 (0 dB).

Either way, our new mapping from the NSSlider to the fader variable, which we use as the multiplier in our code, now looks like this:

[Graph: exponential mapping from slider position to gain (image not available)]

This is what we actually want: since we perceive amplitude logarithmically, we are mapping from linear (NSSlider range 0-1) to exponential and feeding this exponential output to our logarithmic perception. Because log(10^x) = x, we end up perceiving the amplitude change linearly (i.e. correctly).
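The same argument as a worked equation, treating the level in decibels as a proxy for perceived loudness, as the question does (f is the fader value in [0, 1]):

```latex
\begin{aligned}
g(f) &= 0.001 \cdot 10^{3f} = 10^{3f-3}, \qquad 0 \le f \le 1 \\
L(f) &= 20 \log_{10} g(f) = 20\,(3f - 3) = 60f - 60 \;\text{dB}
\end{aligned}
```

So L(0) = -60 dB, L(0.5) = -30 dB and L(1) = 0 dB: every 0.1 of slider travel changes the level by the same 6 dB.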

Great.

Now, my thought is that an equal-power crossfade between two signals (in this case a horizontal dry/wet NSSlider that mixes the input to the AU with its processed output) is essentially the same thing, except that one slider acts on both signals, dry[i] and wet[i].

So if my slider ranges from 0 to 100, with dry at full left and wet at full right, I'd end up with code along these lines:

// wetSample and drySample are assumed to be initialized elsewhere (processed and unprocessed input)
Float32 mixLevel = 0.01f * GetParameter(kParameterTypeMixLevel);        // slider 0..100 -> 0..1
Float32 wetPowerLevel = 0.001f * powf(10.0f, 3.0f * mixLevel);          // wet: -60 dB at 0, 0 dB at 1
Float32 dryPowerLevel = 0.001f * powf(10.0f, 3.0f * (1.0f - mixLevel)); // dry: mirror image of wet
Float32 outputSample = (wetSample * wetPowerLevel) + (drySample * dryPowerLevel);

Graphed, the two gains look like this:

[Graph: wet and dry gains vs. mix level (image not available)]

And same as before, because we perceive amplitude logarithmically, this exponential mapping should actually make it where we hear the crossfade as linear.

However, I've seen crossfade implementations that instead use approximations to logarithmic curves, like this:

[Graph: log-like crossfade curves (image not available)]

But wouldn't these curves actually emphasize our logarithmic perception of amplitude?

I'd suggest asking this on the DSP sister site: dsp.stackexchange.com (Nik Reiman)
I think I've got it by now, but hey, I didn't know about that site! (SaldaVonSchwartz)
Cool. If you've got it figured out, you should answer your own question; I for one would like to know the answer you came up with. (Nik Reiman)

1 Answer


The "equal power" crossfade you're thinking of has to do with keeping the total output power of your mix constant as you fade from wet to dry. Keeping total power constant serves as a reasonable approximation to keeping total perceived loudness constant (which in reality can be fairly complicated).

If you are crossfading between two uncorrelated signals of equal power, you can maintain a constant output power during the crossfade by using any two functions whose squared values sum to 1. A common example of this is the set of functions

g1(k) = ( 0.5 + 0.5*cos(pi*k) )^.5

g2(k) = ( 0.5 - 0.5*cos(pi*k) )^.5,

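where 0 <= k <= 1 (note that g1(k)^2 + g2(k)^2 = 1 is satisfied, as required). By the half-angle identity these are the same as g1(k) = cos(pi*k/2) and g2(k) = sin(pi*k/2), the familiar sin/cos crossfade. Here's a proof that this results in a constant-power crossfade for uncorrelated signals: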

Say we have two signals x1(t) and x2(t) with equal powers E[ x1(t)^2 ] = E[ x2(t)^2 ] = Px, which are also uncorrelated ( E[ x1(t)*x2(t) ] = 0 ). Note that any pair of non-negative gain functions satisfying the previous condition has g2(k) = (1 - g1(k)^2)^.5. Now, forming the sum y(t) = g1(k)*x1(t) + g2(k)*x2(t), we have that:

E[ y(t)^2 ] = E[ (g1(k) * x1(t))^2  +  2*g1(k)*(1 - g1(k)^2)^.5 * x1(t) * x2(t)  +  (1 - g1(k)^2) * x2(t)^2 ] 
= g1(k)^2 * E[ x1(t)^2 ] + 2*g1(k)*(1 - g1(k)^2)^.5 * E[ x1(t)*x2(t) ] + (1 - g1(k)^2) * E[ x2(t)^2 ]
= g1(k)^2 * Px + 0 + (1 - g1(k)^2) * Px = Px,

where we have used that g1(k) and g2(k) are deterministic and can thus be pulled outside the expectation operator E[ ], and that E[ x1(t)*x2(t) ] = 0 by definition because x1(t) and x2(t) are assumed to be uncorrelated. This means that no matter where we are in the crossfade (whatever k we choose) our output will still have the same power, Px, and thus hopefully equal perceived loudness.

Note that for completely correlated signals, you can achieve constant output power by doing a "linear" fade, using any two functions that sum to one ( g1(k) + g2(k) = 1 ). When mixing signals that are somewhat correlated, gain functions between those two would theoretically be appropriate.

What you're thinking of when you say

And same as before, because we perceive amplitude logarithmically, this exponential mapping should actually make it where we hear the crossfade as linear.

is that, when applying your derived crossfade, one signal should perceptually decrease in loudness as a linear function of slider position (k), while the other should perceptually increase in loudness as a linear function of slider position. Your derivation of that seems pretty spot on, but unfortunately it may not be the best way to blend your dry and wet signals in terms of consistency: often, maintaining equal output loudness regardless of slider position is the better thing to aim for. In any case, it might be worth trying a couple of different functions to see which is most usable and consistent.