12 votes

I'm trying to learn how to work with audio in as many different ways as possible.

Suppose a known audio stream (let's call it stream1) and an unknown audio stream (stream2) are mixed into a single stream (mix1).

Now, assuming that we know stream1 in advance but not stream2, would it be possible to use stream1 to cancel itself out of mix1 and therefore give us stream2 with a minimum of noise/interference?

To give it a real-world context, imagine a situation where your computer has a microphone and speakers (not headphones). Because the computer knows the output to the speakers in advance (OK, only by milliseconds, but still), would it be possible to cancel that sound from the mix coming in on the microphone? In this real-world situation the known stream is not perfectly known, as there is likely to be some distortion between transmission and reception.

Assuming this is possible, can someone suggest some reading about the algorithms involved?

I'm also interested in this. I found the phrase "acoustic echo cancellation", but that technique (as the name suggests) just cancels echo in the microphone input stream, without taking into account the sound the computer is producing. – Thomas
Or maybe it is the right phrase? svconline.com/proav/… – Thomas
@Thomas, from the link you pasted it would seem you are right and the term for it is 'acoustic echo cancellation'. I will research it some more, and if no one else answers and I find more information I will add it here. Thanks – m3z
Just from a cursory look around I've found what seems to be an example in MATLAB code: mathworks.co.uk/help/dsp/examples/… Right now I'm a little too tired for the thinking involved, so I'm going to look through it tomorrow - just thought you may be interested. – m3z

2 Answers

7 votes

Yes, this is possible. Two methods:

Time Domain

If you can guarantee that the mixed audio is sample-accurate to the timing of the original stream1, then you can simply negate the original stream1 and add it to the mix. You might have to scale that waveform a bit, since when audio is mixed the level of each source is usually reduced.

If there are other things done to the audio (such as level compression), then this affects your ability to do this sort of subtraction of sound cleanly.
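A rough sketch of that idea in Python/NumPy (assuming float samples, an exactly sample-aligned stream1, and an unknown but constant mixing gain; the function name and test signals below are just for illustration):

```python
import numpy as np

def remove_known_stream(mix, stream1):
    # Estimate the gain stream1 was mixed in with (least-squares fit),
    # since mixing usually attenuates each source.
    gain = np.dot(mix, stream1) / np.dot(stream1, stream1)
    # Negate the scaled stream1 and add it to the mix (i.e. subtract it).
    return mix - gain * stream1

# Illustrative test: mix a known 440 Hz tone with an "unknown" 1 kHz tone.
t = np.linspace(0, 1, 44100, endpoint=False)
stream1 = np.sin(2 * np.pi * 440 * t)           # known stream
stream2 = 0.3 * np.sin(2 * np.pi * 1000 * t)    # unknown stream
mix1 = 0.5 * stream1 + stream2                  # mixed at reduced level
recovered = remove_known_stream(mix1, stream1)  # approximately stream2
```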

Frequency Domain

While normal PCM-encoded audio is just a sampling of pressure many times per second, this is not how sound is fully perceived. We hear different frequencies. If you use a Fourier transform (normally done with an FFT algorithm), you convert the audio samples from the time domain to the frequency domain, giving you the level of sound in various frequency buckets along the way.

If you convert both stream1 and the mix to the frequency domain, subtract stream1 from the mix, and then convert back to the time domain for output, you can effectively remove much of stream1 from the mix. The more frequency buckets you use, the more CPU is needed, but the more accurate the removal will be. Note that while this means you don't quite have to be sample-accurate, it does typically hurt the quality of the sound from the mix.

Many audio editing programs use this method to remove background noise.
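A minimal sketch of that approach in Python/NumPy, assuming both signals are already time-aligned float arrays mixed at equal level (the frame size and function name are arbitrary choices for illustration):

```python
import numpy as np

def spectral_subtract(mix, stream1, frame=2048):
    out = np.zeros_like(mix)
    for start in range(0, len(mix) - frame + 1, frame):
        m = np.fft.rfft(mix[start:start + frame])
        s = np.fft.rfft(stream1[start:start + frame])
        # Subtract stream1's level from the mix in each frequency bucket,
        # clamping at zero, then resynthesise using the mix's phase.
        mag = np.maximum(np.abs(m) - np.abs(s), 0.0)
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(m)), n=frame)
    return out
```

Real implementations use overlapping, windowed frames (an STFT) rather than the back-to-back blocks above, which avoids audible discontinuities at the frame boundaries, but the principle is the same.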

0 votes

Sound is simply a curve: it typically fluctuates above and below zero over time. 16-bit audio has 2^16 possible integer values, so raw PCM audio is just a stream of integers in the range -32768 to +32767. Once the audio is in this format, just toggle the sign (+/-) of each stream1 integer and add it to the corresponding mix integer as you walk through the data of both stream1 and mix one integer at a time, then renormalize back into the -32768 to +32767 range to regain your volume. This effectively erases stream1 from your mix. The audio tool Audacity gives you this option.
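For example, the invert-and-add step might look like this in Python/NumPy (assuming mix and stream1 are sample-aligned int16 arrays of equal length; the function name is just illustrative):

```python
import numpy as np

def erase_stream1(mix, stream1):
    # Work in a wider integer type so the sum cannot wrap around.
    inverted = -stream1.astype(np.int32)       # toggle the sign of stream1
    result = mix.astype(np.int32) + inverted   # add it to the mix, sample by sample
    # Clamp back into the 16-bit range -32768..32767.
    return np.clip(result, -32768, 32767).astype(np.int16)
```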