2
votes

I'm using audiorecorder to record sound and do some processing in pseudo-real-time on an Android phone. I'm facing a discrepancy between the FFT and the convolution of an audio signal: I perform an FFT on a known signal (a sine waveform), and I always correctly find the single tone contained in it.

Now I want to do the same thing using convolution (it's an exercise): I perform 5000 convolutions of that signal using 5000 filters. Each filter is a sine waveform at a different frequency between 0 and 5000 Hz. Then I search for the peak of each convolution output. This way I should find the maximum peak when I use the filter with the same tone that is contained in the signal.

In fact, with a tone of 2 kHz I find the max with the 2 kHz filter.

The problem is that when I receive a 4 kHz tone, I find the max on the convolution with the 4200 Hz filter (while the FFT always works fine). Is that mathematically possible? What is the problem in my convolution?

This is the convolution function that I wrote:

 // Performs the convolution and returns the peak absolute value of the output
 // in is the array with the signal
 // dataSize is the size of the array in
 // kernel is the filter containing the sine at the selected frequency

 int convolveAndGetPeak(short[] in,int dataSize, double[] kernel) {
        // to avoid the risk of overflow, the kernel must have a maximum amplitude equal to 1/10 of the max
        int i, j, k;
        int kernelSize=kernel.length;
        int tmpSignalAfterFilter=0;
        double out;

        // convolution from out[0] to out[kernelSize-2]
        // let's start
        for(i=0; i < kernelSize - 1; ++i)
        {
            out = 0; // init to 0 before sum

            for(j = i, k = 0; j >= 0; --j, ++k)
                out += in[j] * kernel[k];

            if (Math.abs((int) out)>tmpSignalAfterFilter ){
                tmpSignalAfterFilter=Math.abs((int) out);   
            }
        }

        // start convolution from out[kernelSize-1] to out[dataSize-1] (last)
        // continue from where we left off
        for( ; i < dataSize; ++i)
        {
            out = 0;  // initialize to 0 before accumulate

            for(j = i, k = 0; k < kernelSize; --j, ++k)
                out += in[j] * kernel[k];

            if (Math.abs((int) out)>tmpSignalAfterFilter ){
                tmpSignalAfterFilter=Math.abs((int) out);   
            }

        }


        return tmpSignalAfterFilter;
    }

The kernel, used as the filter, is generated this way:

 // curFreq is the frequency of the filter in Hz
 // kernelSamplesSize is the desired length of the filter (number of samples); for time-precision reasons I'm using a length of 20 samples.
 // sampleRate is the sampling frequency

 double[] generateKernel(int curFreq,int kernelSamplesSize,int sampleRate){
    double[] curKernel= new double[kernelSamplesSize] ;

    for (int kernelIndex=0;kernelIndex<curKernel.length;kernelIndex++){
        curKernel[kernelIndex]=Math.sin( (double)kernelIndex * ((double)(2*Math.PI) * (double)curFreq / (double)sampleRate));    //the part that makes this a sine wave....
    }
    return curKernel;

 }

If you want to try a convolution, the data contained in the IN array is the following: http://www.tr3ma.com/Dati/signal.txt

Note 1: the sampling frequency is 44100 Hz.

Note 2: the tone contained in the signal is a single 4 kHz tone (even though the convolution has its max peak with the 4200 Hz filter).

EDIT: I also repeated the test in an Excel sheet. The result is the same (of course, I'm using the same algorithm), and the algorithm seems correct to me... This is the Excel sheet I prepared, if you prefer to work in Excel: http://www.tr3ma.com/Dati/convolutions.xlsm

2 Answers

2
votes

The bandwidth is determined by two factors:

a) The length of your kernel (e.g. a length t of 5 ms produces a rough bandwidth of f >= 200 Hz, estimated as 1/0.005 because Δt·Δf >= 1, see "Heisenberg"), and

b) the window function (which you should definitely implement to make your algorithm work in real-world applications, because otherwise the sidelobes of some filter outputs could carry more energy than the main lobe of the expected filter output).
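As an illustration of point b), here is a minimal sketch of a kernel generator that tapers the sine with a Hann window. The choice of a Hann window and the names `WindowedKernel` and `generateHannKernel` are assumptions made for this example; the parameters mirror the `generateKernel` function from the question. Treat it as a starting point, not as the only suitable window.

```java
class WindowedKernel {

    // Hypothetical variant of generateKernel: a sine at curFreq multiplied by a Hann window.
    // Assumes kernelSamplesSize >= 2.
    static double[] generateHannKernel(int curFreq, int kernelSamplesSize, int sampleRate) {
        double[] kernel = new double[kernelSamplesSize];
        double omega = 2.0 * Math.PI * curFreq / sampleRate; // radians per sample
        for (int n = 0; n < kernelSamplesSize; n++) {
            // Hann window: 0 at both ends, close to 1 in the middle
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * n / (kernelSamplesSize - 1)));
            kernel[n] = w * Math.sin(omega * n);
        }
        return kernel;
    }

    public static void main(String[] args) {
        double[] kernel = generateHannKernel(4000, 20, 44100);
        for (double v : kernel) {
            System.out.println(v);
        }
    }
}
```

The taper lowers the sidelobes at the cost of a wider main lobe, so for a fixed kernel length the frequency resolution gets somewhat coarser.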

But you have another problem: you need to convolve with a second kernel consisting of cosine waves (that is, the same waves as in the first kernel but shifted by 90 degrees). Why? Because with only the sine kernel you get a phase-dependent modulation of the filter outputs (e.g. if the phase difference between the input signal and the kernel wave of identical frequency is 90 degrees, you get an amplitude of 0).

Finally, you combine the outputs of both kernels with Pythagoras: the magnitude is the square root of the sum of the squared sine and cosine outputs.
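The sine/cosine combination can be sketched as follows. This is an illustrative rewrite, not the asker's code: the class `QuadraturePeak`, the method `quadraturePeak`, and the test-tone helper `makeTone` are hypothetical names, no window is applied, and, unlike the function in the question, only output positions where the kernel fully overlaps the signal are evaluated.

```java
class QuadraturePeak {

    // Hypothetical helper: synthesizes a test tone (not part of the original post).
    static short[] makeTone(double freqHz, double phaseRad, int samples,
                            int sampleRate, double amplitude) {
        short[] out = new short[samples];
        for (int n = 0; n < samples; n++) {
            double v = amplitude * Math.sin(2.0 * Math.PI * freqHz * n / sampleRate + phaseRad);
            out[n] = (short) Math.round(v);
        }
        return out;
    }

    // Convolves the signal with a sine and a cosine kernel of the same frequency
    // and returns the peak of sqrt(sineOut^2 + cosineOut^2).
    static double quadraturePeak(short[] in, int curFreq, int kernelSamplesSize, int sampleRate) {
        double omega = 2.0 * Math.PI * curFreq / sampleRate;
        double[] sinKernel = new double[kernelSamplesSize];
        double[] cosKernel = new double[kernelSamplesSize];
        for (int k = 0; k < kernelSamplesSize; k++) {
            sinKernel[k] = Math.sin(omega * k);
            cosKernel[k] = Math.cos(omega * k);
        }

        double peak = 0.0;
        // Simplification: start where the kernel fully overlaps the signal.
        for (int i = kernelSamplesSize - 1; i < in.length; i++) {
            double sinOut = 0.0;
            double cosOut = 0.0;
            for (int k = 0; k < kernelSamplesSize; k++) {
                sinOut += in[i - k] * sinKernel[k];
                cosOut += in[i - k] * cosKernel[k];
            }
            double magnitude = Math.hypot(sinOut, cosOut); // "Pythagoras"
            if (magnitude > peak) {
                peak = magnitude;
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        short[] tone = makeTone(4000.0, 0.0, 2000, sampleRate, 10000.0);
        for (int f = 2000; f <= 6000; f += 2000) {
            System.out.println(f + " Hz -> " + quadraturePeak(tone, f, 441, sampleRate));
        }
    }
}
```

The 441-sample kernel in `main` is chosen only so that the test frequencies fit a whole number of cycles. With a much shorter kernel, such as the 20 samples used in the question, the bandwidth limit described above still applies.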

0
votes

It all seems correct, apart from the number of samples in the kernel (the filter). Increasing the size of the filter makes the result more accurate. I don't know how to calculate the bandwidth of this filter, but it seems clear to me that it's a matter of filter bandwidth. So the filter bandwidth also depends on the number of samples of the filter used in the convolution, relative to the sampling frequency (and maybe also relative to the tone frequency). Unfortunately I cannot increase the number of samples of my filter too much, since otherwise the phone cannot perform the filtering in real time. Note: I need the convolution because I need to identify the precise moment when the tone was fired.

EDIT: I compared a filter with 20 samples and a filter with 40 samples. I don't know the formula to obtain the filter bandwidth, but the difference between the two filters is clear in the following image.

EDIT2: A few days after posting the solution I found how to calculate the bandwidth of such a filter: it's just the inverse of the filter duration. For example, a kernel of 40 samples at 44100 Hz has a duration of about 907 µs, so the filter bandwidth, with this kernel and a window of the same length, is 1/907 µs ≈ 1.1 kHz.
(source: tr3ma.com)
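The rule of thumb from EDIT2 can be written as a small helper. The class and method names here are hypothetical, and the result is only the rough 1/duration estimate described above, not an exact filter bandwidth.

```java
class KernelBandwidth {

    // Duration of the kernel in seconds: number of samples divided by the sampling rate.
    static double kernelDurationSeconds(int kernelSamplesSize, int sampleRate) {
        return (double) kernelSamplesSize / sampleRate;
    }

    // Rough bandwidth estimate: the inverse of the kernel duration.
    static double approxBandwidthHz(int kernelSamplesSize, int sampleRate) {
        return 1.0 / kernelDurationSeconds(kernelSamplesSize, sampleRate);
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        for (int size : new int[] {20, 40}) {
            System.out.println(size + " samples: "
                    + kernelDurationSeconds(size, sampleRate) * 1e6 + " us, "
                    + approxBandwidthHz(size, sampleRate) + " Hz");
        }
    }
}
```

For 40 samples at 44100 Hz this gives about 1102.5 Hz; for the 20-sample kernel from the question it gives about 2205 Hz, which is much wider than the 200 Hz gap between the 4 kHz tone and the 4200 Hz filter.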