Deciding on length of FFT

Question

I am working on a tool to compare two wave files for similarity in their waveforms. Ex, I have a wave file of duration 1min and i make another wave file using the first one but have made each 5sec data at an interval of 5sec to 0. Now my software will tell that there is waveform difference at time interval 5sec to 10sec, 15sec to 20sec, 25sec to 30 sec and so on...

As of now, with initial development, this is working fine. Following are 3 test sets:

I have two wave files with sampling rate of 960Hz, mono, with no of data samples as 138551 (arnd 1min 12sec files). I am using 128 point FFT (splitting file in 128 samples chunk) and results are good.
When I use the same algorithm on wave files of sampling rate 48KHz, 2-channel with no of data samples 6927361 for each channel (arnd 2min 24 sec file), the process becomes too slow. When I use 4096 point FFT, the process is better.
But, the 4096 point FFT on files of 22050Hz, 2-channels with number of data samples 55776 for each channel (arnd 0.6sec file) gives very poor results. In this case 128 point FFT gives good result.

So, I am confused on how to decide the length of FFT so that my results are good in each case.

I guess the length should depend on number of samples and sampling rate. Please give your inputs on this.

Thanks

There are 100 things that you have to learn before you attempt something like this, to name the few - FFT windowing, sound resampling (sample rate changing), current techniques for audio fingerprinting, ... Determining correct ALGORITHM for sound matching should be FIRST and it should come from the REQUIREMENTS of your sound matching. — Daniel Mošmondor
@DanielMošmondor Right now, I am not worried about different sampling rates of files as one of my input file is made using the other one. So, sampling rate is least of my concern. I am just comparing the waveforms and trying to find how similar they are by taking FFT and comparing the magnitude of the freq components. The problem I am facing is deciding in the length of FFT, to keep it fixed or dependent on sampling rate and total no of samples (or length of wav file) — Garfield

Paul R Paul R · Accepted Answer · 2011-11-15T09:04:30

The length of the FFT, N, will determine the resolution in the frequency domain:

resolution (Hz) = sample_rate (Hz) / N

So for example in case (1) you have resolution = 960 / 128 = 7.5 Hz. SO each bin in the resulting FFT (or presumably the power spectrum derived from this) will be 7.5 Hz wide, and you will be able to differentiate between frequency components which are at least this far apart.

Since you don't say what kind of waveforms these are, or what the purpose of your application is, it's hard to know what kind of resolution you need.

One important further point - many people using FFT for the first time are unaware that in general you need to apply a window function prior to the FFT to avoid spectral leakage.

Deciding on length of FFT

2 Answers