1
votes

I want to analyze some audio and decompose it as best as I can into sine waves. I have never used an FFT before and am just doing some initial reading about the concepts and the available libraries, like FFTW and KissFFT.

I'm confused on this point: it sounds like the DFT/FFT will give you the sine amplitudes only at certain frequencies, multiples of a base frequency. For example, if I have audio sampled at the usual 44100 Hz, and I pick a chunk of, say, 256 samples, then that chunk could fit one cycle of 44100/256 = 172 Hz, and the DFT will give me the sine amplitudes at 172, 172*2, 172*3, etc. Is that correct? How do you then find the strength at other frequencies? I'd like to see a spectrum all the way from 20 Hz to about 15 kHz, at about 1 Hz increments.


3 Answers

2
votes

Fourier decomposition allows you to take any function of time and describe it as a sum of sine waves, each with a different amplitude and frequency. If, however, you want to approach this problem using the DFT, you need to make sure you have sufficient resolution in the frequency domain to distinguish between different frequencies. Once you have that, you can determine which frequencies are dominant in the signal and create a signal consisting of multiple sine waves corresponding to those frequencies. You are correct in saying that with a sampling frequency of 44.1 kHz and only 256 samples, the lowest nonzero frequency you will be able to detect in those 256 samples is about 172 Hz.

OBTAIN SUFFICIENT RESOLUTION IN THE FREQUENCY DOMAIN:

Amplitude values "only at certain frequencies, multiples of a base frequency" is how a Fourier series is usually described. For the DFT it is more useful to think in terms of frequency resolution: the bins are spaced f_s/N apart, where f_s is the sampling rate and N is the number of time-domain samples used to calculate the DFT. Reducing that spacing gives you a better ability to distinguish between two frequencies that are close together, and this can be done in two ways:

  1. Decrease the sampling rate, but this would move the periodic repetitions in frequency closer together. (Remember the Nyquist theorem here.)
  2. Increase the number of samples that you use to calculate the DFT. If only the 256 samples are available, one can perform "zero padding", where 0-valued samples are appended to the end of the data, but there are some effects of this that need to be considered. In particular, zero padding interpolates the spectrum onto a finer grid but adds no new information, so it cannot separate two tones that the original 256 samples could not.
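To make the zero-padding option concrete, here is a minimal sketch in plain Python. It uses a slow, direct DFT rather than FFTW or KissFFT so that it stays self-contained; the function names `dft_magnitudes` and `peak_frequency` and the 1000 Hz test tone are illustrative assumptions, not something from the question.

```python
import cmath
import math

def dft_magnitudes(samples, n_fft):
    """Magnitudes of bins 0..n_fft//2 of the DFT of samples zero-padded to n_fft."""
    n = len(samples)
    mags = []
    for k in range(n_fft // 2 + 1):
        # The appended zeros contribute nothing, so only the real samples are summed.
        acc = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                  for i in range(n))
        mags.append(abs(acc))
    return mags

def peak_frequency(samples, n_fft, sample_rate):
    """Center frequency (Hz) of the strongest bin."""
    mags = dft_magnitudes(samples, n_fft)
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / n_fft

SAMPLE_RATE = 44100
# Hypothetical test signal: 256 samples of a 1000 Hz sine.
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(256)]

coarse = peak_frequency(tone, 256, SAMPLE_RATE)   # bins about 172.3 Hz apart
fine = peak_frequency(tone, 2048, SAMPLE_RATE)    # bins about 21.5 Hz apart
print(coarse, fine)
```

The padded transform reports a peak closer to 1000 Hz because the grid is finer, but the width of the peak is still set by the 256 real samples.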

HOW TO COME TO A CONCLUSION:

If you plot the frequency content of several audio signals in individual graphs, you will find that the amplitudes differ a bit. This is because the individual signals will not be identical in sound, and there is always noise inherent in any signal (from the surroundings and the hardware itself). Therefore, what you want to do is take the average of two or more DFT results to reduce the influence of noise and get a more accurate representation of the frequency content. Depending on your application, this may not be possible if the sound you are capturing changes noticeably over time (for example speech or music). Averaging is thus only useful if all the signals to be averaged are pretty much equal in sound (individual, separate recordings of "the same thing"). Just to clarify: from, for example, four time-domain signals, you create four frequency-domain signals (using a DFT method), and then average those four into a single frequency-domain signal. This smooths out the noise and gives you a better representation of which frequencies are inherent in your audio.
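The averaging step can be sketched as follows. This is only an illustration under assumptions of my own: four synthetic "recordings" of the same tone with independent Gaussian noise, a direct DFT instead of a library FFT, and averaging of the magnitude spectra (one of several reasonable choices). The names `magnitude_spectrum`, `average_spectra` and `noisy_tone` are hypothetical.

```python
import cmath
import math
import random

def magnitude_spectrum(samples):
    """Magnitudes of bins 0..N/2 of a direct (slow) DFT."""
    n = len(samples)
    return [abs(sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)))
            for k in range(n // 2 + 1)]

def average_spectra(recordings):
    """Bin-by-bin mean of the magnitude spectra of equally long recordings."""
    spectra = [magnitude_spectrum(r) for r in recordings]
    return [sum(column) / len(spectra) for column in zip(*spectra)]

def noisy_tone(rng, n=128, bin_index=10, noise=1.0):
    """Synthetic stand-in for one recording: a sine on bin 10 plus white noise."""
    return [math.sin(2 * math.pi * bin_index * i / n) + rng.gauss(0.0, noise)
            for i in range(n)]

rng = random.Random(0)                       # fixed seed for repeatability
recordings = [noisy_tone(rng) for _ in range(4)]
averaged = average_spectra(recordings)
peak_bin = max(range(len(averaged)), key=averaged.__getitem__)
print(peak_bin)
```

Averaging does not lower the mean noise floor, but it reduces how much the floor fluctuates from bin to bin, which makes the real peaks easier to pick out.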

AN ALTERNATIVE SOLUTION:

If you know that your signal is supposed to contain a certain number of dominant frequencies (not too many) and these are the only ones you are interested in, then I would recommend that you use Pisarenko's harmonic decomposition (PHD) or multiple signal classification (MUSIC, nice abbreviation!) to find these frequencies (and their corresponding amplitude values). This is less intensive computationally than the DFT. For example, if you KNOW the signal contains 3 dominant frequencies, Pisarenko will return the frequency values for these three, but keep in mind that the DFT reveals much more information, allowing you to come to more conclusions.
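As a rough illustration of the idea, here is the simplest special case only: one real sinusoid, for which Pisarenko's method reduces to a closed form in the lag-1 and lag-2 autocorrelations, cos(w) = (r2 + sqrt(r2^2 + 8*r1^2)) / (4*r1). The general multi-tone PHD and MUSIC need an eigendecomposition and are not shown. The function names and the 5000 Hz test tone are assumptions made for the example.

```python
import math

def autocorrelation(samples, lag):
    """Unbiased sample autocorrelation at the given lag."""
    n = len(samples)
    return sum(samples[i] * samples[i + lag] for i in range(n - lag)) / (n - lag)

def pisarenko_single_tone(samples, sample_rate):
    """Frequency estimate (Hz) for a signal assumed to hold ONE real sinusoid."""
    r1 = autocorrelation(samples, 1)
    r2 = autocorrelation(samples, 2)
    if r1 == 0.0:
        raise ValueError("lag-1 autocorrelation is zero; estimate undefined")
    cos_omega = (r2 + math.sqrt(r2 * r2 + 8.0 * r1 * r1)) / (4.0 * r1)
    cos_omega = max(-1.0, min(1.0, cos_omega))   # guard against rounding
    omega = math.acos(cos_omega)                 # radians per sample
    return omega * sample_rate / (2.0 * math.pi)

SAMPLE_RATE = 44100
# Hypothetical test signal: a noise-free 5000 Hz cosine.
tone = [math.cos(2 * math.pi * 5000 * i / SAMPLE_RATE + 0.3) for i in range(8192)]
estimate = pisarenko_single_tone(tone, SAMPLE_RATE)
print(estimate)
```

Note that this returns a single number, not a spectrum, and it degrades quickly if the "one sinusoid" assumption is wrong.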

2
votes

Your initial assumption is incorrect. An FFT/DFT will not give you amplitudes only at certain discrete frequencies. Those discrete frequencies are only the centers of bins, each bin constituting a narrow-band filter with a main lobe of non-zero bandwidth, roughly a width or two of the FFT bin separation, depending on the window (rectangular, von Hann, etc.) applied before the FFT. Thus the amplitude of spectral content between bin centers will show up, but spread across multiple FFT result bins.

If the separation of key signals is large enough and the noise level is low enough, then you can interpolate the FFT results to examine frequencies between bin centers. You may need to use a high quality interpolator, such as a Sinc kernel.
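One way to look between bin centers without writing an interpolator is to evaluate the transform of the block directly at whatever frequency you care about; for a finite block this gives the same values that exact (periodic sinc) interpolation of the FFT bins would. A minimal sketch, assuming a rectangular window and a single 1000 Hz test tone (both my choices for illustration, as is the name `strength_at`):

```python
import cmath
import math

def strength_at(samples, freq_hz, sample_rate):
    """Magnitude of the block's discrete-time Fourier transform at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate      # radians per sample
    return abs(sum(x * cmath.exp(-1j * w * i) for i, x in enumerate(samples)))

SAMPLE_RATE = 44100
# 1000 Hz falls between the bin centers at about 861 Hz and 1034 Hz.
block = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(256)]

# Scan in 1 Hz steps around the region of interest.
candidates = range(800, 1201)
peak_hz = max(candidates, key=lambda f: strength_at(block, f, SAMPLE_RATE))
print(peak_hz)
```

The scan can be as fine as you like, but the main lobe is still a few hundred Hz wide for a 256-sample block, so two tones closer than that will merge into one peak.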

If your signal separation is smaller or the noise level is higher, then you may need a longer window of data to feed a longer FFT to gather sufficient resolution information. An FFT window of length 256 at a 44.1 kHz sample rate is almost certainly too short to gather sufficient information about spectral content below a few hundred Hz, if those are among the frequencies you would like to examine, as they can't be separated cleanly from a DC bias (bin 0).

1
votes

Unfortunately, there's a degree of uncertainty in identifying the frequencies in a fixed sample of a signal. If you use a short FFT, then there's no way to tell the difference between frequencies over a fairly wide range. If you use a long FFT to get higher resolution in the frequency domain, then you can't detect frequency changes as quickly. This is inherent in the math.

Off the top of my head: the spacing between FFT bins is the sample rate divided by the FFT length, so 1 Hz increments at 44.1 kHz need a 44100-point FFT, i.e. one second of audio per transform. Without overlapping successive windows, that means you'll get a frequency plot once per second. The 15 kHz upper limit only decides which bins you look at; it is already below the 22.05 kHz Nyquist limit.
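The sizing arithmetic can be checked with a few lines, assuming the standard relation that DFT bin spacing equals sample rate divided by FFT length. The helper names and the optional `hop` parameter (samples between the starts of successive windows) are hypothetical.

```python
import math

def fft_length_for_spacing(sample_rate, spacing_hz):
    """Smallest FFT length whose bin spacing is at most spacing_hz."""
    return math.ceil(sample_rate / spacing_hz)

def frames_per_second(sample_rate, n_fft, hop=None):
    """Spectra produced per second; hop defaults to non-overlapping windows."""
    hop = n_fft if hop is None else hop
    return sample_rate / hop

n_fft = fft_length_for_spacing(44100, 1.0)   # points needed for 1 Hz bins
rate = frames_per_second(44100, n_fft)       # plots per second, no overlap
top_bin = round(15000 * n_fft / 44100)       # bin index that holds 15 kHz
print(n_fft, rate, top_bin)
```

Passing a smaller `hop` gives more plots per second, but each one still covers a full second of audio, so fast changes remain smeared.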

You may also be interested in the Short-time Fourier transform. It doesn't solve the fundamental trade-off problem but in practice may get you what you want.
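A bare-bones sketch of the short-time idea: slide a window along the signal, transform each frame, and record the strongest bin per frame. Everything here is an assumption for illustration (the Hann window, the 64-sample frames with 50% overlap, the direct DFT, the name `stft_peak_bins`); a real implementation would call an FFT library.

```python
import cmath
import math

def stft_peak_bins(samples, frame_len, hop):
    """Index of the strongest DFT bin in each Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    peaks = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                        for i in range(frame_len)))
                for k in range(frame_len // 2 + 1)]
        peaks.append(max(range(len(mags)), key=mags.__getitem__))
    return peaks

N = 64
# Hypothetical signal: a tone on bin 4 followed by a tone on bin 12.
low = [math.sin(2 * math.pi * 4 * i / N) for i in range(4 * N)]
high = [math.sin(2 * math.pi * 12 * i / N) for i in range(4 * N)]
peaks = stft_peak_bins(low + high, frame_len=N, hop=N // 2)
print(peaks)
```

The early frames report the low tone and the late frames the high one. Shorter frames locate the switch more precisely in time at the cost of coarser frequency bins, which is the trade-off described above.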