3
votes

I have a WAV file which I would like to visualize in the frequency domain. Next, I would like to write a simple script that takes in a WAV file and outputs whether the energy at a certain frequency "F" exceeds a threshold "Z" (whether a certain tone has a strong presence in the WAV file). There are a bunch of code snippets online that show how to plot an FFT spectrum in Python, but I don't understand a lot of the steps.

  1. I know that wavfile.read(myfile) returns the sampling rate (fs) and the data array (data), but when I run an FFT on it (y = numpy.fft.fft(data)), what units is y in?
  2. To get the array of frequencies for the x-axis, some posters do this where n = len(data):

    X = numpy.linspace(0.0, 1.0/(2.0*T), n/2)

    and others do this:

    X = numpy.fft.fftfreq(n) * fs)[range(n/2)]

    Is there a difference between these two methods and is there a good online explanation for what these operations do conceptually?

  3. Some of the online tutorials about FFTs mention windowing, but not a lot of posters use windowing in their code snippets. I see that numpy has a numpy.hamming(N), but what should I use as the input to that method and how do I "apply" the output window to my FFT arrays?
  4. For my threshold computation, is it correct to find the frequency in X that's closest to my desired tone/frequency and check if the corresponding element (same index) in Y has an amplitude greater than the threshold?
1

1 Answers

1
votes
  1. FFT data is in units of normalized frequency where the first point is 0 Hz and one past the last point is fs Hz. You can create the frequency axis yourself with linspace(0.0, (1.0 - 1.0/n)*fs, n). You can also use fftfreq but the components will be negative.

  2. These are the same if n is even. You can also use rfftfreq I think. Note that this is only the "positive half" of your frequencies, which is probably what you want for audio (which is real-valued). Note that you can use rfft to just produce the positive half of the spectrum, and then get the frequencies with rfftfreq(n,1.0/fs).

  3. Windowing will decrease sidelobe levels, at the cost of widening the mainlobe of any frequencies that are there. N is the length of your signal and you multiply your signal by the window. However, if you are looking in a long signal you might want to "chop" it up into pieces, window them, and then add the absolute values of their spectra.

  4. "is it correct" is hard to answer. The simple approach is as you said, find the bin closest to your frequency and check its amplitude.