
I am making a pitch detection program using an FFT. To get the pitch, I need to find the lowest frequency that is significantly above the noise floor.

All the results are in an array, where each index corresponds to one frequency bin. I have no idea how to find the peak.

I am programming in C#.

Here is a screenshot of the frequency analysis in Audacity.

Comment (Thomas Levesque): I assume you mean "peak", not "pick"?

3 Answers


It would be easier if you had some notion of the absolute values to expect, but I would suggest:

  • Find the lowest (weakest) value first; that is your noise level.
  • Compute the average level; that is your signal strength.
  • Define some function that decides the noise threshold. This is the tricky part and may require some experimentation.

In a bad situation, the signal may be only 2 or 3 times the noise level. If the signal is cleaner, you can probably use a threshold of 2× the noise level.
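The three steps above could be sketched roughly like this, assuming `magnitudes` holds linear (not dB) FFT magnitudes, one per bin. The function name `lowest_peak_bin`, the `factor` parameter and the "climb to the top of the peak" step are my own illustrative choices, not part of the answer; a plain `factor * noise` rule is just the simplest possible threshold function.

```python
def lowest_peak_bin(magnitudes, factor=2.0):
    """Return (peak_bin, noise, signal), or None if nothing clears the threshold.

    magnitudes: linear FFT magnitudes, one entry per frequency bin.
    factor: hypothetical threshold multiplier (2.0 means "2x noise").
    """
    noise = min(magnitudes)                      # weakest value = noise level
    signal = sum(magnitudes) / len(magnitudes)   # average level = signal strength
    threshold = factor * noise                   # simplest possible threshold rule
    # Lowest-frequency bin that rises above the threshold.
    start = next((k for k, m in enumerate(magnitudes) if m > threshold), None)
    if start is None:
        return None
    # Climb from that bin to the top of the same peak.
    peak = start
    while peak + 1 < len(magnitudes) and magnitudes[peak + 1] > magnitudes[peak]:
        peak += 1
    return peak, noise, signal
```

Returning `noise` and `signal` lets the caller check the ratio: if `signal / noise` is only about 2 or 3, treat the result as unreliable. Note that a bin containing exactly 0 makes `noise` 0, so in practice you may want a small floor value.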


Edit, after looking at the picture:

You should probably just start at the left (lowest frequency) and find the first local maximum. It looks like you could use a 30 dB threshold and a window of about 10 bins.
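A minimal sketch of "start at the left and find a local maximum". It assumes the 30 dB threshold is measured relative to the noise floor, taken here as the minimum dB value (the answer does not say what the 30 dB is relative to), and that the array holds amplitudes (use `10 * log10` instead for a power spectrum). `first_local_max_db` and its parameters are hypothetical names.

```python
import math

def first_local_max_db(magnitudes, threshold_db=30.0, window=10):
    """Scan upward from bin 0 and return the first bin that is
    (a) at least threshold_db above the noise floor, and
    (b) the largest value within +/- window // 2 bins of itself.
    Assumption: noise floor = minimum dB value in the spectrum."""
    # Convert amplitudes to dB; clamp to avoid log10(0).
    db = [20.0 * math.log10(max(m, 1e-12)) for m in magnitudes]
    floor = min(db)
    half = window // 2
    for i, v in enumerate(db):
        if v - floor < threshold_db:
            continue                      # not far enough above the noise
        lo, hi = max(0, i - half), min(len(db), i + half + 1)
        if v == max(db[lo:hi]):
            return i                      # local maximum inside the window
    return None
```

The window check stops the scan from returning a bin on the rising flank of a peak; the first bin that is both loud enough and the maximum of its neighbourhood is returned.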


Finding the lowest peak won't work reliably for estimating pitch, as the fundamental is sometimes completely missing or buried in the noise floor. For better reliability, try another algorithm: autocorrelation (or its AMDF/ASDF lag variants), cepstrum (an FFT of the log of the FFT magnitude), harmonic product spectrum, state space density, and variations thereof that use neural nets, genetic algorithms or decision matrices to decide between alternative pitch hypotheses (RAPT, YAAPT, et al.).
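As an illustration of the first alternative, here is a rough sketch of plain time-domain autocorrelation (not AMDF/ASDF, and not RAPT/YAAPT). It works on the raw audio samples rather than the FFT array; the search range `fmin`/`fmax` is an assumption you would tune for your signal.

```python
import math

def autocorrelation_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate f0 (Hz) as sample_rate / lag, where lag maximizes the
    autocorrelation within the lag range matching [fmin, fmax]."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]                  # remove DC offset
    min_lag = int(sample_rate / fmax)                # shortest period searched
    max_lag = min(int(sample_rate / fmin), n - 1)    # longest period searched
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized sum: fewer overlapping terms at long lags slightly
        # favours the true period over its multiples (2T, 3T, ...).
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None

# Example: a 220 Hz sine at 8 kHz gives an estimate within a few Hz of 220
# (integer lags limit the precision).
fs = 8000
tone = [math.sin(2 * math.pi * 220 * i / fs) for i in range(1024)]
print(autocorrelation_pitch(tone, fs))
```

Because it looks for the repetition period, this still finds the pitch when the fundamental itself is weak or missing from the spectrum.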

Added:

That said, you could guess a frequency, compute the average and standard deviation of the spectral magnitudes over, say, a 2-to-1 frequency range around your guess, and see whether a peak stands significantly above the average (2 sigma?). Rinse and repeat for a number of frequency guesses, and see which one (or the lowest of several) has a peak that stands out the most from the average. Use that peak.
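A sketch of that guess-and-test loop, working in bin indices. The assumptions are mine: the 2-to-1 range is placed symmetrically (on a log scale) as [g/√2, g·√2], the population standard deviation is used, and the "lowest of several" rule is applied (return the peak for the lowest guess that clears `k_sigma`). `peak_by_guessing` is a hypothetical name.

```python
import math

def peak_by_guessing(magnitudes, guesses, k_sigma=2.0):
    """For each guessed bin g (lowest first), look at the 2:1 range
    [g/sqrt(2), g*sqrt(2)] and measure how many standard deviations the
    largest magnitude sits above the range's mean. Return the peak bin
    for the first guess that reaches k_sigma, or None."""
    for g in sorted(guesses):
        lo = max(0, math.floor(g / math.sqrt(2)))
        hi = min(len(magnitudes) - 1, math.ceil(g * math.sqrt(2)))
        window = magnitudes[lo:hi + 1]
        if len(window) < 3:
            continue                               # too few bins for statistics
        mean = sum(window) / len(window)
        std = math.sqrt(sum((m - mean) ** 2 for m in window) / len(window))
        if std == 0:
            continue                               # flat range: no peak at all
        peak = max(range(lo, hi + 1), key=lambda i: magnitudes[i])
        if (magnitudes[peak] - mean) / std >= k_sigma:
            return peak
    return None
```

Keep in mind that with a single spike in a window of n bins, the score can never exceed √(n−1), so very low guesses (few bins per octave) cannot reach 2 sigma; a finer FFT resolution helps here.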


Instead of attempting to find the lowest peak, I would look for a fundamental frequency which maximizes the spectral energy captured by its first 5 integer multiples. Note that every peak is an integer multiple of the lowest peak. This is a hack of the cepstrum method. Don't judge :).

N.B. From your plots, I assume a 1024-sample window and a 44.1 kHz sampling rate. This yields a frequency granularity of only 44.1 kHz / 1024 ≈ 43 Hz. For 44.1 kHz audio, I recommend a longer analysis window of ~50 ms, i.e. 2048 samples (≈46 ms). This would yield a finer frequency granularity of ~21 Hz.
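For reference, the resolution numbers above come from the bin spacing formula. Here $f_s$ is the sampling rate (44.1 kHz, as the answer assumes), $N$ the FFT length, $T$ the window duration and $k$ the bin index of frequency $f$:

```latex
\Delta f = \frac{f_s}{N}, \qquad T = \frac{N}{f_s}, \qquad k = \frac{f\,N}{f_s}

N = 1024:\quad \Delta f = \tfrac{44100}{1024} \approx 43.1\ \text{Hz}, \quad T \approx 23.2\ \text{ms}

N = 2048:\quad \Delta f = \tfrac{44100}{2048} \approx 21.5\ \text{Hz}, \quad T \approx 46.4\ \text{ms}, \quad k(50\ \text{Hz}) \approx 2.3
```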

Assuming a MATLAB vector `psd` of length 2048 holding the PSD values:

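```matlab
% 50 Hz (Dude)  -> 50/44100 * 2048  -> ~2  lower limit (bin index)
% 300 Hz (Baby) -> 300/44100 * 2048 -> ~14 upper limit (bin index)
% MATLAB is 1-based: FFT bin k (k = 0 is DC) is stored in psd(k+1).
lower_lim = 2;
upper_lim = 14;
best_energy = -Inf;
best_bin = lower_lim;
for fund_cand = lower_lim:upper_lim
    i_first_five_multiples = (1:5) * fund_cand + 1;  % bins of the first 5 harmonics
    sum_energy = sum(psd(i_first_five_multiples));
    if sum_energy > best_energy    % keep the best candidate so far
        best_energy = sum_energy;
        best_bin = fund_cand;
    end
end
f0_hz = best_bin * 44100 / 2048;  % estimated fundamental frequency in Hz
```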

I would find the frequency which maximizes the sum_energy value.
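Since the question is about C# (0-based arrays, like Python), here is a sketch of the same harmonic-sum search written 0-based, with the arg-max and the bin-to-Hz conversion included. The name `harmonic_sum_f0` and its defaults (44.1 kHz, 2048-point FFT, 50 to 300 Hz, 5 harmonics) are taken from the answer's assumptions; it should port line for line to C#.

```python
def harmonic_sum_f0(psd, sample_rate=44100, n_fft=2048,
                    f_lo=50.0, f_hi=300.0, n_harmonics=5):
    """Return (best_bin, f0_hz): the candidate bin whose first n_harmonics
    integer multiples capture the most PSD energy.
    psd[k] is bin k (0-based), i.e. frequency k * sample_rate / n_fft."""
    lower = max(1, round(f_lo * n_fft / sample_rate))   # ~2 for 50 Hz
    upper = round(f_hi * n_fft / sample_rate)           # ~14 for 300 Hz
    best_bin, best_energy = None, float("-inf")
    for cand in range(lower, upper + 1):
        # Bins of the first n_harmonics multiples that fit in the array.
        bins = [h * cand for h in range(1, n_harmonics + 1) if h * cand < len(psd)]
        energy = sum(psd[b] for b in bins)
        if energy > best_energy:
            best_bin, best_energy = cand, energy
    return best_bin, best_bin * sample_rate / n_fft
```

With only ~21 Hz per bin, the returned `f0_hz` is coarse; interpolating around the winning bin or using a longer window would refine it.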