2
votes

I want to write a program that detects the note being played into the microphone. I am testing NAudio's FFT function, but the tests I ran in Audacity suggest that an FFT does not detect the pitch correctly: I played a C5, but the highest peak was at E7.

I changed the first dropdown box in the frequency analysis window to "Enhanced Autocorrelation", and after that the highest peak was at C5.

I googled "enhanced autocorrelation" and had no luck.

3
I Googled "real time pitch detection .net" and got this: stackoverflow.com/questions/1466968/… - Cody Gray
I think that maybe you're picking the frequency with the highest amplitude (which gives you the strongest harmonic) rather than the lowest frequency that's significantly above the noise floor (which gives you the pitch played). - Gabe
Have you seen nicholson.com/rhn/dsp.html#1? - Gabe
You were right, Gabe. I was looking at the frequency with the highest amplitude. Thanks, this solves my problem. I will keep experimenting with NAudio. - Aaron de Windt

3 Answers

2
votes

You are likely getting thrown off by harmonics. Have you tried testing with a sine wave to see whether NAudio's FFT is in the ballpark?
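To make the harmonics point concrete, here is a minimal sketch in plain Python (not NAudio code). It builds a synthetic tone whose 5th harmonic is louder than its fundamental, then compares "strongest bin" with "lowest bin significantly above the floor". All names (`harmonic_tone`, `magnitude_spectrum`, `strongest_peak_hz`, `lowest_significant_peak_hz`), the 10% threshold, and the harmonic amplitudes are assumptions made up for illustration. The fundamental is rounded to 520 Hz, near C5 at 523.25 Hz, so that it falls exactly on a DFT bin. Incidentally, 5 × 523.25 Hz ≈ 2616 Hz, which is close to E7 (about 2637 Hz), so a strong 5th harmonic is one plausible explanation for the peak reported in the question.

```python
import cmath
import math


def harmonic_tone(f0, amplitudes, sample_rate, n):
    """Sum of sine harmonics; `amplitudes` maps harmonic number -> amplitude."""
    return [
        sum(a * math.sin(2 * math.pi * h * f0 * t / sample_rate)
            for h, a in amplitudes.items())
        for t in range(n)
    ]


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for bins 0 .. n/2 - 1 (fine for a small demo)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) / n
        for k in range(n // 2)
    ]


def strongest_peak_hz(mags, sample_rate, n):
    """Frequency of the bin with the largest magnitude, ignoring DC."""
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / n


def lowest_significant_peak_hz(mags, sample_rate, n, threshold=0.1):
    """Lowest local maximum whose magnitude is at least `threshold` times
    the largest magnitude. The 0.1 default is an arbitrary assumption."""
    floor = threshold * max(mags[1:])
    for k in range(1, len(mags) - 1):
        if mags[k] >= floor and mags[k] >= mags[k - 1] and mags[k] >= mags[k + 1]:
            return k * sample_rate / n
    return None


if __name__ == "__main__":
    RATE, N = 8000, 800  # 10 Hz per bin
    # Weak fundamental, weak 2nd harmonic, strong 5th harmonic.
    tone = harmonic_tone(520.0, {1: 0.3, 2: 0.2, 5: 1.0}, RATE, N)
    mags = magnitude_spectrum(tone)
    print(strongest_peak_hz(mags, RATE, N))           # 2600.0, the 5th harmonic
    print(lowest_significant_peak_hz(mags, RATE, N))  # 520.0, the fundamental
```

For a pure sine (`{1: 1.0}`), both functions return the same frequency, which is why a sine wave is a good sanity check for the FFT itself. On real microphone input the threshold would need tuning against the actual noise floor, and a window function would be needed because real notes do not land exactly on a bin.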

See these references: http://cnx.org/content/m11714/latest/

http://www.gamedev.net/community/forums/topic.asp?topic_id=506592&whichpage=1

Line 48 in Spectrum.cpp in the Audacity source code seems to be close to what you want. They also reference an IEEE paper by Tolonen and Karjalainen.

1
votes

Well, if you can live with GPLv2, why not take a peek at the Audacity source code?

http://audacity.sourceforge.net/download/beta_source

1
votes

The highest peak in an audio spectrum is not necessarily the musical pitch as a human would perceive it, especially in a sound with strong overtones. That's because pitch is a human psycho-perceptual phenomenon: the brain will often deduce frequencies that aren't even present in a waveform.

Autocorrelation methods of frequency or pitch estimation (roughly, finding how far apart in time even a funny-looking and/or non-sinusoidal waveform repeats) are usually a better match for what a human would call pitch. The reason for the various enhancements to the autocorrelation algorithm is that simple autocorrelation will find a near-infinite number of repeating periods (e.g. if the waveform repeats every 1 second, it also repeats every 2 seconds, every 3 seconds, etc.). So the trick is to weight the correlation so that it somehow statistically better matches what a human would guess about the same waveform.
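As a rough illustration of the idea, here is a minimal plain-Python sketch of simple autocorrelation pitch estimation. It is not Audacity's "Enhanced Autocorrelation" and not the Tolonen and Karjalainen method; in place of a perceptual weighting it uses a crude tie-break, taking the shortest lag whose correlation is within 95% of the best one. The function names, the 80 to 1000 Hz search range, and the 0.95 tolerance are all assumptions chosen for the example.

```python
import math


def harmonic_tone(f0, amplitudes, sample_rate, n):
    """Sum of sine harmonics; `amplitudes` maps harmonic number -> amplitude."""
    return [
        sum(a * math.sin(2 * math.pi * h * f0 * t / sample_rate)
            for h, a in amplitudes.items())
        for t in range(n)
    ]


def autocorrelation(samples, lag):
    """Mean product of the signal with a copy of itself delayed by `lag` samples."""
    n = len(samples) - lag
    return sum(samples[t] * samples[t + lag] for t in range(n)) / n


def estimate_pitch_hz(samples, sample_rate, min_hz=80.0, max_hz=1000.0,
                      tolerance=0.95):
    """Estimate pitch as sample_rate / lag for the shortest lag whose
    autocorrelation is within `tolerance` of the best lag in the search range.

    Picking the shortest such lag avoids reporting 2x, 3x, ... the true
    period, all of which correlate about equally well."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    scores = {lag: autocorrelation(samples, lag)
              for lag in range(min_lag, max_lag + 1)}
    best = max(scores.values())
    if best <= 0:
        return None  # silence or no positive correlation in range
    for lag in range(min_lag, max_lag + 1):
        if scores[lag] >= tolerance * best:
            return sample_rate / lag
    return None


if __name__ == "__main__":
    RATE, N = 8000, 800
    # 500 Hz tone (one period = 16 samples) dominated by its 5th harmonic.
    bright = harmonic_tone(500.0, {1: 0.3, 2: 0.2, 5: 1.0}, RATE, N)
    print(estimate_pitch_hz(bright, RATE))   # 500.0
    # Harmonics 2, 3 and 4 of 250 Hz, with no energy at 250 Hz itself.
    missing = harmonic_tone(250.0, {2: 1.0, 3: 1.0, 4: 1.0}, RATE, N)
    print(estimate_pitch_hz(missing, RATE))  # 250.0
```

The second signal is the "frequency that isn't even present" case: its spectrum has nothing at 250 Hz, yet the waveform repeats every 1/250 s, so the autocorrelation still lands on 250 Hz. The resolution here is limited to whole-sample lags; a real detector would typically interpolate around the chosen lag.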