
I need to measure the signal frequency while musicians are playing, and the notes change a bit too fast for an FFT (Fast Fourier Transform).

Musicians play at 90-140 bpm. This means there are 90-140 groups of notes each minute, with up to 8 (more commonly up to 4) notes in each group, so a note lasts from 60/140/8 = 0.0536 s to 60/90/4 = 0.167 s; that is, notes may change at a rate of 6-19 notes per second.
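A quick check of the note-duration arithmetic above (a minimal Python sketch; `note_duration` is just an illustrative name):

```python
def note_duration(bpm, notes_per_beat):
    """Seconds per note when each beat (group) holds notes_per_beat notes."""
    return 60.0 / bpm / notes_per_beat

shortest = note_duration(140, 8)  # fastest tempo, 8 notes per group
longest = note_duration(90, 4)    # slowest tempo, 4 notes per group

for label, d in (("shortest", shortest), ("longest", longest)):
    print(f"{label}: {d:.4f} s per note, {1 / d:.1f} notes per second")
# shortest: 0.0536 s per note, 18.7 notes per second
# longest: 0.1667 s per note, 6.0 notes per second
```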

The music uses a logarithmic scale: the range between, say, 440 Hz and 880 Hz (one octave) is divided into 12 notes, only 7 of which are used for the melody. (Basically, they use only the white keys on the piano; when they want to shift the starting frequency, they use some of the black keys and skip some white keys.) That is, the frequency of each next note is the previous one multiplied by 2^(1/12) = 1.05946.
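For reference, here are the equal-tempered frequencies of one octave starting at A = 440 Hz (a short Python sketch; the sharp names A#, C#, D#, F#, G# are the black keys, the plain letters the white keys):

```python
SEMITONE = 2 ** (1 / 12)  # frequency ratio between adjacent notes
NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A"]

def octave_from(a_hz=440.0):
    """Equal-tempered frequencies of the 12 semitone steps from A to the next A."""
    return [a_hz * SEMITONE ** i for i in range(13)]

for name, f in zip(NAMES, octave_from()):
    print(f"{name:2s} {f:7.2f} Hz")
print(f"ratio per step: {SEMITONE:.5f}")  # 1.05946
```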

To make things more complicated, the A (La) frequency may vary from 438 to 446 Hz. String instruments can, in theory, be tuned, while wind instruments depend on the air temperature and humidity, so the reference frequency ends up being renegotiated by the musicians during the sound check.

Sometimes musicians and vocalists make errors in frequency; they call it "out of tune". They want a device that would alert them to such "out of tune" errors. They have tuners, but a tuner needs the same note to be held for about 1 s before it shows anything. That works for tuning, but not while the music is being played.
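Once some frequency estimate exists, the "out of tune" check itself is simple. A minimal sketch, assuming the deviation is measured in cents (1/100 of a semitone, so a "25% out-of-tune error" is 25 cents), that the reference A is adjustable to cover the 438-446 Hz range, and that octaves are ignored; `nearest_note` and `is_out_of_tune` are hypothetical helper names:

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def nearest_note(freq_hz, a_ref=440.0):
    """Return (pitch-class name, deviation in cents) for freq_hz.

    Octaves are ignored: 440 Hz and 880 Hz both report "A".
    One semitone is 100 cents.
    """
    semitones = 12 * math.log2(freq_hz / a_ref)  # distance from the reference A
    nearest = round(semitones)
    cents = 100 * (semitones - nearest)
    return NOTE_NAMES[nearest % 12], cents

def is_out_of_tune(freq_hz, a_ref=440.0, tolerance_cents=25.0):
    """Flag a note that lies more than tolerance_cents from the nearest note."""
    _, cents = nearest_note(freq_hz, a_ref)
    return abs(cents) > tolerance_cents

print(nearest_note(880.0))                  # ('A', 0.0)
print(is_out_of_tune(452.0))                # True (about 47 cents sharp of A = 440 Hz)
print(is_out_of_tune(452.0, a_ref=446.0))   # False (about 23 cents sharp of A = 446 Hz)
```

The same 452 Hz note is flagged or not depending on the agreed reference, which is why the reference has to be a parameter rather than a constant.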

Most likely, the tuner performs an FFT and, because of the relation

df = 1/T

(the frequency resolution df is the inverse of the acquisition time T), waits 1 second to get 1 Hz resolution.

For A = 440 Hz, the distance to the next note is 440 * 0.05946 = 26.16 Hz. Getting that frequency resolution requires an acquisition time of 1/26.16 = 0.038 s; that is, with 8 notes per group, an FFT at 196 bpm can just distinguish two adjacent notes, and at 98 bpm it can detect a 50% out-of-tune error, provided that acquisition starts at the very moment the pitch changes. If the pitch may change in the middle of an acquisition period, the limit drops to 49 bpm, which is just too slow. In addition, it is very desirable to measure the frequency more precisely, say, to detect a 25% out-of-tune error.
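The numbers above can be reproduced with a few lines of Python. This is a sketch under one reading of the argument: the tempo figures match the 8-notes-per-group case, "X% out of tune" is taken as a fraction of the semitone spacing at A = 440 Hz, and `fft_limits` is a hypothetical helper name:

```python
SEMITONE = 2 ** (1 / 12)

def fft_limits(a_hz=440.0, fraction_of_semitone=1.0, notes_per_beat=8):
    """Resolution df needed to resolve a fraction of a semitone at a_hz, the
    acquisition time T = 1/df, and the fastest tempo whose notes last T."""
    df = a_hz * (SEMITONE - 1) * fraction_of_semitone  # required resolution, Hz
    t = 1.0 / df                                       # df = 1/T
    bpm = 60.0 / (notes_per_beat * t)                  # one note per acquisition
    return df, t, bpm

for frac in (1.0, 0.5, 0.25):
    df, t, bpm = fft_limits(fraction_of_semitone=frac)
    # If the pitch may change mid-acquisition, each note must last 2T: halve the tempo.
    print(f"{frac:4.2f} semitone: df={df:5.2f} Hz, T={t:.3f} s, "
          f"{bpm:5.1f} bpm (aligned), {bpm / 2:5.1f} bpm (unaligned)")
```

Below A4 the semitone spacing in Hz is smaller, so the limits get worse for lower notes: `fft_limits(a_hz=220.0)` gives twice the acquisition time.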

Is there a way to measure frequency better than an FFT, that is, with better resolution in less acquisition time? (At least 2 times better, ideally 8 times better.) In exchange, I do not need to distinguish between notes in different octaves, e.g. both 440 Hz and 880 Hz may be recognized as A. (More trade-offs are probably possible; nothing else comes to mind right now.)

UPD: Here's a really good drawing:

[Figure: note frequencies, linked from Wikipedia]

UPD2

I have found a PhD thesis and open-source software (TARTINI, the real-time music analysis tool) at:

http://miracle.otago.ac.nz/tartini/

(The pages are also available through the web archive service at http://web.archive.org, also reachable as http://archive.org or http://waybackmachine.org)

You say frequency, but I suspect you mean pitch? - Paul R
Actually this isn't just pedantry - it makes a significant difference if you're dealing with music. Frequency is a physical quantity, whereas pitch is a percept, and has a fairly complex relationship with the frequencies and amplitudes of the components of a given sound. An FFT (or more accurately a power spectrum derived from an FFT) will tell you the frequencies and amplitudes of the components, but getting from here to the perceived pitch is non-trivial (i.e. it's not just the frequency of the fundamental component or the loudest component). See: Harmonic Product Spectrum. - Paul R
Another piece of the puzzle that you may be missing: it sounds like you're assuming that sample windows will be consecutive, so you only get 1 pitch estimate per window, but a commonly used technique is to overlap successive sample windows, e.g. if you overlap each window by 75% then you get pitch estimates at 4 times the rate, but with the same resolution (albeit with some correlation between successive windows, due to the overlap). - Paul R
@18446744073709551615: that just gives you an N/4 point FFT with the output interpolated to N points - it doesn't magically give you the resolution of an N point FFT. - Paul R
BTW, since you're only at the theory stage here, might I suggest you take this to dsp.stackexchange.com? It will be more on-topic there and you'll likely get better answers from people more knowledgeable than I. - Paul R
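The Harmonic Product Spectrum suggested in the comments can be sketched as follows. This is a minimal pure-Python illustration, not TARTINI's or any library's implementation: it uses a naive O(N^2) DFT instead of an FFT library, and the test tone (a weak 200 Hz fundamental with louder 400 Hz and 600 Hz harmonics, sampled at an assumed 8 kHz) is made up to show the difference between "loudest component" and pitch:

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (slow, for illustration only)."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]  # twiddle table
    return [abs(sum(x[i] * tw[(k * i) % n] for i in range(n)))
            for k in range(n // 2 + 1)]

def hps_pitch(x, fs, harmonics=3, fmin=50.0):
    """Harmonic Product Spectrum: score each bin k by |X(k)|*|X(2k)|*...*|X(Rk)|,
    so the bin whose integer multiples all carry energy (the fundamental) wins."""
    n = len(x)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # Hann
    mag = dft_magnitudes([s * w for s, w in zip(x, win)])
    kmin = max(1, int(fmin * n / fs))
    kmax = (len(mag) - 1) // harmonics
    best_k = max(range(kmin, kmax + 1),
                 key=lambda k: math.prod(mag[k * h] for h in range(1, harmonics + 1)))
    return best_k * fs / n

fs, n = 8000, 1000  # 125 ms window, 8 Hz bin spacing
tone = [0.2 * math.sin(2 * math.pi * 200 * i / fs)   # weak fundamental
        + math.sin(2 * math.pi * 400 * i / fs)       # louder 2nd harmonic
        + math.sin(2 * math.pi * 600 * i / fs)       # louder 3rd harmonic
        for i in range(n)]
print(hps_pitch(tone, fs))  # 200.0, although 400 and 600 Hz are louder
```

Note that HPS works on FFT bins, so it inherits the same df = 1/T resolution limit; it addresses the pitch-versus-frequency problem, not the acquisition-time problem.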
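To illustrate the overlapping-windows point from the comments: a minimal sketch, assuming 8 kHz sampling and a 1024-sample window (both arbitrary choices), that lists window start positions with 75% overlap. Each window still spans 128 ms, so the frequency resolution per estimate is unchanged; only the rate of estimates goes up.

```python
def frame_starts(num_samples, window, overlap=0.75):
    """Start indices of analysis windows that overlap by the given fraction."""
    hop = max(1, round(window * (1 - overlap)))
    return list(range(0, num_samples - window + 1, hop))

fs = 8000                          # assumed sample rate, Hz
window = 1024                      # assumed FFT length: 128 ms, 7.8 Hz bin spacing
starts = frame_starts(fs, window)  # one second of audio
hop = starts[1] - starts[0]
print(f"hop = {hop} samples ({1000 * hop / fs:.0f} ms)")
print(f"{len(starts)} estimates per second, versus {fs // window} without overlap")
```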
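The reply about zero-padding can be checked numerically. In this sketch (naive pure-Python DFT; a 64-sample, 8 ms frame of a 440 Hz tone at 8 kHz, all arbitrary choices), padding the frame with zeros to 4 times its length gives a spectrum in which every 4th bin equals a bin of the unpadded spectrum; the bins in between are interpolated values, and the spectral peak is no narrower, because it is still only 8 ms of data.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT (slow, for illustration only)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

fs, n = 8000, 64
frame = [math.sin(2 * math.pi * 440 * i / fs) for i in range(n)]  # 8 ms of signal
short = dft(frame)                      # 64 bins, 125 Hz apart
padded = dft(frame + [0.0] * (3 * n))   # zero-padded to 256 points, 31.25 Hz apart

# Bin 4k of the padded DFT is the same sum as bin k of the short DFT, so the
# padded spectrum only fills in values between the original bins.
err = max(abs(padded[4 * k] - short[k]) for k in range(n))
print(f"max |padded[4k] - short[k]| = {err:.1e}")
```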

1 Answer


Regarding the FFT: assuming the narrow-band spectral content is sparse, well separated, and in low enough background noise, frequency peaks can be interpolated or phase-vocoded to much higher resolution than the FFT bin spacing (the bin spacing being the inverse of the duration of the segment of actual time-domain data). Parabolic interpolation is common, but there are other, more accurate interpolation kernels. Phase-vocoder frequency estimation requires stationarity across 2 overlapped frames; however, the total span of those 2 frames can be relatively short.
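A minimal sketch of both refinements on a synthetic 443 Hz tone. The 8 kHz sample rate, the 512-sample (64 ms, 15.6 Hz bin spacing) Hann window and the 128-sample hop are arbitrary assumptions. Parabolic interpolation fits a parabola through the log magnitudes of the peak bin and its two neighbours; the phase-vocoder estimate compares the phase of the peak bin in two frames offset by the hop and converts the phase advance into a frequency. Both are written from scratch in pure Python with a naive DFT, for a clean single tone; noisy or polyphonic audio will do considerably worse.

```python
import cmath
import math

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft_bin(frame, k):
    """Single DFT bin k of frame (naive, for illustration only)."""
    n = len(frame)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(frame))

def windowed(signal, start, n):
    return [s * w for s, w in zip(signal[start:start + n], hann(n))]

def peak_bin(frame):
    """Largest-magnitude bin in 1..N/2-2, plus magnitudes of bins 0..N/2-1."""
    n = len(frame)
    mags = [abs(dft_bin(frame, k)) for k in range(n // 2)]
    return max(range(1, n // 2 - 1), key=lambda k: mags[k]), mags

def parabolic_estimate(signal, fs, n):
    """Parabolic interpolation of the log-magnitude peak (one frame)."""
    k, mags = peak_bin(windowed(signal, 0, n))
    a, b, c = (math.log(mags[k + d]) for d in (-1, 0, 1))
    p = 0.5 * (a - c) / (a - 2 * b + c)   # vertex offset in bins, |p| <= 0.5
    return (k + p) * fs / n

def phase_vocoder_estimate(signal, fs, n, hop):
    """Phase-vocoder estimate from two frames that start `hop` samples apart."""
    f1, f2 = windowed(signal, 0, n), windowed(signal, hop, n)
    k, _ = peak_bin(f1)
    dphi = cmath.phase(dft_bin(f2, k)) - cmath.phase(dft_bin(f1, k))
    expected = 2 * math.pi * k * hop / n                          # advance at bin centre
    dev = (dphi - expected + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    return (k + dev * n / (2 * math.pi * hop)) * fs / n

fs, n, hop, f_true = 8000, 512, 128, 443.0
signal = [math.sin(2 * math.pi * f_true * i / fs) for i in range(n + hop)]
print(f"bin spacing:   {fs / n:.2f} Hz")
print(f"parabolic:     {parabolic_estimate(signal, fs, n):.2f} Hz")
print(f"phase vocoder: {phase_vocoder_estimate(signal, fs, n, hop):.2f} Hz")
```

The two frames together span n + hop = 640 samples (80 ms). The hop must stay small enough that the phase deviation cannot wrap around: with hop = N/4, frequencies up to 2 bins away from the bin centre remain unambiguous.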

But the peak spectral frequency reported by an FFT is not the same as the pitch frequency perceived by a human (voices and many musical instruments can radiate more acoustic energy in their overtone series than at the pitch frequency, sometimes slightly inharmonically). There are algorithms better suited to pitch estimation than FFTs alone. A partial list is in this answer: FFT on iPhone to ignore background noise and find lower pitches
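One widely used family of time-domain pitch estimators is based on autocorrelation: the signal is compared with delayed copies of itself, and the delay at which it matches best is taken as the period. The sketch below is deliberately simplified (plain biased autocorrelation plus parabolic refinement of the lag). It is not YIN or any specific published method, and real implementations add normalisation and octave-error checks. The made-up test tone has a weak 200 Hz fundamental and louder 400 Hz and 600 Hz harmonics, so simply picking the loudest FFT peak would report 400 or 600 Hz instead of the 200 Hz pitch.

```python
import math

def autocorr_pitch(x, fs, fmin=50.0, fmax=1000.0):
    """Pitch from the biased autocorrelation r(L) = sum_i x[i] * x[i + L].

    The biased sum has fewer terms at longer lags, which in this simple
    sketch favours the true period over its multiples (2x, 3x the period).
    """
    n = len(x)
    lo, hi = int(fs / fmax), int(fs / fmin)
    r = {lag: sum(x[i] * x[i + lag] for i in range(n - lag))
         for lag in range(lo - 1, hi + 2)}
    best = max(range(lo, hi + 1), key=lambda lag: r[lag])
    a, b, c = r[best - 1], r[best], r[best + 1]
    p = 0.5 * (a - c) / (a - 2 * b + c)   # parabolic refinement of the lag
    return fs / (best + p)

fs, n = 8000, 1000  # 125 ms of audio at an assumed 8 kHz
tone = [0.2 * math.sin(2 * math.pi * 200 * i / fs)   # weak fundamental
        + math.sin(2 * math.pi * 400 * i / fs)       # louder 2nd harmonic
        + math.sin(2 * math.pi * 600 * i / fs)       # louder 3rd harmonic
        for i in range(n)]
print(f"{autocorr_pitch(tone, fs):.1f} Hz")  # close to 200 Hz
```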

Many academic papers on pitch estimation methods for music can be found on the music-ir/MIREX site: http://www.music-ir.org/mirex/wiki/MIREX_HOME