Sound analyzer using NAudio for 48000 samples/sec sound. Can I use a cycle-sample-size of 1024?

Question

I need to create a sound analyzer to isolate certain song frequencies. For now, I'm interested in bass (60-250Hz).

I read the signal (IEEE float) and, for each block of 1024 samples, do an FFT and then extract the value corresponding to each frequency.

What I don't understand is this: I know FFT needs powers of 2 in order to work. I've seen code using blocks of 512, code using 2048, 4096 and so on.

I've settled on 1024 (which gives me roughly 47 datapoints/second). Am I correct in assuming that using 2048, for instance, will work just the same, giving me about 23.4 datapoints/second, and that the only difference is accuracy (and speed of computation, of course)?

Also, am I required to read at 1024-boundary blocks? Like, for instance, say I simply skip the first 200 floats, will the results end up being very similar? (my tests seem to say yes)

LATER EDIT: updated title to make it easier to understand

The general rule is Nyquist, which says you must sample at at least twice the max frequency when doing an FFT. — jdweng
The human ear can only hear up to 20 kHz and voice up to 2 kHz. See: nhc.com.au/blog/…. — jdweng
Right. You can have a 1 second sample or a one minute sample. But it does care about samples per second. If you use the wrong samples per second you get the wrong frequencies for the output. — jdweng
Power of 2 reduces the number of calculations needed to get answer. It does not need to be a power of 2. See Wiki article : en.wikipedia.org/wiki/Fast_Fourier_transform — jdweng

hotpaw2 · Accepted Answer · 2020-09-06T16:49:14

1024 samples at 48 kHz (about 21 ms) is barely longer than one period of a 60 Hz signal (about 17 ms). That is too short to determine whether the signal is even fully periodic (repeats). Humans typically require somewhere around 6 periods of repetition to hear a sound as having a definite pitch.
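To make the arithmetic above concrete, here is a minimal sketch (plain Python, not NAudio code) that counts how many periods of a tone fit in one block. The helper names `periods_in_block` and `min_block_for_periods` are made up for illustration, and the "6 periods" figure is only the rough rule of thumb mentioned above.

```python
import math

SAMPLE_RATE = 48_000  # samples per second, as in the question


def periods_in_block(block_size, freq_hz, sample_rate=SAMPLE_RATE):
    """Number of cycles of a freq_hz tone that fit in block_size samples."""
    return block_size * freq_hz / sample_rate


def min_block_for_periods(n_periods, freq_hz, sample_rate=SAMPLE_RATE):
    """Smallest block size (in samples) that spans n_periods cycles of freq_hz."""
    return math.ceil(n_periods * sample_rate / freq_hz)


if __name__ == "__main__":
    for n in (1024, 2048, 4096, 8192):
        duration_ms = 1000.0 * n / SAMPLE_RATE
        print(f"{n:5d} samples = {duration_ms:6.1f} ms, "
              f"{periods_in_block(n, 60.0):5.2f} periods of 60 Hz")
    # Samples needed to cover about 6 periods of the lowest bass frequency.
    print(min_block_for_periods(6, 60.0))
```

Under these assumptions a 1024-sample block holds only about 1.28 periods of 60 Hz, and covering 6 periods takes 4800 samples, so 8192 would be the first power of 2 that is long enough.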

60 Hz is roughly B1 (about 61.7 Hz). You might need 2 Hz resolution to separate B1 from the adjacent C2 (about 65.4 Hz) with a clear gap in between the two nearest FFT frequency bins. To do that using just FFT magnitude results would require an FFT spanning 48 kHz / 2 Hz = 24000 samples, i.e. a half second or longer. The next power of 2 above that, for 48 ksps samples, is 32768.
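The resolution argument can be checked with a few lines of arithmetic. The sketch below is plain Python with hypothetical helper names (`bin_spacing_hz`, `fft_size_for_resolution`, `bins_in_band`); it assumes a real-input FFT where bin `k` sits at `k * sample_rate / n` Hz, and it uses the 60-250 Hz bass band from the question.

```python
import math

SAMPLE_RATE = 48_000  # samples per second, as in the question


def bin_spacing_hz(fft_size, sample_rate=SAMPLE_RATE):
    """Distance in Hz between adjacent FFT bins."""
    return sample_rate / fft_size


def next_pow2(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def fft_size_for_resolution(resolution_hz, sample_rate=SAMPLE_RATE):
    """Power-of-2 FFT size whose bin spacing is at most resolution_hz."""
    return next_pow2(math.ceil(sample_rate / resolution_hz))


def bins_in_band(fft_size, lo_hz, hi_hz, sample_rate=SAMPLE_RATE):
    """Indices of the bins whose centre frequency lies within [lo_hz, hi_hz]."""
    spacing = bin_spacing_hz(fft_size, sample_rate)
    return [k for k in range(fft_size // 2 + 1) if lo_hz <= k * spacing <= hi_hz]


if __name__ == "__main__":
    for n in (1024, 2048, 4096, 32768):
        print(n, bin_spacing_hz(n), len(bins_in_band(n, 60.0, 250.0)))
    print(fft_size_for_resolution(2.0))
```

With a 1024-point FFT the bins are 46.875 Hz apart, so only four of them (indices 2 to 5) fall inside 60-250 Hz. Doubling the block size halves the spacing and also halves the number of spectra per second, which is the trade-off the question asks about.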

For music pitch frequencies, there are much better pitch detectors/estimators than a bare FFT or FFT peak-magnitude picking, as they solve the missing or weak fundamental issue common in recorded instrumental or vocal music. Those pitch estimators can work with time windows shorter than a half second, but require more computation than bare FFT magnitude peak picking.
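The answer does not name a specific estimator. As one illustration only, the sketch below uses normalized autocorrelation, a simple time-domain method chosen here as an assumption, not necessarily what was meant above. The function name `autocorr_pitch_hz` and the test signal (harmonics 2, 3 and 4 of 110 Hz with no energy at 110 Hz itself) are made up for the example.

```python
import math

SAMPLE_RATE = 48_000  # samples per second, as in the question


def autocorr_pitch_hz(x, sample_rate=SAMPLE_RATE, f_min=60.0, f_max=250.0):
    """Estimate pitch by finding the lag with the highest normalized correlation.

    Only lags corresponding to f_min..f_max are searched. Every lag is
    compared over the same number of samples (w), so no lag is favoured.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    w = len(x) - max_lag
    if w <= 0:
        raise ValueError("block too short for the requested f_min")
    ref = x[:w]
    ref_energy = sum(v * v for v in ref)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        seg = x[lag:lag + w]
        seg_energy = sum(v * v for v in seg)
        if ref_energy == 0.0 or seg_energy == 0.0:
            continue  # silent segment, correlation undefined
        dot = sum(a * b for a, b in zip(ref, seg))
        score = dot / math.sqrt(ref_energy * seg_energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def missing_fundamental_tone(f0_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Test signal made of harmonics 2, 3 and 4 of f0_hz, with no energy at f0_hz."""
    return [
        sum(math.sin(2.0 * math.pi * h * f0_hz * i / sample_rate) for h in (2, 3, 4))
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    block = missing_fundamental_tone(110.0, 2048)
    print(autocorr_pitch_hz(block))
```

A 2048-sample block (about 43 ms) is enough here because the method looks for the repetition period rather than a spectral peak, so it reports a value close to 110 Hz even though the spectrum has nothing at that frequency. The cost is one dot product per candidate lag, which is more work than a single FFT of the same block.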
