
I have been experimenting with simple FFT using p5 sound and then plotting the bands of the spectrum visually.

One thing I noticed is that the lower frequencies appear very high in almost all tracks, while the high frequencies seem to be muted.

For instance, when doing an FFT with only 16 bands, most of the sound shows up in the first 4 bands, and the other (higher) frequencies are reported as "muted" or just too quiet.

You can see this in this example: http://p5js.org/reference/#/p5.FFT where, even with relatively high frequencies, the right side of the spectrum stays totally down. The lower frequencies are reported to be the highest even though what you hear is more of a middle / higher pitch kind of sound.

It seems that some sort of transformation has to be applied to the FFT result in order to get a visual representation that better matches what we are hearing?

Am I missing something? I'm surely missing some basic information about how FFT works and how the frequencies are reported, but is this a common problem that has a common solution?


1 Answer


The human auditory system is fundamentally logarithmic (base 2) in nature: each octave has twice the bandwidth of the one below it. As a consequence, the vast majority of the frequency content of human-perceivable sound is below 1 kHz, and signal power is spread more thinly across FFT bins at higher frequencies, which is precisely what your graph shows.
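To make the octave argument concrete, here is a rough sketch in plain JavaScript that counts how many linearly spaced FFT bins fall inside each octave. The 44.1 kHz sample rate, the 1024-bin spectrum and the helper names are assumptions chosen for illustration, not values taken from the question.

```javascript
// Assumed values for illustration only.
const sampleRate = 44100;                      // Hz
const binCount = 1024;                         // number of FFT bins up to Nyquist
const binWidth = (sampleRate / 2) / binCount;  // Hz covered by each bin

// Count the bins whose frequency i * binWidth lies in [lo, hi).
function binsInRange(lo, hi) {
  let count = 0;
  for (let i = 0; i < binCount; i++) {
    const f = i * binWidth;
    if (f >= lo && f < hi) count++;
  }
  return count;
}

// Build one row per octave, starting at startHz and doubling each time.
function octaveTable(startHz = 20) {
  const rows = [];
  for (let lo = startHz; lo < sampleRate / 2; lo *= 2) {
    const hi = Math.min(lo * 2, sampleRate / 2);
    rows.push({ lo, hi, bins: binsInRange(lo, hi) });
  }
  return rows;
}

for (const row of octaveTable()) {
  console.log(`${row.lo} Hz to ${row.hi} Hz: ${row.bins} bins`);
}
```

Each full octave gets roughly twice as many bins as the one below it, so on a linear frequency axis the top octave alone takes up about half of the plot width, while the lowest octaves are squeezed into a handful of bins on the left.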

Spectrum plots of the kind I suspect you're expecting to see here are drawn with log(F) on the x-axis and signal power in dB on the y-axis. Your code draws a graph with both axes linear.
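As a minimal sketch of that remapping, the plain JavaScript below groups linearly spaced bins into logarithmically spaced bands and converts a magnitude to decibels. It is not tied to the p5 API: `logBands` and `toDecibels` are hypothetical helpers, and `spectrum` is assumed to hold linear magnitudes with bin `i` located at `i * nyquist / spectrum.length` Hz. If your library already returns decibel-scaled values, skip the dB step.

```javascript
// Average linearly spaced FFT bins into logarithmically spaced bands.
// Assumption: spectrum[i] is a linear magnitude at i * nyquist / spectrum.length Hz.
function logBands(spectrum, bandCount, nyquist = 22050, minHz = 20) {
  const binWidth = nyquist / spectrum.length;
  // Constant frequency ratio between consecutive band edges.
  const ratio = Math.pow(nyquist / minHz, 1 / bandCount);
  const bands = [];
  for (let b = 0; b < bandCount; b++) {
    const loHz = minHz * Math.pow(ratio, b);
    const hiHz = loHz * ratio;
    let lo = Math.min(Math.floor(loHz / binWidth), spectrum.length - 1);
    let hi = Math.floor(hiHz / binWidth);
    // Every band covers at least one bin and never runs past the array.
    hi = Math.min(Math.max(hi, lo + 1), spectrum.length);
    let sum = 0;
    for (let i = lo; i < hi; i++) sum += spectrum[i];
    bands.push(sum / (hi - lo));
  }
  return bands;
}

// Convert a linear magnitude to decibels, clamped to a floor value.
function toDecibels(magnitude, floorDb = -100) {
  if (magnitude <= 0) return floorDb;
  return Math.max(floorDb, 20 * Math.log10(magnitude));
}
```

With 16 bands between 20 Hz and 22050 Hz, each band spans a frequency ratio of about 1.55, so the bars are spread evenly across the octaves instead of crowding the left edge. Averaging within a band is one choice among several; taking the maximum or summing power are common alternatives.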

In addition, because you are not explicitly applying a window function to the samples used to calculate the FFT, what you get by default is the rectangular window, which is very far from a good choice in this application.
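For reference, a window function is just a per-sample weighting applied before the transform. The sketch below builds a Hann window, one common choice, in plain JavaScript. It assumes you have access to the raw time-domain samples and run the FFT yourself; whether your audio library lets you set the window is a separate question, and `hannWindow` and `applyWindow` are hypothetical helper names.

```javascript
// Build a Hann window of the given length (length must be at least 2).
function hannWindow(length) {
  const w = new Float64Array(length);
  for (let n = 0; n < length; n++) {
    w[n] = 0.5 * (1 - Math.cos((2 * Math.PI * n) / (length - 1)));
  }
  return w;
}

// Multiply each sample by the matching window coefficient.
// Returns a new array; the input samples are left untouched.
function applyWindow(samples, window) {
  if (samples.length !== window.length) {
    throw new Error("samples and window must have the same length");
  }
  const out = new Float64Array(samples.length);
  for (let n = 0; n < samples.length; n++) {
    out[n] = samples[n] * window[n];
  }
  return out;
}
```

The window tapers the block to zero at both ends, which reduces the leakage of energy into neighbouring bins that a rectangular window produces when the block is cut off abruptly.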