15
votes

I want to take audio PCM data and find peaks in it. Specifically, I want to return the frequency and time at which a peak occurs.

My understanding of this is that I have to take the PCM data and dump it into an array, setting it as the real values with the complex parts set to 0. I then take the FFT, and I get an array back. If each number in the array is a magnitude value, how do I get the frequency associated with each one? Also, do I take the magnitude of the real & complex part or just discard the complex values?

Finally, if I wanted to find the peaks in a single song, do I just set a small window to FFT and slide it across all of the audio? Any suggestions on how large that window should be?

3
Finding peaks in the audio PCM data is not the same thing as finding peaks in the FFT. What exactly are you trying to do?endolith

3 Answers

3
votes

You may actually be looking for a spectrogram, which is basically an FFT of the data in a small window that's slid along the time axis. If you have software that implements this, it might save you some effort. It's what's commonly used for analysing time varying acoustic signals, and is a very useful way to look at sounds. Also, there are some tricks, for example, with windowing data for FFTs, that the spectrogram will probably get right, but will be harder (though not very hard) for you to do correctly.

15
votes

If the samplerate of your PCM data is F, then the highest frequency component in the FFT is F/2. Suppose your PCM data was sampled at 44100Hz, then your FFT values will run from 0Hz (DC) to 22050Hz. If you start with N samples, (N being a power of 2), then the FFT may return N/2 values representing all positive frequencies from 0 to F/2, or it may return N values that also include the negative frequencies from -F/2 to 0. You should check the specification of your FFT algorithm to find out to which frequency each array item is mapped.

To find the peaks, you need to look at the magnitude of the FFT values. So you need to add the squared real and imaginary parts of each complex value.

Suppose your FFT of N PCM samples returns N/2 complex values representing positive frequencies. Then the distance between 2 complex samples is F/2N Hz. With F=44100Hz and N=1024 samples, this would be 21.5Hz. This is your frequency resolution. If you need to find lower frequency beats, the FFT window will need to be extended.

4
votes

well, A raw array of size 512 of complex numbers expressing the input wave, when processed with FFT we will replace the imaginary parts with zero (according to intended use), leaving the real parts, then pass the array to the FFT with Sample rate : 8192 Hz.

Now we have a 512 array of FFTed real values, each value is an irrational number, every irrational number express several useful values.

To get the fundamental frequency we have to divide the sample rate by the buffer size:

8192/512 = 32;

32 is the resolution of the FFT values means that we're getting to know the high amplitude frequencies near the numbers that are multiples of 32.

Like if we have a wave of

frequency : 3 48 23 128 Amplitude : 10 5 12 8 dB (ref = 1)

after FFT we get:

frequency : 0 32 64 128 Amplitude : 9 8 2 8

FFT is frequency domain means it arranges according to frequency Time-domain on the other side means arranging by time we listen to music from second zero to second N.

FFT can only listen when it arranged by Frequency from frequency 0 to frequency N.

So it arranges frequencies in ascending order, since it didn't take all the actual samples from the audio (which are approaching infinite) like taking every nanosecond & less to the FFT, luckily this doesn't happen FFT takes samples from the audio, takes a sample every (1/sample rate) second. this samples get buffered (in our case : 512), each 512 samples buffered into FFT, the output is 512 FFT values.

Since FFT arranges frequencies, it messes with the time samples, samples now arranged according to their frequencies.

The frequencies shown on regular base which is the fundamental frequency which is sample rate divided by buffer size, which is in our case 8192/512 = 32.

So, frequencies power shown every 32 frequencies, the power of the nearest frequency is shown according to how much the power frequency is near to the index.

High resolution can be achieved by using higher sample rate.

To show frequencies we print the index in ascending corresponding to the Amplitude.

Amplitude = 20log10(output/ref)

Amplitudes printed next to each Index show the power of the frequency & get more accurate according to the precision of the resolution.

Conclusion, FFT produces an index of amplitudes, each amplitude expresses the power of its corresponding index (frequency).