I know there are tons of topics on finding pitch from the FFT, and I've gained a decent understanding of the overall process of converting samples from the time domain to the frequency domain, but there are still some areas (probably more advanced) that I'm a little stuck on.
I'm going to walk step by step through my current process, so hopefully someone can help me understand where I'm going wrong!
Before I start: the example I'm using here is a WAV file that I created in Logic. It is simply a Piano preset in the A scale, starting at A4 and moving up the scale (A4, B4, C#5, D5...) every half bar, for a total of 4 seconds at 120 bpm. Here's a link to the WAV if it helps: https://www.dropbox.com/s/zq1u9aylh5cwlmm/PianoA4_120.wav?dl=0
Step 1:
I parse out the metadata and the actual sample data.
Metadata:
channels => 2,
sample_rate => 44100,
byte_rate => 176400,
bits_per_sample => 16,
data_chunk_size => 705600,
data => ...
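As a sanity check on the header values above, here is a minimal Python sketch (Python is my choice for illustration; the variable names simply mirror the metadata fields). It only does the arithmetic that relates the fields to each other, assuming plain uncompressed PCM:

```python
# Header fields exactly as listed in the metadata above.
channels = 2
sample_rate = 44100
byte_rate = 176400
bits_per_sample = 16
data_chunk_size = 705600

bytes_per_sample = bits_per_sample // 8
block_align = channels * bytes_per_sample       # bytes per frame (one sample per channel)
expected_byte_rate = sample_rate * block_align  # should match the header's byte_rate
frames = data_chunk_size // block_align         # number of samples per channel
duration_s = frames / sample_rate

print(expected_byte_rate == byte_rate)  # True
print(frames)                           # 176400
print(duration_s)                       # 4.0
```

So each channel holds 176400 samples, which matches the 4 seconds of audio described above.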
Step 2: Since there are 2 channels, I split the sample data into a left and a right array and put each of them through its own FFT. The result of each FFT gives me a magnitude and a phase for each frequency bin.
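To make the channel split concrete, here is a rough sketch assuming the data chunk is interleaved 16-bit little-endian PCM (left sample, right sample, left, right, ...), which is the usual layout for a stereo WAV. The helper name `split_stereo_16bit` is made up for this example:

```python
import struct

def split_stereo_16bit(data: bytes):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right) lists."""
    count = len(data) // 2  # number of 16-bit samples across both channels
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    # Even positions belong to the left channel, odd positions to the right.
    return list(samples[0::2]), list(samples[1::2])

# Tiny synthetic buffer standing in for the real data chunk.
raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
left, right = split_stereo_16bit(raw)
print(left, right)  # [1, 2, 3] [-1, -2, -3]
```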
Step 3:
I now need to find the max magnitude of each FFT. I do this by taking the magnitude of each complex result and then finding the max value. I'm using Matlab to help me, so I run max(abs(fft(data))). The values I got from finding the max of each FFT were 1275.6 and 1084.0.
Step 4: Find the index of each max value in its FFT and then look up the frequency that corresponds to that index in the frequency-domain values. This gave me 1177.0 Hz and 1177.5 Hz.
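For reference, steps 3 and 4 can be sketched as follows. This is an illustration on a synthetic 440 Hz sine rather than the actual file, using a naive DFT from the standard library instead of Matlab's fft; the function names are made up for the example. With a 0-based bin index k, the frequency is k * sample_rate / N (in Matlab, where indices start at 1, it would be (index - 1) * sample_rate / N):

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for a short demo)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def peak_frequency(x, sample_rate):
    """Frequency in Hz of the largest-magnitude bin, ignoring the DC bin."""
    mags = dft_magnitudes(x)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * sample_rate / len(x)

# Synthetic test tone: 440 Hz sampled at 8 kHz, 400 samples (20 Hz per bin).
sample_rate = 8000
n = 400
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]
print(peak_frequency(tone, sample_rate))  # 440.0
```

Only the first half of the spectrum is searched because the second half of a real signal's FFT mirrors the first. For the file in this post, N = 176400 samples per channel at 44100 Hz gives a bin spacing of 0.25 Hz.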
This is where I'm confused! I've plotted the time-domain graph and seen how the pitch can be identified as A4 simply by measuring the period and comparing it to the known period of A4, but I'm trying to understand how I can come to the same conclusion via the FFT. Any help or pointers to further reading would be greatly appreciated!
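To put numbers on the comparison, here is a small sketch that computes the period of A4 in samples and maps a frequency to the nearest equal-tempered note. It assumes A4 = 440 Hz and standard MIDI note numbering (A4 = 69); `note_name` is a hypothetical helper, and this only labels the frequencies, it does not explain why the FFT peak lands where it does:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name, assuming A4 = a4_hz (MIDI note 69)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return "%s%d" % (NOTE_NAMES[midi % 12], midi // 12 - 1)

sample_rate = 44100
period_samples = sample_rate / 440.0  # length of one A4 cycle in samples
print(round(period_samples, 2))       # 100.23
print(note_name(440.0))               # A4
print(note_name(1177.0))              # D6
```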