I wrote some code that takes an audio signal (currently a sine wave) as an input and does the following:
- Take frames of n (1024) samples
- Apply FFT
- Apply iFFT
- Play output
With this process the output signal is basically the same as the input signal.
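For reference, here is a minimal sketch of this first process in Python. It is not my actual code: the small recursive `fft`/`ifft` helpers, the 8000 Hz sample rate and the 440 Hz test tone are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    conj = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]

FRAME_LEN = 1024    # n in the list above
SAMPLE_RATE = 8000  # assumed value, not from my setup
FREQ = 440.0        # assumed test tone

signal = [math.sin(2 * math.pi * FREQ * t / SAMPLE_RATE)
          for t in range(4 * FRAME_LEN)]

# Non-overlapping frames: FFT, then iFFT, then concatenate.
output = []
for start in range(0, len(signal), FRAME_LEN):
    frame = signal[start:start + FRAME_LEN]
    restored = ifft(fft(frame))
    output.extend(v.real for v in restored)

max_err = max(abs(a - b) for a, b in zip(signal, output))
print(max_err)  # only floating-point rounding error remains
```

Since nothing is changed between the FFT and the iFFT, the round trip is an identity up to rounding, which matches what I hear.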
Now, in a second attempt I do:
- Take overlapping frames from the input
- Apply a window function
- FFT
- iFFT
- Overlap the output frames
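The second process can be sketched roughly as follows. This is an illustration under assumptions, not my real code: `overlap_add` and its `process` placeholder are hypothetical names, the FFT and iFFT pair is stood in for by an identity function (it changes nothing when the spectrum is left untouched), and the window is assumed to be the periodic form of the Hann window.

```python
import math

def hann_periodic(n):
    """Periodic Hann window (denominator n, not n - 1); an assumption."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def overlap_add(signal, frame_len, hop, process=lambda frame: frame):
    """Window overlapping frames, pass each through `process`, add them back."""
    window = hann_periodic(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frame = process(frame)  # the FFT -> iFFT step would go here
        for i in range(frame_len):
            out[start + i] += frame[i]
    return out

SAMPLE_RATE = 8000  # assumed value
signal = [math.sin(2 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(8192)]
out = overlap_add(signal, frame_len=1024, hop=512)

# Compare away from the edges, where fewer frames overlap.
max_err = max(abs(out[i] - signal[i]) for i in range(1024, 7168))
print(max_err)
```

With a frame length of 1024 and a hop of 512, the two overlapping periodic Hann windows add up to exactly 1, so the middle of the output matches the input without any extra scaling.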
In step 1, if I take overlapping frames using a hop size (the number of samples to advance before taking the next frame) that is a power of 2 (4, 8, 256, ...), the output is smooth and resembles the original input, but with any other hop size the sound starts to crackle. This happens for any frequency of the input signal. Question 1: Why is the sound smooth only when the hop size is 2^n?
Currently I use a Hann (Hanning) window. When the hop size is large (e.g. 512), the output has a lower volume than when the hop size is small (e.g. 64). This seems like expected behavior, because a small hop size means each sample is reconstructed from more frames, so more signals are added together. Question 2: Is there a way to properly scale the output signal so that its volume matches the original signal?
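One check that might be relevant to both questions is to add up the shifted copies of the window on their own, since with an unmodified spectrum each output sample is the input sample multiplied by that sum. The sketch below assumes a periodic Hann window of length 1024; `window_overlap_sum` is a hypothetical helper, and the hop of 300 is just an arbitrary value that is not a power of 2. I am not certain this explains the crackling in my code, so treat it as a diagnostic rather than an answer.

```python
import math

def hann_periodic(n):
    """Periodic Hann window (denominator n, not n - 1); an assumption."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def window_overlap_sum(window, hop, num_frames):
    """Sum of num_frames copies of the window, each shifted by hop samples."""
    total = [0.0] * (hop * (num_frames - 1) + len(window))
    for k in range(num_frames):
        for i, w in enumerate(window):
            total[k * hop + i] += w
    return total

N = 1024
window = hann_periodic(N)
results = {}
for hop in (64, 512, 300):
    total = window_overlap_sum(window, hop, 40)
    core = total[N:-N]          # skip the ramp-up and ramp-down at the edges
    gain = sum(window) / hop    # average gain: a candidate scale factor
    results[hop] = (min(core), max(core), gain)
    print(hop, results[hop])
```

For hops that divide the frame length (for 1024 these are exactly the powers of 2), the sum is constant and equals N / (2 * hop): 8 for a hop of 64 and 1 for a hop of 512. Dividing the output by that value would equalize the volume. For a hop of 300 the minimum and maximum differ, so the output is amplitude-modulated at the hop rate and a single constant cannot fully correct it; dividing each output sample by the summed window at that position is one option.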
Thank you!