I am trying to implement a fast FFT convolution (FFT block size = 1024 samples) of a head-related impulse response (L = 512 samples) with a sine-wave audio signal. Here you can see the plot of the impulse response:
http://fs2.directupload.net/images/150617/fc9j6cs7.png
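I split the audio signal into blocks of size M = 513. Then I zero-padded each audio block and the HRTF to 1024 samples and applied FFT, multiplication and IFFT. Because M + L - 1 = 513 + 512 - 1 = 1024, the linear convolution of each block fits into the FFT size without circular wrap-around. You can see the result for one block in the following picture: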
http://fs1.directupload.net/images/150617/bxoe9fkm.png
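The single-block step can be checked in isolation: zero-padding both signals to 1024 samples, multiplying the spectra and transforming back should give exactly the linear convolution as long as M + L - 1 <= 1024. A minimal sketch, assuming a 440 Hz sine at 44.1 kHz and a random 512-tap stand-in for the HRTF (neither comes from the original files):

```python
import numpy as np

fft_blocksize = 1024
audio_blocksize = 513   # M
hrtf_blocksize = 512    # L; M + L - 1 == 1024, so no circular wrap-around

rng = np.random.default_rng(0)
hrtf = rng.standard_normal(hrtf_blocksize)   # stand-in for the measured HRIR
fs = 44100                                   # assumed sample rate
audio_block = np.sin(2 * np.pi * 440 * np.arange(audio_blocksize) / fs)

# np.fft.fft(a, n) zero-pads a to length n before transforming
H = np.fft.fft(hrtf, fft_blocksize)
X = np.fft.fft(audio_block, fft_blocksize)
binaural_block = np.fft.ifft(H * X).real

# linear convolution has length M + L - 1 = 1024 and matches the FFT result
reference = np.convolve(audio_block, hrtf)

# a 600-sample block needs 600 + 512 - 1 = 1111 > 1024 samples,
# so its tail wraps around onto the first samples of the FFT frame
long_block = np.sin(2 * np.pi * 440 * np.arange(600) / fs)
wrapped = np.fft.ifft(H * np.fft.fft(long_block, fft_blocksize)).real
```

This is why M = 513 is the largest block size that works with L = 512 and a 1024-point FFT.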
After this, I shifted each output block 513 samples further along the time axis than the previous one (hop size = 513, i.e. the input blocks do not overlap) and added it to the previous output, which gave a correctly convolved signal.
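The block-wise procedure described above is overlap-add. Written out with x_k as the k-th input block of length M and h as the impulse response of length L (notation introduced here for illustration):

```latex
\begin{aligned}
x_k[n] &= x[n + kM], \quad 0 \le n < M \\
y_k &= x_k * h, \quad \text{length } M + L - 1 = 513 + 512 - 1 = 1024 \\
y[n] &= \sum_k y_k[n - kM] = (x * h)[n]
\end{aligned}
```

Each y_k starts M = 513 samples after y_{k-1} and is 1024 samples long, so consecutive output blocks overlap by L - 1 = 511 samples; that tail is what gets added into the next block.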
Here you can see (a simplified version of) the Python code for 5 added output blocks:
```python
import numpy as np
import scipy.io.wavfile
from numpy.fft import fft, ifft

# set blocksizes: audio_blocksize + hrtf_blocksize - 1 == fft_blocksize,
# so the linear convolution of each block fits into one FFT frame
fft_blocksize = 1024
audio_blocksize = 513
hrtf_blocksize = 512
slide_forward_samples = audio_blocksize  # hop size

# read in audio file and HRTF (both mono; the HRTF file has hrtf_blocksize samples)
_, audiodata = scipy.io.wavfile.read("filename_audio_wave")
_, hrtf_block = scipy.io.wavfile.read("filename_hrtf_wave")

# output buffer: allocated once outside the loop, and float so that
# adding the float IFFT output does not fail on an int16 array
binaural = np.zeros(fft_blocksize * 5)

# set iteration counter to 0
blocknumber = 0
while blocknumber < 5:
    # Do zeropadding: zeropad hrtf and audio
    hrtf_block_zeropadded = np.zeros(fft_blocksize)
    hrtf_block_zeropadded[0:hrtf_blocksize] = hrtf_block
    audio_block_zeropadded = np.zeros(fft_blocksize)
    audio_block_zeropadded[0:audio_blocksize] = audiodata[blocknumber*audio_blocksize : (blocknumber+1)*audio_blocksize]
    # bring time domain input to frequency domain
    hrtf_block_fft = fft(hrtf_block_zeropadded, fft_blocksize)
    audio_block_fft = fft(audio_block_zeropadded, fft_blocksize)
    binaural_block_frequency = hrtf_block_fft * audio_block_fft
    binaural_block = ifft(binaural_block_frequency, fft_blocksize).real
    # add the block to the other blocks (overlap-add)
    start = blocknumber * slide_forward_samples
    binaural[start : start + fft_blocksize] += binaural_block
    blocknumber += 1
```
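A quick way to confirm that this overlap-add with hop 513 gives the correct result is to compare it with `np.convolve` on synthetic data. A minimal sketch, assuming a 440 Hz sine at 44.1 kHz and a random 512-tap stand-in for the HRTF (not the original files):

```python
import numpy as np

fft_blocksize = 1024
audio_blocksize = 513
hrtf_blocksize = 512
num_blocks = 5

rng = np.random.default_rng(1)
hrtf = rng.standard_normal(hrtf_blocksize)   # stand-in for the HRIR
audio = np.sin(2 * np.pi * 440 * np.arange(num_blocks * audio_blocksize) / 44100)

H = np.fft.fft(hrtf, fft_blocksize)          # HRTF spectrum, computed once
# last block starts at (num_blocks - 1) * 513 and is 1024 samples long
binaural = np.zeros((num_blocks - 1) * audio_blocksize + fft_blocksize)
for k in range(num_blocks):
    start = k * audio_blocksize              # hop = block size, no input overlap
    block = audio[start:start + audio_blocksize]
    y = np.fft.ifft(np.fft.fft(block, fft_blocksize) * H).real
    binaural[start:start + fft_blocksize] += y

# full linear convolution of the whole signal, length 5 * 513 + 512 - 1
reference = np.convolve(audio, hrtf)
```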
In the next step, I wanted to convolve each block with a slightly different impulse response, which led to crackling noise between the blocks. I read that I have to apply a window and let the convolved blocks overlap, but I don't understand exactly how to do it. Can you please give me some advice?
Let us assume we want an overlap of 50% and use a Hamming window.
- Is it correct that every block now needs to contain 50% of the samples of the previous block?
- Where do I have to apply the window? Before the FFT convolution, on the audio signal blocks (window size: 513 samples), or on the IFFT output (window size: 1024 samples)?
- And by how many samples do I need to shift the FFT output on the time axis with 50% overlap?
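One common way to set this up, sketched here under stated assumptions rather than as the definitive method: split the input into blocks of M samples that overlap by 50% (each block shares M/2 samples with the previous one), multiply each input block by a window before the FFT, convolve it with its own impulse response, and overlap-add the full M + L - 1 output at a hop of M/2, without windowing the output again. The window must sum to a constant at that hop; a periodic Hamming window does (w[n] + w[n + M/2] = 1.08), so it is divided by 1.08 here. Assumptions: M = 512 instead of 513 so the hop is an integer, and `windowed_ola_convolve` plus its "one IR per block" scheme are hypothetical names for illustration.

```python
import numpy as np

def windowed_ola_convolve(x, irs, block_len=512, fft_len=1024):
    """Overlap-add convolution with 50 % overlapping, windowed input blocks.

    irs: list of equal-length impulse responses; block k is convolved with
    irs[k], and the last one is reused if there are fewer IRs than blocks
    (a hypothetical scheme, just for illustration).
    """
    hop = block_len // 2                     # 50 % overlap
    ir_len = len(irs[0])
    # each block's linear convolution must fit into one FFT frame
    assert block_len % 2 == 0 and block_len + ir_len - 1 <= fft_len

    # periodic Hamming window: w[n] + w[n + hop] == 1.08 for every n,
    # so dividing by 1.08 makes the overlapping windows sum to exactly 1
    n = np.arange(block_len)
    window = (0.54 - 0.46 * np.cos(2 * np.pi * n / block_len)) / 1.08

    # pad hop zeros in front so the first samples are also covered by two windows
    padded = np.concatenate([np.zeros(hop), x, np.zeros(block_len)])
    num_blocks = 1 + int(np.ceil(len(x) / hop))
    out = np.zeros(len(padded) + fft_len)

    for k in range(num_blocks):
        start = k * hop                                   # slide by hop, not block_len
        block = padded[start:start + block_len] * window  # window applied before the FFT
        h = irs[min(k, len(irs) - 1)]
        y = np.fft.irfft(np.fft.rfft(block, fft_len) * np.fft.rfft(h, fft_len), fft_len)
        out[start:start + fft_len] += y                   # full output, not windowed again

    # drop the leading padding and keep the full convolution length
    return out[hop:hop + len(x) + ir_len - 1]

# usage: with a single IR the result should match plain linear convolution
fs = 44100
x = np.sin(2 * np.pi * 440 * np.arange(4000) / fs)
rng = np.random.default_rng(0)
h = rng.standard_normal(512)
y = windowed_ola_convolve(x, [h])

# slowly changing IRs, one per block (synthetic, for illustration only)
irs = [h * (1 + 0.01 * k) for k in range(20)]
y_varying = windowed_ola_convolve(x, irs)
```

Checking the single-IR case against `np.convolve` first is a cheap way to verify the window and hop bookkeeping before switching impulse responses per block. Note that `np.hamming(M)` is the symmetric variant (cosine period M - 1), which does not sum to a constant at hop M/2; the periodic form used above does.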
Thank you very much for your help.