I'm hoping someone can review my code below and offer hints how to speed up the section between tic and toc. The function below attempts to perform an IFFT faster than Matlab's built-in function since (1) almost all of the fft-coefficient bins are zero (i.e. 10
to 1000
bins out of 10M
to 300M
bins are non-zero), and (2) only the central third of the IFFT results are retained (the first and last third are discarded -- so no need to compute them in the first place).
The input variables are:
fftcoef = complex fft-coef 1D array (10 to 1000 pts long)
bins = index of fft coefficients corresponding to fftcoef (10 to 1000 pts long)
DATAn = # of pts in data before zero padding and fft (in range of 10M to 260M)
FFTn = DATAn + # of pts used to zero pad before taking fft (in range of 16M to 268M) (e.g. FFTn = 2^nextpow2(DATAn))
Currently, this code takes a few orders of magnitude longer than Matlab's ifft
function approach which computes the entire spectrum then discards 2/3
's of it. For example, if the input data for fftcoef and bins are 9x1
arrays (i.e. only 9
complex fft coefficients per sideband; 18
pts when considering both sidebands), and DATAn=32781534
, FFTn=33554432
(i.e. 2^25
), then the ifft approach takes 1.6
seconds whereas the loop below takes over 700
seconds.
I've avoided using a matrix to vectorize the nn loop since sometimes the array size for fftcoef and bins could be up to 1000
pts long, and a 260Mx1K
matrix would be too large for memory unless it could be broken up somehow.
Any advice is much appreciated! Thanks in advance.
function fn_fft_v1p0(fftcoef, bins, DATAn, FFTn)
fftcoef = [fftcoef; (conj(flipud(fftcoef)))]; % fft coefficients
bins = [bins; (FFTn - flipud(bins) +2)]; % corresponding fft indices for fftcoef array
ttrend = zeros( (round(2*DATAn/3) - round(DATAn/3) + 1), 1); % preallocate
start = round(DATAn/3)-1;
tic;
for nn = start+1 : round(2*DATAn/3) % loop over desired time indices
% sum over all fft indices having non-zero coefficients
arg = 2*pi*(bins-1)*(nn-1)/FFTn;
ttrend(nn-start) = sum( fftcoef.*( cos(arg) + 1j*sin(arg));
end
toc;
end
length(bins)*(2*DATAn/3)
operations, better thanDATAn*lg(DATAn)
for the FFT approach if2*length(bins)/3 > lg(DAtan)
(since FFTW handles non-power-of-2 transform sizes, I'm ignoring the zero-padding). For the case of 10 bins and 2^25 output points, that's '20/3 > 25', a potential factor of 3 improvement. As soon as you get to 75 FFT coeffs, you've lost the advantage. And you have to code the algorithm in C and maintain it. – mtrw