1
votes

I am trying to reverse python numpy/scipy's fft, rfft, and dct transforms back into a sum of sine/cosine waves to reconstruct the original dataset. I want to do this because I want to be able to reconstruct the original dataset with more/less sampling points (which I believe may already be covered with scipy.signal.resample) and mainly because I want to extend the sine/cosine series into the future not unlike how a linear regression might be used in some series to give a general idea of future values. I know this is not technically correct, as fft assumes the discrete sample is repeated over all time points, and dct assumes the data is "mirrored", but I think it may have some sort of short term predictive value.

I tried to follow what was written here as a guide to Numpy's decomposition algorithms: http://snowball.millersville.edu/~adecaria/ESCI386P/esci386-lesson17-Fourier-Transforms.pdf

Here is my code:

import numpy as np
from scipy.fftpack import fft,ifft,dct,idct,rfft,irfft
import matplotlib.pyplot as plt

def reconstructSeries(transformedVals,newxvals):
    transformedVals=transformedVals.astype('complex128')

    transformedVals=transformedVals/len(transformedVals) #for some reason, numpy does not normalize the values it has, so I have to do it here.


    reconstructedVals=np.zeros(len(newxvals))
    series=[]
    # perhaps [:len(transformedVals)//2] ?
    for frequency,val in enumerate(transformedVals):    
        #the position of the coefficient is the frequency (in radians)

        #amplitude=np.sqrt(np.real(val)**2+np.imag(val)**2)
        #phase=np.arctan(np.imag(val)/np.real(val))
        series.append(lambda x: np.real(val)*np.cos(frequency*newxvals)-np.imag(val)*np.sin(frequency*newxvals))
        #series.append(lambda x: amplitude*np.cos(2*np.pi*frequency*newxvals+phase)) #this is in radians to accomidate phase and the default cosine function
        reconstructedVals=reconstructedVals+np.array(series[frequency](newxvals))

    return reconstructedVals,series

#y=np.arange(250)
y=np.cos(np.arange(250)+5)

yf = fft(y) #this can be rfft or dct as well
myyvalues,sinosoidseries=reconstructSeries(yf,np.arange(250))

plt.plot(ifft(yf));plt.plot(y);plt.plot(myyvalues);plt.show()

What this code is supposed to do is:

  1. Convert all input data-arrays to complex (because dct doesn't output a complex datatype).
  2. Normalize the Fourier coefficients, as fft() and related transforms don't appear to divide by the number of elements in the dataset.
  3. Populate an array of lambda functions representing the individual contributions of each Fourier frequency (I assume they are the ordinal positions of the Fourier coefficients)
  4. Cumulatively sum the individual contributions of each sinusoidal lambda function taken at the new points to be sampled to reconstruct a series

In this particular code, I'm trying to see if my recomposition equals the original series/scipy's inverse decomposition just to make sure I'm doing it right. I think the code works fine, but the underlying formulas it uses for the sine/cosine reconstruction are wrong. Here is the output for this particular code:

Picture of Reconstructed Values

Picture of Reconstructed Values with y=np.arange(250)

Green is my reconstructed values and Orange/Blue are the original values. Clearly, my algorithm is not remaking the series properly. Using amplitude and phase to combine the sine and cosine terms into a single cosine term, as was recommended on other sites, gives a different, but still incorrect, result, most likely due to the subtraction of the sine term that was recommended in the above source. Does anyone know how my formula or code is wrong? I'm thinking it is either in the cos()-sin() part, or something like the frequency not being multiplied by a constant.

*Note: I do know this question is somewhat like: Fourier Series from Discrete Fourier Transform but I don't think the answer to this works for me here.

1
There is 0 predictive value in the DFT and friends. If you compute the next sample after the last of the input sequence by extending the sine waves, you’ll recreate the first value in the sequence. If that is predictive, simply take your first value as prediction for the future! :) — For interpolation, simply pad your frequency domain signal with zeros (adding zero-valued higher frequencies) and then apply the inverse transform.Cris Luengo
Thanks! This really helped. For anyone curious, all the scipy inverse functions have a parameter, "n" which pads with 0s and allows for extrapolation. I tried this and it does seem to extrapolate. However, over the original dataset, the reconstruction does not match the original in shape (once rescaled properly). If we set higher frequencies to zero, shouldn't it be reconstructing the original series exactly and extrapolating based off this?stillQuestioning
Don't use the n parameter in the ifft function. I've updated my answer to explain also how to correctly pad the frequency-domain signal.Cris Luengo

1 Answers

1
votes

The error I see in the code is in the complex multiplication: you multiply the real component of the frequency-domain sample with cos, and the imaginary component with sin. This is not how the complex multiplication works. You need to multiply the complex sample value with the complex value cos + i sin. The complex numbers a+ib and c+id, when multiplied, yield ac-bd+iad+ibc, not ac+bd.


Edit: How to pad the Frequency domain with zeros for interpolation

The SciPy ifft function has a parameter n that you can use to pad the array with zeros before the transform. Don't use this parameter. It adds zeros to the end of the signal, destroying the symmetry of the signal, and hence typically yields a non-real result.

The DFT (what fft computes) has frequencies k=0...N-1. But k is periodic, meaning that k=N-1 is the same as k=-1. We need to preserve the (complex conjugate) symmetry around k=0 that exists in the Fourier domain for real-valued time-domain signals, which means that the values of the Frequency-domain signal for k=1 and k=-1 must maintain this symmetry (and also for k=2 and k=-2, etc.).

When padding with zeros, we increase this value N, so we also change where k=-1 is in the array (as it is at k=N-1, increasing N means this location moves).

Thus, the padding must add zeros right in the middle of the array, such that the original values at the beginning and the end of the array are preserved. The middle of the array is in between (N+1)//2-1 and (N+1)//2:

N = 250
y = np.cos(np.arange(N)+5)
yf = fft(y)
yf = np.concatenate((yf[:(N+1)//2], np.zeros(N), yf[(N+1)//2:]))
y2 = ifft(yf)
plt.subplot(2,1,1)
plt.plot(y,'.-')
plt.subplot(2,1,2)
plt.plot(y2,'.-')
plt.show()

Note how the time-domain signal remains the same, but has double the number of samples.

Note also how this does not extrapolate: if you extend the sine and cosine waves that compose y, you'll reconstruct values from the beginning of the signal, as the y reconstructed in this way is periodic. That is, y[N]==y[0], y[N+1]==y[1], etc.