Linear / Non-Linear Fit to a Sine Curve

Question

I've had a look at this and this.

But I have a slightly different problem. I know that my data is a sine curve, of unknown period and unknown amplitude, with additive non-gaussian distributed noise.

I'm attempting to fit it using the GSL non-linear algorithm in C, but the fit is absolutely terrible. I'm wondering if I'm (wrongly) using a non-linear fitting algorithm where I should be using a linear one?

How do I tell if a particular dataset requires a linear or a non-linear algorithm?

EDIT: My curve is really noisy, so taking an FFT to figure out the frequency may result in false positives and bad fits. I'm looking for a slightly more robust way of fitting.

Curve with about 170 points

The above plot has about 170 points, as you can see, and the plot below has about 790 points.

[Figure: curve with about 790 points]

The noise is distinctly non-gaussian, and large compared to the amplitude of the data. I've tried FFTs on data with gaussian-distributed noise, and my fit was wonderful. Here, it's failing quite badly.

ADDED: Link to first time series data. Each column in the file is a different time series.

@Kitchi: Each time series you posted only has 86 points, that's almost certainly not enough. Could you post just a couple of time series (2-3) with 1000 or more data points each? 10000 would be even better, if possible. — Alex I
"I know that my data [has] ... additive gaussian distributed noise" or "The noise is distinctly non-gaussian" -- which is it? — j_random_hacker

Fredrik Fredrik · Accepted Answer · 2013-01-24T17:11:25

If you know that your data is a sine curve (which can be represented as a number of complex exponentials), then you can use Pisarenko's harmonic decomposition: http://en.wikipedia.org/wiki/Pisarenko_harmonic_decomposition
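For the special case of a single real sinusoid, the Pisarenko estimate reduces to a closed-form expression in the lag-1 and lag-2 sample autocorrelations. The following is a minimal Python sketch of that special case only; it is not the `pisar.m` routine, and the function name `pisarenko_single_tone`, the mean removal, and the synthetic test signal are assumptions made for illustration. It assumes uniformly sampled data and roughly white (not necessarily gaussian) noise.

```python
import math
import random

def pisarenko_single_tone(x):
    """Estimate the frequency (cycles/sample) of one real sinusoid in noise.

    Closed-form single-tone Pisarenko estimate:
        cos(w) = (r2 + sqrt(r2^2 + 8*r1^2)) / (4*r1)
    where r1 and r2 are the sample autocorrelations at lags 1 and 2.
    """
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]                  # remove any DC offset
    r1 = sum(x[i] * x[i + 1] for i in range(n - 1)) / (n - 1)
    r2 = sum(x[i] * x[i + 2] for i in range(n - 2)) / (n - 2)
    if r1 == 0.0:
        return 0.25                            # cos(w) = 0: a quarter of the sample rate
    c = (r2 + math.sqrt(r2 * r2 + 8.0 * r1 * r1)) / (4.0 * r1)
    c = max(-1.0, min(1.0, c))                 # guard against values just outside [-1, 1]
    return math.acos(c) / (2.0 * math.pi)

# Hypothetical test signal: amplitude 2, frequency 0.13 cycles/sample,
# with uniform (non-gaussian) additive noise.
rng = random.Random(0)
signal = [2.0 * math.cos(2.0 * math.pi * 0.13 * t + 0.4) + rng.uniform(-1.0, 1.0)
          for t in range(5000)]
f_hat = pisarenko_single_tone(signal)
print(f_hat)
```

Because white noise only changes the lag-0 autocorrelation on average, the estimate uses lags 1 and 2 alone; with short or heavily corrupted records the sample autocorrelations themselves become unreliable, and so does the estimate.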

However, if you have access to more data points, my approach would still be to use a DFT.
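As a rough illustration of the DFT route, the sketch below picks the bin with the largest magnitude and reports its frequency. It is a plain O(N^2) DFT written for clarity, not a tuned FFT; the function name `dft_peak_frequency` and the synthetic signal are assumptions made for this example. The frequency resolution is 1/N cycles per sample, so with only 86 samples the bins are about 0.0116 cycles/sample apart, which is one reason more data helps.

```python
import cmath
import math
import random

def dft_peak_frequency(x):
    """Return the frequency (cycles/sample) of the largest DFT magnitude bin."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]                      # drop the DC component
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):                 # positive frequencies only
        s = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k / n

# Hypothetical test signal: 200 samples at 0.13 cycles/sample plus uniform noise.
rng = random.Random(1)
samples = [1.5 * math.cos(2.0 * math.pi * 0.13 * t + 1.0) + rng.uniform(-1.0, 1.0)
           for t in range(200)]
f_dft = dft_peak_frequency(samples)
print(f_dft)
```

The peak location is only accurate to within one bin; zero-padding or interpolating around the peak can refine it, but neither adds information that the data does not contain.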

UPDATE:

I used Pisarenko's harmonic decomposition (PHD) on your data, and even though your signals are extremely short (only 86 data points each), the PHD algorithm definitely has potential if more data is available. Included below are two of the 24 signals (columns 11 and 13 of your data), depicted in blue; the sine curve in red corresponds to the amplitude/frequency values estimated by PHD. (Note that the phase shift is unknown.)

[Figures: plot of data in column 11; plot of data in column 13]

I used MATLAB (pisar.m) to perform PHD: http://www.mathworks.com/matlabcentral/fileexchange/74

% assume data is one single sine curve (in noise)
SIN_NUM = 1; 

for DATA_COLUMN = 1:24
    % obtain amplitude (A) and frequency (f = w/(2*pi)) estimates
    [A, f] = pisar(data(:,DATA_COLUMN),SIN_NUM);

    % recreated signal from A, f estimate
    t = 0:length(data(:,DATA_COLUMN))-1;
    y = A*cos(2*pi*f*t);

    % plot original/recreated signal
    figure; plot(data(:,DATA_COLUMN)); hold on; plot(y,'r')
    title({'data column ',num2str(DATA_COLUMN)});

    disp(A)
    disp(f)
end

Which resulted in

1.9727     % amp. for  column 11
0.1323     % freq. for column 11
2.3231     % amp. for  column 13
0.1641     % freq. for column 13
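The estimates above give amplitude and frequency but not phase. Once a frequency estimate is fixed, the model A*cos(2*pi*f*t + phi) can be rewritten as a*cos(2*pi*f*t) + b*sin(2*pi*f*t), which is linear in a and b, so ordinary linear least squares is enough for that step and no non-linear solver is needed. The sketch below is a minimal Python illustration of this idea; the function name `fit_amplitude_phase` and the synthetic 86-point signal are assumptions made for the example, and the result is only as good as the frequency passed in.

```python
import math

def fit_amplitude_phase(y, f):
    """Least-squares fit of y[t] ~ A*cos(2*pi*f*t + phi) for a known f (cycles/sample).

    Solves the 2x2 normal equations for y ~ a*cos(w*t) + b*sin(w*t).
    Not usable for f = 0 or f = 0.5, where the sine column is all zeros.
    """
    w = 2.0 * math.pi * f
    c = [math.cos(w * t) for t in range(len(y))]
    s = [math.sin(w * t) for t in range(len(y))]
    scc = sum(ci * ci for ci in c)
    sss = sum(si * si for si in s)
    scs = sum(ci * si for ci, si in zip(c, s))
    scy = sum(ci * yi for ci, yi in zip(c, y))
    ssy = sum(si * yi for si, yi in zip(s, y))
    det = scc * sss - scs * scs
    a = (scy * sss - ssy * scs) / det          # Cramer's rule
    b = (ssy * scc - scy * scs) / det
    # A*cos(w*t + phi) = A*cos(phi)*cos(w*t) - A*sin(phi)*sin(w*t)
    return math.hypot(a, b), math.atan2(-b, a)

# Hypothetical noise-free check with 86 samples.
y = [2.0 * math.cos(2.0 * math.pi * 0.1 * t + 0.7) for t in range(86)]
amp, phase = fit_amplitude_phase(y, 0.1)
print(amp, phase)
```

With noisy data the same call returns the least-squares amplitude and phase for the given frequency; a small error in the frequency accumulates as a phase drift over the record, so the fit degrades as the record gets longer.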

VERIFICATION OF PHD:

I also did another test where I knew the values of amplitude and frequency and then added noise to see if PHD can estimate the values properly from the noisy signal. The signal consisted of two added sine curves with frequencies 50 Hz, 120 Hz respectively, and amplitudes 0.7, 1.0 respectively. In the figure below, the curve in red is the original, and the blue is with added noise. (figure is cropped)

test of PHD accuracy

Fs = 1000; % Sampling frequency
T = 1/Fs; % Sample time
L = 1000; % Length of signal
t = (0:L-1)*T; % Time vector

% Sum of a 50 Hz sinusoid and a 120 Hz sinusoid
x = 0.7*sin(2*pi*50*t) + sin(2*pi*120*t);
y = x + 0.4*randn(size(t)); % Sinusoids plus noise

figure;
plot(Fs*t(1:100),y(1:100)); hold on; plot(Fs*t(1:100),x(1:100),'r')
title('Signal Corrupted with Zero-Mean Random Noise (Blue), Original (Red)')

[A, f] = pisar(y',2); 
disp(A)
disp(f*Fs) % convert normalized frequency (cycles/sample) to Hz

PHD estimated the amp/freq values to be:

0.7493    % amp wave 1  (actual 0.7)
0.9257    % amp wave 2  (actual 1.0)
58.5      % freq wave 1 (actual 50)
123.8     % freq wave 2 (actual 120)

Not bad for quite a bit of noise, and only knowing the number of waves the signal consists of.

REPLY @Alex:

Yeah, it's a nice algorithm. I came across it during my DSP studies and thought it worked quite well. It's important to note, though, that Pisarenko's harmonic decomposition models any signal as N > 0 sinusoids, with N specified up front, and it uses that value to ignore noise. Thus, by definition, it is ONLY useful when you know roughly how many sinusoids your data is composed of. If you have no clue about the value of N and would need to run the algorithm for a thousand different values, then a different approach is definitely recommended. That said, once N is fixed, evaluation is straightforward, since it returns N amplitude and frequency values.

Multiple signal classification (MUSIC) is another algorithm, which continues where Pisarenko left off: http://en.wikipedia.org/wiki/Multiple_signal_classification
