0 votes

I want to get the offset in samples between two datasets in Matlab (i.e. get them synced in time), which is a fairly common task. For this I use the cross-correlation function xcorr or the cross-covariance function xcov (both give similar results for this purpose in most cases). With artificial data it works fine, but I struggle with "real" data, even though it should be pretty much the same: Matlab always reports an offset of zero. I'm using this simple piece of code:

crossCorr = xcov(b, c);            % cross covariance of the two series
[~, peakIndex] = max(crossCorr)    % index of the maximum
offset = peakIndex - length(b)     % convert index to lag (b and c have equal length)
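For reference, the two-output form of xcov returns the lag axis directly; this sketch should be equivalent for equal-length b and c:

% Equivalent form using the lag vector returned by xcov
[crossCorr, lags] = xcov(b, c);
[~, peakIndex] = max(crossCorr);
offset = lags(peakIndex)    % same as peakIndex - length(b) for equal-length inputs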

I've posted a fully runnable example m-file with a downsampled data excerpt on pastebin: Code with data on pastebin

EDIT: The downsampled excerpt seems not to be fully suitable for evaluating the effect. Here's a much larger sample at the original frequency; please use this one instead. Unfortunately it was too big for pastebin.

As the plot shows, it should be no problem at all to get the offset via cross covariance. I also tried scaling the data more sensibly in order to avoid numerical problems, but that didn't change anything at all.

It would be great if someone could tell me my mistake.

3
How did your theoretical data look? I am not completely sure here, but it could be that you have too few periods in your data. Let's say you have a peak at 0 in one set and a peak at 5 in the other set. Then lag 5 should have a really high correlation compared to the rest. But your data is like one long slope: the value at 0 is comparable to the value at 5. This gives you (as you have probably seen) a quite low correlation overall. It is possible that this technique is supposed to work anyway, but that is my guess. You could try different theoretical data and compare. – patrik
Thanks. This here was my theoretical test. I was quite impressed that even noise with the same amplitude as the random signal was no problem at all. The real data is not periodic, but the excerpt was a bit short. Here's a much longer excerpt, but I unfortunately had to downsample it much more so that pastebin would accept it. – user3296542

3 Answers

1 vote

There's nothing wrong with your method in principle; I used exactly the same approach successfully for temporally aligning different audio recordings of the same signal.

However, it appears that for your time series, correlation (or covariance) is simply not the right measure to compare shifted versions – possibly because they contain components of a time scale comparable to the total length. An alternative is to use residual variance, i.e. the variance of the difference between shifted versions. Here is a (not particularly elegant) implementation of this idea:

lags = -1000 : 1000;                 % candidate lags to test
v = nan(size(lags));
for i = 1 : numel(lags)
    lag = lags(i);
    if lag >= 0
        v(i) = var(b(1 + lag : end) - c(1 : end - lag));   % b delayed relative to c
    else
        v(i) = var(b(1 : end + lag) - c(1 - lag : end));   % b ahead of c
    end
end
[~, ind] = min(v);                   % best alignment = smallest residual variance
minlag = lags(ind);
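Once minlag is known, aligning the two series just reuses the indexing convention of the loop above (a small sketch, assuming b and c are equal-length vectors):

% Align b and c using the estimated lag: a positive minlag pairs
% b(1+minlag:end) with c(1:end-minlag), as in the loop above
if minlag >= 0
    bAligned = b(1 + minlag : end);
    cAligned = c(1 : end - minlag);
else
    bAligned = b(1 : end + minlag);
    cAligned = c(1 - minlag : end);
end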

For your (longer) data set, this results in minlag = 169. Plotting residual variance over lags gives:

[figure: residual variance vs. lag]

0 votes

Your data has a minor peak around 5 and a major peak around 101.

If I knew something about my data, then I might window around an acceptable range of offsets, as sketched below.
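A rough sketch of such a windowing step (the acceptable range of ±200 samples is purely an assumption about the data) could look like this:

% Restrict the peak search to an assumed acceptable range of offsets
[crossCorr, lags] = xcov(b, c);
maxAcceptableLag = 200;                      % assumption about the data
inWindow = abs(lags) <= maxAcceptableLag;    % keep only plausible lags
windowedLags = lags(inWindow);
[~, k] = max(crossCorr(inWindow));
offset = windowedLags(k)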

Code for initial exploration:

figure; clc;
subplot(2,1,1) 
plot(1:numel(b), b);
hold on
plot(1:numel(c), c, 'r');
legend('b','c')

subplot(2,1,2)
plot(crossCorr,'.b-')
hold on
plot(peakIndex,crossCorr(peakIndex),'or')
legend('crossCorr','peak')

Initial Image:

[figure: b and c plotted together (top) and crossCorr with the detected peak (bottom)]

If you zoom into the first peak, you can see that it is not only high around 5, but also polynomial "enough" to allow sub-element offsets. That is convenient.

Image showing the zoom into the first peak:

[figure: crossCorr in the neighbourhood of the first peak]

Here is what the curve-fitting tool gives as the analytic for a cubic:

Linear model Poly3:
     f(x) = p1*x^3 + p2*x^2 + p3*x + p4
Coefficients (with 95% confidence bounds):
       p1 =  8.515e-013  (8.214e-013, 8.816e-013)
       p2 = -3.319e-011  (-3.369e-011, -3.269e-011)
       p3 =  2.253e-010  (2.229e-010, 2.277e-010)
       p4 = -4.226e-012  (-7.47e-012, -9.82e-013)

Goodness of fit:
  SSE: 2.799e-024
  R-square: 1
  Adjusted R-square: 1
  RMSE: 6.831e-013

You can note that the SSE shows the fit is good down to roundoff. To find the peak location, i.e. the root of the derivative (near x = 4), you can use the following Matlab code:

% Coefficients from the cubic fit
p1 =  8.515e-013;
p2 = -3.319e-011;
p3 =  2.253e-010;
p4 = -4.226e-012;
% Linear model Poly3:
syms('x')
f = p1*x^3 + p2*x^2 + p3*x + p4

% Zero of the derivative = sub-sample location of the peak (requires the Symbolic Math Toolbox)
xz1 = fzero(@(y) double(subs(diff(f), x, y)), 4)

and you get the root of the derivative, i.e. the sub-sample peak location, at 4.01420240431444.
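A purely numeric version of the same idea, without the curve-fitting tool or the Symbolic Math Toolbox, might look like this sketch (the ±3-sample window around the peak is my own choice):

% Fit a cubic to the samples around the integer peak of crossCorr and
% locate its maximum to sub-sample precision
win = (-3:3).';                              % local window around the peak (assumption)
y = crossCorr(peakIndex + win);
p = polyfit(win, y(:), 3);                   % local cubic model of the peak
delta = fminbnd(@(t) -polyval(p, t), win(1), win(end));   % fractional peak position
subSampleOffset = (peakIndex + delta) - length(b)         % fractional lag estimate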

EDIT: Hmmm. How about fitting a Gaussian mixture model to the cross correlation? You sweep through a good range of component counts, do between 10 and 30 repeats each, and find which component count has the best (lowest) BIC. So you fit a gmdistribution to the curve in the lower subplot of the first figure, then test the cross covariance at the means of the components, in decreasing order.

I would try the offset at the means, and just look at sum squared error. I would then pick the offset that has the lowest error.

Procedure:

  • compute the cross correlation
  • fit the cross correlation to a Gaussian mixture model
    • sweep a reasonable range of components (start with 1-10)
    • use a reasonable number of repeats (10 to 30, depending on run-to-run variation)
    • compute the Bayes Information Criterion (BIC) for each component count and pick the lowest, because it indicates a reasonable balance of error and parameter count
  • each component has a mean; evaluate that mean as a candidate offset and compute the sum-squared error (SSE) when you shift by that offset
  • pick the offset of the component that gives the best SSE
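A rough sketch of that procedure (assuming fitgmdist and randsample from the Statistics and Machine Learning Toolbox, the b, c and crossCorr from above, and arbitrary choices for the sample count and component range):

% Treat the shifted-to-nonnegative cross correlation as a distribution over
% lags, draw samples from it, and fit Gaussian mixtures of increasing size
lags = (1:numel(crossCorr)).' - length(b);     % lag axis of xcov(b, c)
w = crossCorr(:) - min(crossCorr);
w = w / sum(w);                                % weights must form a valid distribution
lagSamples = randsample(lags, 5000, true, w);  % 5000 draws is an arbitrary choice

bestBIC = Inf;
for k = 1:10                                   % sweep component counts 1-10
    gm = fitgmdist(lagSamples(:), k, 'Replicates', 10);
    if gm.BIC < bestBIC                        % keep the model with the lowest BIC
        bestBIC = gm.BIC;
        bestGM = gm;
    end
end

% Evaluate each component mean as a candidate offset and keep the best SSE
candidates = round(bestGM.mu);
sse = inf(size(candidates));
for i = 1:numel(candidates)
    lag = candidates(i);
    if lag >= 0
        d = b(1 + lag : end) - c(1 : end - lag);
    else
        d = b(1 : end + lag) - c(1 - lag : end);
    end
    sse(i) = sum(d .^ 2);
end
[~, best] = min(sse);
offsetGMM = candidates(best)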

Let me know how well that works.

0 votes

If the two signals are misaligned by a non-integer number of samples, e.g. 3.7 samples, then the xcorr method may find the maximum at 4 samples; it won't be able to find the accurate time shift. In this case, you should try a method called "unified change detection". The link to the paper is: http://www.phmsociety.org/node/1404/
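The method from the paper is not reproduced here, but as a simple illustration of the fractional-shift problem, one standard workaround is to upsample both signals before cross-correlating (a sketch, assuming the Signal Processing Toolbox's resample and an arbitrary upsampling factor of 10):

% Upsample both signals so that e.g. a 3.7-sample shift becomes a 37-sample
% shift, then convert the lag back to the original sampling grid
up = 10;                           % upsampling factor (arbitrary choice)
bUp = resample(b, up, 1);          % requires the Signal Processing Toolbox
cUp = resample(c, up, 1);
[xc, lagsUp] = xcorr(bUp, cUp);
[~, k] = max(xc);
fractionalOffset = lagsUp(k) / up  % offset in original (fractional) samples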

Good Luck.