Cross correlate data that contains "spikes"

Question

When using xcorr in MATLAB to cross correlate 2 related data sets, everything works as expected - I see a correlation peak and the lag reported is correct. However, when I use xcorr to cross correlate unrelated data sets where both data sets contain 1 cluster of "spikes", I see a correlation peak and the lag reported is the distance between the 2 spikes.

In this image:

x is a random data series. y is also a random data series. Both x and y have 30 random peaks inserted into the series in sequence. In theory, there should be no correlation between the 2 data sets since they are both very different. However, it can be seen from the 3rd plot that there is a very strong correlation between the 2 data sets. The code used to generate this figure is at the bottom of this post.

I've tried to filter the spikes using a few different mechanisms (rolling rms power ... etc) before performing the xcorr. This has worked in some cases but not all. I feel like I need a different approach to the problem, maybe an alternative to xcorr. I do understand why x and y cross correlate using xcorr. Is there another cross correlation tool that I can use? Note x and y will never be exactly the same, they will only ever be approximately the same but in normal operation, it's not the spikes that should make them correlate.

Any suggestions on how to tell if x and y correlate while also ignoring the "spikes"?

Here is my example code:

x = rand(1, 3000);
x = x - 0.5;
y = rand(1, 3000);
y = y - 0.5;
% insert the impulses into the data
impulse_width = 30;
impulse_max_height = 6;
x_impulse_start = 460;
y_impulse_start = 120;
rand_insert_x = rand(1, impulse_width);
rand_insert_x = (rand_insert_x - 0.5) * 2 * impulse_max_height;
rand_insert_y = rand(1, impulse_width);
rand_insert_y = (rand_insert_y - 0.5) * 2 * impulse_max_height;
x(1,x_impulse_start:x_impulse_start + impulse_width - 1) = rand_insert_x;
y(1,y_impulse_start:y_impulse_start + impulse_width - 1) = rand_insert_y;
subplot(3, 1, 1);
plot(x);
ylim([-impulse_max_height impulse_max_height]);
title('random data series: x');
subplot(3, 1, 2);
plot(y);
ylim([-impulse_max_height impulse_max_height]);
title('random data series: y');
[c, l] = xcorr(x, y);
subplot(3, 1, 3);
plot(l, c);
title('correlation using xcorr');

I see why this is a problem, but if I hadn't read your post I would think that xcorr is doing a good job, as it is aligning those quite similar signals! — Ander Biguri
I think the first step should be to define whether or not a "bunch of data" is considered a spike. — rst

Ander Biguri · Accepted Answer · 2015-10-07T15:17:02

The way to solve this is to use normalized cross-correlation.

In normalized cross-correlation, the correlation is 1 when the signals are exactly the same, and smaller in magnitude when they are not. You can see it as a "percentage of similarity".

To do that in MATLAB, you just need to add 'coeff' as an argument to your code.
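As a rough illustration of what 'coeff' does: it scales the raw cross-correlation so that the autocorrelation of each signal at zero lag equals 1, which amounts to dividing by norm(x) * norm(y). The sketch below checks this by hand; the seed, the 600-sample length, and the variable names (c_raw, c_manual, c_coeff) are illustrative choices and not part of the original code. xcorr requires the Signal Processing Toolbox.

```matlab
% Sketch: reproduce the 'coeff' scaling of xcorr by hand.
rng(0);                                  % fixed seed so the run is repeatable (illustrative)
x = rand(1, 600) - 0.5;                  % zero-centred random series
y = rand(1, 600) - 0.5;

[c_raw, lags] = xcorr(x, y);             % unnormalized cross-correlation
c_manual = c_raw / (norm(x) * norm(y));  % same as dividing by sqrt(sum(x.^2) * sum(y.^2))
c_coeff  = xcorr(x, y, 'coeff');         % built-in normalization

max_diff = max(abs(c_manual - c_coeff)); % should be at rounding-error level
[peak, idx] = max(abs(c_coeff));         % largest normalized value over all lags
fprintf('max difference = %g, peak |c| = %.3f at lag %d\n', ...
        max_diff, peak, lags(idx));
```

Because of this scaling, the normalized values always lie between -1 and 1, so a peak can be judged on an absolute scale instead of relative to the rest of the plot.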

So, if I change your code to [c, l] = xcorr(x, y, 'coeff'); the plot I get is the next one:

(note I changed sample size to 600 to make it more readable)

The cross-correlation only gets to about 0.3 there, so not much. However, if we change your code lines to

x(1,x_impulse_start:x_impulse_start + impulse_width - 1) = rand_insert_x;
y(1,y_impulse_start:y_impulse_start + impulse_width - 1) = rand_insert_x;

and insert the same random pattern in both signals, then we get:

Now the cross-correlation reaches a high value, almost 1 but not exactly 1, because the big random pattern is the same in both signals but the rest of each signal is not.
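The two cases can be compared numerically with a short script. This is only a sketch, assuming the 600-sample length mentioned above and the impulse parameters from the question; the seed and the names burst_x, burst_y, y_diff and y_same are hypothetical, and the exact peak values will change with the random draw.

```matlab
% Sketch: normalized cross-correlation with different vs. identical bursts.
rng(1);                                   % illustrative seed
n = 600;
impulse_width = 30;
impulse_max_height = 6;
x_impulse_start = 460;
y_impulse_start = 120;

x = rand(1, n) - 0.5;
y = rand(1, n) - 0.5;
burst_x = (rand(1, impulse_width) - 0.5) * 2 * impulse_max_height;
burst_y = (rand(1, impulse_width) - 0.5) * 2 * impulse_max_height;

ix = x_impulse_start:x_impulse_start + impulse_width - 1;
iy = y_impulse_start:y_impulse_start + impulse_width - 1;
x(ix) = burst_x;

% Case 1: a different burst in y
y_diff = y;
y_diff(iy) = burst_y;
peak_diff = max(abs(xcorr(x, y_diff, 'coeff')));

% Case 2: the same burst in both signals
y_same = y;
y_same(iy) = burst_x;
[c_same, lags] = xcorr(x, y_same, 'coeff');
[peak_same, idx] = max(abs(c_same));

fprintf('different bursts: peak = %.2f\n', peak_diff);
fprintf('same burst:       peak = %.2f at lag %d\n', peak_same, lags(idx));
```

With the same burst in both signals, the peak sits at lag 460 - 120 = 340, the offset between the two burst positions. How close it gets to 1 depends on how much of each signal's total energy the burst carries, so a longer record with the same burst would give a lower peak.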
